Results 111 to 120 of about 1,257,085 (204)

Benchmarking EGF signaling pathway inference using phosphoproteomics and kinase-substrate interactions

open access: yesNature Communications
Signaling pathways are useful models for interpreting molecular data, but their coverage has long been constrained by classic biochemistry methods. The growing corpus of kinase-substrate interactions, coupled with improvements in phosphoproteomics, paves the ...
Martin Garrido-Rodriguez   +7 more
doaj   +1 more source

KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference

open access: yes
Language models (LMs) underpin emerging mobile and embedded AI applications like meeting and video summarization and document analysis, which often require processing multiple long-context inputs. Running an LM locally on-device improves privacy, enables offline use, and reduces cost, but long-context inference quickly hits a "memory capacity wall"
Zhang, Huawei; Xia, Chunwei; Wang, Zheng
openaire   +2 more sources

D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

open access: yes
Generative inference in Large Language Models (LLMs) is impeded by the growing memory demands of Key-Value (KV) cache, especially for longer sequences. Traditional KV cache eviction strategies, which discard less critical KV pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations.
Wan, Zhongwei   +10 more
openaire   +2 more sources
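The eviction strategy this abstract describes (discarding less critical KV pairs based on attention scores) can be sketched as follows. This is an illustrative helper only, not the paper's D2O method: the function name `evict_kv`, the cumulative-score input, and the fixed budget are all assumptions for the sketch.

```python
import numpy as np

def evict_kv(keys, values, attn_scores, budget):
    """Keep only the `budget` KV pairs with the highest attention scores.

    keys, values: (seq_len, dim) arrays of cached keys and values.
    attn_scores: (seq_len,) cumulative attention each position received.
    Positions are kept in their original sequence order.
    """
    if keys.shape[0] <= budget:
        return keys, values
    # Indices of the top-`budget` scores, re-sorted into sequence order.
    keep = np.sort(np.argsort(attn_scores)[-budget:])
    return keys[keep], values[keep]

# Example: a 6-position cache squeezed down to a budget of 3.
keys = np.arange(12, dtype=np.float32).reshape(6, 2)
values = keys * 10
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.2])
k2, v2 = evict_kv(keys, values, scores, budget=3)
print(k2.shape)  # (3, 2): positions 0, 2, 4 survive
```

As the abstract notes, a hard score-based cutoff like this can drop context that later turns out to matter; the dynamic/discriminative operations the paper proposes are aimed at exactly that failure mode.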

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity

open access: yes
14 pages, 7 figures, 7 ...
Ma, Da   +10 more
openaire   +2 more sources

L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference

open access: yes
Large Language Models (LLMs) increasingly require processing long text sequences, but GPU memory limitations force difficult trade-offs between memory capacity and bandwidth. While HBM-based acceleration offers high bandwidth, its capacity remains constrained.
Liu, Qingyuan   +8 more
openaire   +2 more sources

Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference

open access: yes
Recent advances in large language models (LLMs) have showcased exceptional performance in long-context tasks, while facing significant inference efficiency challenges with limited GPU memory. Existing solutions first proposed the sliding-window approach to accumulate a set of historical key-value (KV) pairs for reuse, then further improvements
Xiao, Qingfa   +8 more
openaire   +2 more sources
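The sliding-window accumulation of historical KV pairs mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's retrieval method; the `SlidingWindowKV` class is invented for the sketch, and the permanently retained "sink" positions are a common variant of the idea, assumed here rather than taken from the abstract.

```python
from collections import deque

class SlidingWindowKV:
    """Keep the most recent `window` KV pairs, plus the first `sink`
    positions permanently (a common sliding-window variant)."""

    def __init__(self, window, sink=0):
        self.sink = sink
        self.sinks = []                      # first positions, kept forever
        self.recent = deque(maxlen=window)   # rolling tail of the sequence

    def append(self, pos, kv):
        # Older entries fall off the deque automatically once it is full.
        if len(self.sinks) < self.sink:
            self.sinks.append((pos, kv))
        else:
            self.recent.append((pos, kv))

    def cached_positions(self):
        return [p for p, _ in self.sinks] + [p for p, _ in self.recent]

# Example: stream 10 positions through a window of 4 with 2 sink slots.
cache = SlidingWindowKV(window=4, sink=2)
for pos in range(10):
    cache.append(pos, kv=None)
print(cache.cached_positions())  # [0, 1, 6, 7, 8, 9]
```

Memory stays bounded at `sink + window` entries regardless of sequence length, which is the efficiency win; the trade-off is that evicted middle positions are gone, motivating the retrieval-based improvements the abstract alludes to.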
