Results 111 to 120 of about 1,257,085 (204)

Benchmarking EGF signaling pathway inference using phosphoproteomics and kinase-substrate interactions

open access: yesNature Communications
Signaling pathways are useful models for interpreting molecular data, but their coverage has long been constrained by classic biochemistry methods. The growing corpus of kinase-substrate interactions, coupled with improvements in phosphoproteomics, paves the ...
Martin Garrido-Rodriguez   +7 more
doaj   +1 more source

KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference

open access: yes
Language models (LMs) underpin emerging mobile and embedded AI applications like meeting and video summarization and document analysis, which often require processing multiple long-context inputs. Running an LM locally on-device improves privacy, enables offline use, and reduces cost, but long-context inference quickly hits a "memory capacity wall"
Zhang, Huawei; Xia, Chunwei; Wang, Zheng
openaire   +2 more sources

D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models

open access: yes
Generative inference in Large Language Models (LLMs) is impeded by the growing memory demands of Key-Value (KV) cache, especially for longer sequences. Traditional KV cache eviction strategies, which discard less critical KV pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations.
Wan, Zhongwei   +10 more
openaire   +2 more sources
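The eviction strategy this abstract describes (discarding less critical KV pairs based on attention scores) can be sketched as follows. This is an illustrative helper only, not the paper's D2O method: the function name `evict_kv`, the cumulative-score input, and the fixed budget are all assumptions for the sketch.

```python
import numpy as np

def evict_kv(keys, values, attn_scores, budget):
    """Keep only the `budget` KV pairs with the highest attention scores.

    keys, values: (seq_len, dim) arrays of cached keys and values.
    attn_scores: (seq_len,) cumulative attention each position received.
    Positions are kept in their original sequence order.
    """
    if keys.shape[0] <= budget:
        return keys, values
    # Indices of the top-`budget` scores, re-sorted into sequence order.
    keep = np.sort(np.argsort(attn_scores)[-budget:])
    return keys[keep], values[keep]

# Example: a 6-position cache squeezed down to a budget of 3.
keys = np.arange(12, dtype=np.float32).reshape(6, 2)
values = keys * 10
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.2])
k2, v2 = evict_kv(keys, values, scores, budget=3)
print(k2.shape)  # (3, 2): positions 0, 2, 4 survive
```

As the abstract notes, a hard score-based cutoff like this can drop context that later turns out to matter; the dynamic/discriminative operations the paper proposes are aimed at exactly that failure mode.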

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity

open access: yes
14 pages, 7 figures, 7 ...
Ma, Da   +10 more
openaire   +2 more sources

L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference

open access: yes
Large Language Models (LLMs) increasingly require processing long text sequences, but GPU memory limitations force difficult trade-offs between memory capacity and bandwidth. While HBM-based acceleration offers high bandwidth, its capacity remains constrained.
Liu, Qingyuan   +8 more
openaire   +2 more sources

Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference

open access: yes
Recent advances in large language models (LLMs) have showcased exceptional performance in long-context tasks, while facing significant inference efficiency challenges with limited GPU memory. Existing solutions first proposed the sliding-window approach to accumulate a set of historical key-value (KV) pairs for reuse, then further improvements
Xiao, Qingfa   +8 more
openaire   +2 more sources
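The sliding-window accumulation of historical KV pairs mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's retrieval method; the `SlidingWindowKV` class is invented for the sketch, and the permanently retained "sink" positions are a common variant of the idea, assumed here rather than taken from the abstract.

```python
from collections import deque

class SlidingWindowKV:
    """Keep the most recent `window` KV pairs, plus the first `sink`
    positions permanently (a common sliding-window variant)."""

    def __init__(self, window, sink=0):
        self.sink = sink
        self.sinks = []                      # first positions, kept forever
        self.recent = deque(maxlen=window)   # rolling tail of the sequence

    def append(self, pos, kv):
        # Older entries fall off the deque automatically once it is full.
        if len(self.sinks) < self.sink:
            self.sinks.append((pos, kv))
        else:
            self.recent.append((pos, kv))

    def cached_positions(self):
        return [p for p, _ in self.sinks] + [p for p, _ in self.recent]

# Example: stream 10 positions through a window of 4 with 2 sink slots.
cache = SlidingWindowKV(window=4, sink=2)
for pos in range(10):
    cache.append(pos, kv=None)
print(cache.cached_positions())  # [0, 1, 6, 7, 8, 9]
```

Memory stays bounded at `sink + window` entries regardless of sequence length, which is the efficiency win; the trade-off is that evicted middle positions are gone, motivating the retrieval-based improvements the abstract alludes to.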
