Signaling pathways are useful models for interpreting molecular data, but their coverage has long been constrained by classic biochemistry methods. The growing corpus of kinase-substrate interactions, coupled with improvements in phosphoproteomics, paves the ...
Martin Garrido-Rodriguez +7 more
doaj +1 more source
KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference
Language models (LMs) underpin emerging mobile and embedded AI applications like meeting and video summarization and document analysis, which often require processing multiple long-context inputs. Running an LM locally on-device improves privacy, enables offline use, and reduces cost, but long-context inference quickly hits a "memory capacity wall"
Zhang, Huawei; Xia, Chunwei; Wang, Zheng
openaire +2 more sources
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Generative inference in Large Language Models (LLMs) is impeded by the growing memory demands of the Key-Value (KV) cache, especially for longer sequences. Traditional KV cache eviction strategies, which discard less critical KV pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations.
Wan, Zhongwei +10 more
openaire +2 more sources
Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
14 pages, 7 figures, 7 ...
Ma, Da +10 more
openaire +2 more sources
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
Large Language Models (LLMs) increasingly require processing long text sequences, but GPU memory limitations force difficult trade-offs between memory capacity and bandwidth. While HBM-based acceleration offers high bandwidth, its capacity remains constrained.
Liu, Qingyuan +8 more
openaire +2 more sources
Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference
Recent advances in large language models (LLMs) have showcased exceptional performance in long-context tasks, while facing significant inference efficiency challenges with limited GPU memory. Existing solutions first proposed the sliding-window approach to accumulate a set of historical key-value (KV) pairs for reuse, then further improvements
Xiao, Qingfa +8 more
openaire +2 more sources
Contextual inference through flexible integration of environmental features and behavioural outcomes.
Passlack J, MacAskill AF.
europepmc +1 more source
Cognitive architecture and behavioral model based on social evidence and resource constraints.
Kolonin A.
europepmc +1 more source
Context-aware temporal synthesis for scene, entity, and event inference from silent image.
Rokaya M +3 more
europepmc +1 more source
Behavioromics: A New Paradigm of Big Data-Powered Insights for Proactive Health.
Shi J, Li J, Huang H, Teng T, Wu H.
europepmc +1 more source