Neural correlates in the time course of inferences: costs and benefits for less-skilled readers at the university level.
Urrutia M +8 more
europepmc +1 more source
Some of the following articles may not be open access.
METAL: A Memory-Efficient Transformer Architecture for Long-Context Inference on FPGA
2025 IEEE 36th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Transformer-based models have shown remarkable proficiency across a wide range of natural language processing tasks, which increasingly require processing long-context inputs.
Zicheng He +5 more
semanticscholar +2 more sources
InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
2025 IEEE International Symposium on High Performance Computer Architecture (HPCA)
The widespread adoption of Large Language Models (LLMs) marks a significant milestone in generative AI. Nevertheless, the increasing context length and batch size in offline LLM inference escalate the memory requirement of the key-value (KV) cache, which imposes ...
Xiurui Pan +8 more
semanticscholar +2 more sources
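The KV-cache pressure this abstract points to can be made concrete with a back-of-envelope estimate. The sketch below is a generic calculation, not taken from the InstAttention paper; the model configuration (32 layers, 32 heads, head dimension 128, FP16 cache) is an assumed 7B-class setup.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Estimate KV cache size: two tensors (K and V) per layer,
    each of shape [batch_size, num_heads, seq_len, head_dim]."""
    return 2 * num_layers * num_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Assumed 7B-class model: 32 layers, 32 heads, head_dim 128, FP16 (2 bytes/element).
for seq_len, batch in [(4_096, 1), (128_000, 1), (128_000, 8)]:
    gib = kv_cache_bytes(32, 32, 128, seq_len, batch) / 2**30
    print(f"seq_len={seq_len:>7}, batch={batch}: ~{gib:.1f} GiB")
```

Under these assumptions the cache is about 2 GiB at a 4K context, but roughly 500 GiB at a 128K context with a batch of 8, which is why moving it beyond GPU memory becomes attractive.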
BlockPIM: Optimizing Memory Management for PIM-enabled Long-Context LLM Inference
2025 62nd ACM/IEEE Design Automation Conference (DAC)
Processing-In-Memory (PIM) architectures alleviate the memory bottleneck in the decode phase of large language model (LLM) inference by performing operations like GEMV and Softmax in memory.
Zhichun Li +3 more
semanticscholar +2 more sources
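The GEMV and Softmax operations this abstract mentions arise because, in the decode phase, a single new query token attends over all cached keys and values, so attention collapses to two matrix-vector products around a softmax. The NumPy sketch below only illustrates that reduction for one attention head; it is not PIM code and not taken from the BlockPIM paper.

```python
import numpy as np

def decode_step_attention(q, K_cache, V_cache):
    """Single-token decode attention for one head.

    q:       [head_dim]          -- query of the newly generated token
    K_cache: [seq_len, head_dim] -- cached keys
    V_cache: [seq_len, head_dim] -- cached values

    Both matmuls below are matrix-vector products (GEMV), which is why
    PIM designs target GEMV and Softmax for in-memory execution.
    """
    scores = K_cache @ q / np.sqrt(q.shape[0])   # GEMV -> [seq_len]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # Softmax over cached tokens
    return V_cache.T @ weights                   # GEMV -> [head_dim]

# Toy usage with an assumed head_dim of 64 and 1,024 cached tokens.
rng = np.random.default_rng(0)
out = decode_step_attention(rng.standard_normal(64),
                            rng.standard_normal((1024, 64)),
                            rng.standard_normal((1024, 64)))
print(out.shape)  # (64,)
```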
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Conference on Empirical Methods in Natural Language Processing
Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference, as the growth of their multimodal Key-Value (KV) cache with increasing input lengths challenges memory and time efficiency.
Zhongwei Wan +7 more
semanticscholar +1 more source
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
arXiv.org
Efficient long-context inference is critical as large language models (LLMs) adopt context windows ranging from 128K to 1M tokens. However, the growing key-value (KV) cache and the high computational complexity of attention create significant ...
Guangtao Wang +7 more
semanticscholar +1 more source
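The title describes eviction of KV-cache entries guided by self-attention. The sketch below shows only the generic idea behind such policies, keeping the cached tokens that have received the most attention from recent queries; it is an illustration under that assumption, not the algorithm proposed in this paper.

```python
import numpy as np

def evict_kv_by_attention(K_cache, V_cache, attn_weights, keep):
    """Generic attention-score-guided KV cache eviction (illustrative only).

    K_cache, V_cache: [seq_len, head_dim] cached keys and values
    attn_weights:     [num_recent_queries, seq_len] attention weights that
                      recent decode steps assigned to each cached token
    keep:             number of cached tokens to retain
    """
    importance = attn_weights.sum(axis=0)           # total attention per token
    kept = np.sort(np.argsort(importance)[-keep:])  # top-`keep`, original order
    return K_cache[kept], V_cache[kept], kept

# Toy usage with assumed sizes: 1,024 cached tokens, head_dim 64, keep 256.
rng = np.random.default_rng(0)
K, V = rng.standard_normal((1024, 64)), rng.standard_normal((1024, 64))
A = rng.random((16, 1024))
K_small, V_small, kept_idx = evict_kv_by_attention(K, V, A, keep=256)
print(K_small.shape, V_small.shape)  # (256, 64) (256, 64)
```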
Conference on Empirical Methods in Natural Language Processing
Rapid advances in Large Language Models (LLMs) have spurred demand for processing extended context sequences in contemporary applications. However, this progress faces two challenges: performance degradation due to sequence lengths out-of-distribution ...
Wei Wu +7 more
semanticscholar +1 more source

