Neural correlates in the time course of inferences: costs and benefits for less-skilled readers at the university level.
Urrutia M +8 more
europepmc +1 more source
Some of the following articles may not be open access.
METAL: A Memory-Efficient Transformer Architecture for Long-Context Inference on FPGA
2025 IEEE 36th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Transformer-based models have shown remarkable proficiency across a wide range of natural language processing tasks, which increasingly require processing long-context inputs.
Zicheng He +5 more
semanticscholar +2 more sources
InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
2025 IEEE International Symposium on High Performance Computer Architecture (HPCA)
The widespread adoption of Large Language Models (LLMs) marks a significant milestone in generative AI. Nevertheless, the increasing context length and batch size in offline LLM inference escalate the memory requirement of the key-value (KV) cache, which imposes ...
Xiurui Pan +8 more
semanticscholar +2 more sources
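The KV-cache pressure this abstract points to can be made concrete with a back-of-envelope estimate. The sketch below is a generic calculation, not taken from the InstAttention paper; the model configuration (32 layers, 32 heads, head dimension 128, FP16 cache) is an assumed 7B-class setup.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Estimate KV cache size: two tensors (K and V) per layer,
    each of shape [batch_size, num_heads, seq_len, head_dim]."""
    return 2 * num_layers * num_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Assumed 7B-class model: 32 layers, 32 heads, head_dim 128, FP16 (2 bytes/element).
for seq_len, batch in [(4_096, 1), (128_000, 1), (128_000, 8)]:
    gib = kv_cache_bytes(32, 32, 128, seq_len, batch) / 2**30
    print(f"seq_len={seq_len:>7}, batch={batch}: ~{gib:.1f} GiB")
```

Under these assumptions the cache is about 2 GiB at a 4K context, but roughly 500 GiB at a 128K context with a batch of 8, which is why moving it beyond GPU memory becomes attractive.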
BlockPIM: Optimizing Memory Management for PIM-enabled Long-Context LLM Inference
2025 62nd ACM/IEEE Design Automation Conference (DAC)
Processing-In-Memory (PIM) architectures alleviate the memory bottleneck in the decode phase of large language model (LLM) inference by performing operations like GEMV and Softmax in memory.
Zhichun Li +3 more
semanticscholar +2 more sources
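The GEMV and Softmax operations this abstract mentions arise because, in the decode phase, a single new query token attends over all cached keys and values, so attention collapses to two matrix-vector products around a softmax. The NumPy sketch below only illustrates that reduction for one attention head; it is not PIM code and not taken from the BlockPIM paper.

```python
import numpy as np

def decode_step_attention(q, K_cache, V_cache):
    """Single-token decode attention for one head.

    q:       [head_dim]          -- query of the newly generated token
    K_cache: [seq_len, head_dim] -- cached keys
    V_cache: [seq_len, head_dim] -- cached values

    Both matmuls below are matrix-vector products (GEMV), which is why
    PIM designs target GEMV and Softmax for in-memory execution.
    """
    scores = K_cache @ q / np.sqrt(q.shape[0])   # GEMV -> [seq_len]
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # Softmax over cached tokens
    return V_cache.T @ weights                   # GEMV -> [head_dim]

# Toy usage with an assumed head_dim of 64 and 1,024 cached tokens.
rng = np.random.default_rng(0)
out = decode_step_attention(rng.standard_normal(64),
                            rng.standard_normal((1024, 64)),
                            rng.standard_normal((1024, 64)))
print(out.shape)  # (64,)
```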
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Conference on Empirical Methods in Natural Language Processing
Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference, as the growth of their multimodal Key-Value (KV) cache with increasing input lengths challenges memory and time efficiency.
Zhongwei Wan +7 more
semanticscholar +1 more source
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
arXiv.org
Efficient long-context inference is critical as large language models (LLMs) adopt context windows ranging from 128K to 1M tokens. However, the growing key-value (KV) cache and the high computational complexity of attention create significant ...
Guangtao Wang +7 more
semanticscholar +1 more source
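The title describes eviction of KV-cache entries guided by self-attention. The sketch below shows only the generic idea behind such policies, keeping the cached tokens that have received the most attention from recent queries; it is an illustration under that assumption, not the algorithm proposed in this paper.

```python
import numpy as np

def evict_kv_by_attention(K_cache, V_cache, attn_weights, keep):
    """Generic attention-score-guided KV cache eviction (illustrative only).

    K_cache, V_cache: [seq_len, head_dim] cached keys and values
    attn_weights:     [num_recent_queries, seq_len] attention weights that
                      recent decode steps assigned to each cached token
    keep:             number of cached tokens to retain
    """
    importance = attn_weights.sum(axis=0)           # total attention per token
    kept = np.sort(np.argsort(importance)[-keep:])  # top-`keep`, original order
    return K_cache[kept], V_cache[kept], kept

# Toy usage with assumed sizes: 1,024 cached tokens, head_dim 64, keep 256.
rng = np.random.default_rng(0)
K, V = rng.standard_normal((1024, 64)), rng.standard_normal((1024, 64))
A = rng.random((16, 1024))
K_small, V_small, kept_idx = evict_kv_by_attention(K, V, A, keep=256)
print(K_small.shape, V_small.shape)  # (256, 64) (256, 64)
```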
Conference on Empirical Methods in Natural Language Processing
Rapid advances in Large Language Models (LLMs) have spurred demand for processing extended context sequences in contemporary applications. However, this progress faces two challenges: performance degradation due to sequence lengths out-of-distribution ...
Wei Wu +7 more
semanticscholar +1 more source

