Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs
There is growing demand for inference with hundreds of thousands of input tokens on trained transformer models. Inference at this extreme scale requires significant computational resources, hindering the application of transformers at long ...
Ryan Synk +8 more
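To make the scale of the problem concrete, here is a back-of-the-envelope KV-cache calculation for a hypothetical 7B-class model (32 layers, 32 KV heads, head dimension 128, fp16; these configuration values are illustrative assumptions, not figures from the paper):

```python
# Back-of-the-envelope KV-cache footprint for a hypothetical 7B-class model.
# Configuration values are illustrative assumptions, not taken from the paper.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_elem=2):
    """Total bytes for the K and V caches at a given context length."""
    # 2 tensors (K and V), each of shape [n_layers, n_kv_heads, seq_len, head_dim]
    return 2 * n_layers * n_kv_heads * seq_len * head_dim * bytes_per_elem

for n in (8_192, 128_000, 1_000_000):
    gib = kv_cache_bytes(n) / 2**30
    print(f"{n:>9,} tokens -> {gib:7.1f} GiB of fp16 KV cache")
```

At a million tokens the full-precision cache alone approaches half a terabyte, far beyond any commodity GPU; that gap is what sparsity-exploiting methods aim to close.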
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints.
Yaoqi Chen +17 more
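As a toy illustration of the vector-storage framing (a generic top-k retrieval sketch, not RetroInfer's actual index structure), the snippet below scores all cached keys against the current query, retrieves the k best, and runs softmax attention over that subset only:

```python
import numpy as np

# Toy sketch of "KV cache as vector storage": instead of attending over all
# cached keys, retrieve only the top-k keys most similar to the current query.
# Generic illustration; shapes and exact top-k scoring are assumptions.

def topk_attention(q, K, V, k=64):
    """q: (d,), K/V: (n, d). Attend over the k highest-scoring cached entries."""
    scores = K @ q / np.sqrt(q.shape[0])      # similarity of query to all keys
    idx = np.argpartition(scores, -k)[-k:]    # indices of the top-k keys
    s = scores[idx]
    w = np.exp(s - s.max())
    w /= w.sum()                              # softmax over the retrieved subset
    return w @ V[idx]                         # weighted sum of retrieved values

rng = np.random.default_rng(0)
K = rng.standard_normal((100_000, 128)).astype(np.float32)
V = rng.standard_normal((100_000, 128)).astype(np.float32)
out = topk_attention(rng.standard_normal(128).astype(np.float32), K, V)
```

In a real system the exhaustive `K @ q` scan would itself be replaced by an approximate nearest-neighbor index, so that scoring no longer touches the whole cache.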
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Large Language Models (LLMs) require significant GPU memory when processing long texts, with the key-value (KV) cache consuming up to 70% of total memory during inference.
Xiang Liu +6 more
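A minimal sketch of chunk-level cache compression under assumed scoring: the cache is split into contiguous chunks, each chunk is scored by its mean attention mass from recent queries, and only the top fraction of chunks survives. The scoring rule and hyperparameters here are placeholders, not the paper's semantic-preserving criterion:

```python
import numpy as np

# Sketch of chunk-level KV compression: score contiguous chunks of the cache
# and keep only the highest-scoring ones. The scoring rule (mean recent
# attention mass per chunk) is an assumption for illustration.

def compress_kv_by_chunks(K, V, attn, chunk_size=32, keep_ratio=0.3):
    """K/V: (n, d); attn: (n,) attention weights from recent queries."""
    n = (K.shape[0] // chunk_size) * chunk_size   # drop a ragged tail for simplicity
    chunk_scores = attn[:n].reshape(-1, chunk_size).mean(axis=1)
    n_keep = max(1, int(len(chunk_scores) * keep_ratio))
    keep = np.sort(np.argsort(chunk_scores)[-n_keep:])  # surviving chunks, in order
    idx = (keep[:, None] * chunk_size + np.arange(chunk_size)).ravel()
    return K[idx], V[idx]
```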
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
Transformer-based Large Language Models rely critically on the KV cache to efficiently handle extended contexts during the decode phase. Yet, the size of the KV cache grows proportionally with the input length, burdening both memory bandwidth and ...
Payman Behnam +5 more
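A sketch of the two-stage shape of such schemes, with assumed scoring rules and budgets: stage one permanently evicts low-importance tokens once after prefill, and stage two selects a small dynamic top-k among the survivors at every decode step:

```python
import numpy as np

# Two-stage KV compression sketch: (1) a permanent coarse eviction after
# prefill, then (2) a dynamic top-k selection at each decode step over the
# survivors. Stage boundaries, budgets, and scoring rules are illustrative.

def stage1_evict(K, V, prefill_attn, budget):
    """Permanently keep the `budget` tokens with the highest prefill attention."""
    keep = np.sort(np.argsort(prefill_attn)[-budget:])
    return K[keep], V[keep]

def stage2_attend(q, K, V, k):
    """At decode time, attend only over the top-k of the retained entries."""
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argpartition(scores, -k)[-k:]
    w = np.exp(scores[idx] - scores[idx].max())
    return (w / w.sum()) @ V[idx]
```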
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Deploying long-context large language models (LLMs) is essential but poses significant computational and memory challenges. Caching all Key and Value (KV) states across all attention heads consumes substantial memory.
Guangxuan Xiao +7 more
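A sketch of a per-head cache policy in this spirit: heads classified as retrieval heads keep the full KV cache, while streaming heads retain only a few initial "sink" tokens plus a sliding window of recent tokens. The classification itself and the sink/window sizes below are placeholder assumptions:

```python
# Per-head KV policy sketch: retrieval heads keep everything; streaming heads
# keep attention sinks plus a recent window. Which heads fall in which class
# would be determined offline; here it is a boolean placeholder.

N_SINK, WINDOW = 4, 256

def prune_head_cache(keys, values, is_retrieval_head):
    """keys/values: lists of per-token tensors for one attention head."""
    if is_retrieval_head or len(keys) <= N_SINK + WINDOW:
        return keys, values              # full cache: no savings, full recall
    # streaming head: attention sinks + sliding window only
    ks = keys[:N_SINK] + keys[-WINDOW:]
    vs = values[:N_SINK] + values[-WINDOW:]
    return ks, vs
```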
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
AlayaDB is a vector database system, developed at AlayaDB AI, architected natively for efficient and effective long-context inference with Large Language Models (LLMs).
Yangshen Deng +15 more
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
Large language models (LLMs) have recently achieved remarkable success in natural language processing (NLP), driving growing demand to extend their deployment from the cloud to edge devices.
Yanbiao Liang +3 more
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
With the widespread deployment of long-context large language models (LLMs), there has been a growing demand for efficient support of high-throughput inference.
Hanshi Sun +8 more
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens.
Qichen Fu +5 more
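The two-stage pipeline the abstract describes, as a minimal sketch around a hypothetical `model.prefill`/`model.decode` API (the method's dynamic token pruning would intervene inside these stages by deferring KV computation for tokens that are not yet needed):

```python
# Minimal sketch of the two-stage inference pipeline the abstract describes,
# written against a hypothetical model API. Greedy sampling for brevity.

def generate(model, prompt_ids, max_new_tokens):
    # Stage 1: prefill -- run the whole prompt once, building the KV cache
    # and producing the first generated token.
    logits, kv_cache = model.prefill(prompt_ids)          # hypothetical API
    token = logits.argmax()
    out = [token]
    # Stage 2: decode -- one token at a time, reusing and extending the cache.
    for _ in range(max_new_tokens - 1):
        logits, kv_cache = model.decode(token, kv_cache)  # hypothetical API
        token = logits.argmax()
        out.append(token)
    return out
```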
CEG: A joint model for causal commonsense events enhanced story ending generation
With the success of pre-trained language models, the performance of story ending generation has improved dramatically, yet the task remains challenging due to the lack of commonsense reasoning ability. Most previous works mainly focus on using commonsense ...
Yushi Zhang +5 more