Results 1 to 10 of about 1,257,065

A Cloud-Aware Scalable Architecture for Distributed Edge-Enabled BCI Biosensor System [PDF]

open access: yes, Biosensors
BCI biosensors enable continuous monitoring of neural activity, but existing systems face challenges in scalability, latency, and reliable integration with cloud infrastructure.
Sayantan Ghosh   +7 more
doaj   +2 more sources

Characterizing Prompt Compression Methods for Long Context Inference

open access: yes, arXiv.org
Long-context inference presents challenges both at the system level, with increased compute and memory requirements, and from an accuracy perspective, in reasoning over long contexts.
Siddharth Jha   +4 more
semanticscholar   +3 more sources

Long-Context Inference with Retrieval-Augmented Speculative Decoding

open access: yes, International Conference on Machine Learning
The emergence of long-context large language models (LLMs) offers a promising alternative to traditional retrieval-augmented generation (RAG) for processing extensive documents.
Guanzheng Chen   +4 more
semanticscholar   +3 more sources
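
The entry above builds on speculative decoding. As a generic illustration of the underlying technique (not the paper's retrieval-augmented variant), here is a minimal sketch of the greedy draft-and-verify loop; `draft_next` and `target_argmax` are hypothetical stand-ins for a cheap drafter and the full target model.

```python
# Minimal sketch of the generic speculative decoding loop (greedy variant).
# `draft_next` and `target_argmax` are hypothetical stubs, not the paper's API.
from typing import Callable, List

def speculative_decode(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],      # cheap model: proposes one token
    target_argmax: Callable[[List[int]], int],   # expensive model: verifies
    k: int = 4,                                  # tokens drafted per round
    max_new: int = 32,
) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1. Draft k candidate tokens autoregressively with the cheap model.
        draft = []
        ctx = list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify: the target model checks each drafted position
        #    (simplified here to token-by-token greedy agreement; a real
        #    implementation scores all k positions in one batched pass).
        accepted = 0
        for i, t in enumerate(draft):
            if target_argmax(out + draft[:i]) == t:
                accepted += 1
            else:
                break
        out.extend(draft[:accepted])
        # 3. Always take one token from the target model (the correction on a
        #    mismatch, or a bonus token), so every round makes progress.
        out.append(target_argmax(out))
    return out
```

Each round costs one target-model pass but can emit up to k+1 tokens, which is where the speedup comes from when the drafter agrees often.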

MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference

open access: yes, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Long-context Multimodal Large Language Models (MLLMs), which incorporate long text-image and text-video modalities, demand substantial resources as their multimodal Key-Value (KV) caches grow with increasing input lengths, challenging inference efficiency.
Zhongwei Wan   +5 more
semanticscholar   +3 more sources
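
The linear KV-cache growth the MEDA snippet describes is easy to make concrete. A back-of-the-envelope sketch, using illustrative model dimensions that are assumptions rather than figures from the paper:

```python
# Back-of-the-envelope KV cache size: grows linearly in sequence length.
# Dimensions below are illustrative defaults, not taken from the MEDA paper.
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:  # fp16/bf16
    # 2x for the key tensor plus the value tensor, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

for n in (4_096, 32_768, 131_072):
    print(f"{n:>7} tokens -> {kv_cache_bytes(n) / 2**30:.1f} GiB")
```

At these assumed settings the cache costs about 128 KiB per token, so a 131K-token context alone occupies 16 GiB, which is why allocation and eviction policies matter.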

Adamas: Hadamard Sparse Attention for Efficient Long-Context Inference

open access: yes, arXiv.org
Large language models (LLMs) now support context windows of hundreds of thousands to millions of tokens, enabling applications such as long-document summarization, large-scale code synthesis, multi-document question answering and persistent multi-turn ...
Siyuan Yan   +6 more
semanticscholar   +3 more sources

SparseAccelerate: Efficient Long-Context Inference for Mid-Range GPUs

open access: yes, arXiv.org
As Large Language Models (LLMs) scale to longer context windows, the computational cost of attention mechanisms, which traditionally grows quadratically with input length, presents a critical challenge for real-time and memory-constrained deployments ...
J. Vo
semanticscholar   +3 more sources
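
The quadratic cost this snippet refers to comes from the seq_len x seq_len attention score matrix. A rough FLOP count for one dense attention layer (the d_model here is an illustrative assumption) shows the scaling:

```python
# Why attention cost is quadratic: the QK^T score matrix has seq_len^2 entries.
# Rough FLOP count for one dense attention layer (illustrative dimensions).
def attention_flops(seq_len: int, d_model: int = 4096) -> float:
    qk = 2 * seq_len * seq_len * d_model   # Q @ K^T
    av = 2 * seq_len * seq_len * d_model   # softmax(scores) @ V
    return qk + av

for n in (8_192, 65_536):
    print(f"{n:>6} tokens -> {attention_flops(n) / 1e12:.1f} TFLOPs per layer")
```

Eight times the context length costs sixty-four times the compute per layer, which is the scaling that sparse-attention methods aim to break on memory-constrained hardware.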

Long-context inference optimization for large language models: a survey

open access: yes, 大数据 (Big Data)
With the rapid development of large language model (LLM) technology, the demand for processing long-text inputs has been increasing. However, long-text inference faces challenges such as high memory consumption and latency.
TAO Wei   +3 more
doaj   +5 more sources

BaKlaVa - Budgeted Allocation of KV cache for Long-context Inference

open access: yes, arXiv.org
In Large Language Model (LLM) inference, Key-Value (KV) caches are essential for reducing time complexity. However, they result in a linear increase in GPU memory as the context length grows.
A. B. Gulhan   +4 more
semanticscholar   +3 more sources
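
BaKlaVa's title describes budgeted allocation of KV cache. A generic sketch of the idea follows: split a fixed token budget across layers in proportion to a per-layer importance score. The scores and the largest-remainder rounding here are placeholders, not the paper's profiling method.

```python
# Generic sketch of budgeted KV-cache allocation: divide a fixed token budget
# across layers in proportion to per-layer importance. The example scores are
# placeholders; BaKlaVa's actual importance estimation differs.
def allocate_kv_budget(total_tokens: int, importance: list[float]) -> list[int]:
    total = sum(importance)
    raw = [total_tokens * s / total for s in importance]
    alloc = [int(r) for r in raw]
    # Hand out leftover tokens to the layers with the largest remainders,
    # so the allocations sum exactly to the budget.
    leftover = total_tokens - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

# e.g. 4 layers, the first two deemed twice as important:
print(allocate_kv_budget(10_000, [2.0, 2.0, 1.0, 1.0]))  # [3333, 3333, 1667, 1667]
```

Layers judged less important then keep smaller caches (evicting or compressing older entries), trading a little accuracy for a hard memory ceiling.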

AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference

open access: yes, Proceedings of the AAAI Conference on Artificial Intelligence
Long-context large language model (LLM) inference is increasingly critical, motivating a number of studies devoted to alleviating the substantial storage and computational costs in such scenarios. Layer-wise skipping methods are promising optimizations ...
Zhuomin He   +6 more
semanticscholar   +3 more sources
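
The intuition behind the layer-wise skipping this snippet mentions is that a residual sublayer whose output barely changes its input can be replaced by an identity shortcut. A generic sketch of that idea; the importance measure and the profile-then-skip policy below are illustrative assumptions, not AdaSkip's adaptive criterion.

```python
# Generic illustration of sublayer skipping (not AdaSkip's actual policy):
# profile how much each residual sublayer changes its input, then bypass the
# least important ones at inference time.
import numpy as np

def sublayer_importance(sublayer, x: np.ndarray) -> float:
    """Relative change a residual sublayer makes to its input."""
    delta = sublayer(x)                          # residual-branch output
    return float(np.linalg.norm(delta) / (np.linalg.norm(x) + 1e-8))

def forward_with_skipping(sublayers, x: np.ndarray, skip_set: set) -> np.ndarray:
    for i, f in enumerate(sublayers):
        if i in skip_set:
            continue                             # identity shortcut: x unchanged
        x = x + f(x)                             # normal residual update
    return x

# Usage sketch: score sublayers once on calibration input, skip the k weakest.
# scores = [sublayer_importance(f, x_cal) for f in sublayers]
# skip_set = set(np.argsort(scores)[:k])
```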

Squeezed Attention: Accelerating Long Context Length LLM Inference

open access: yes, Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Emerging Large Language Model (LLM) applications require long input context in order to perform complex tasks like document analysis and code generation.
Coleman Hooper   +7 more
semanticscholar   +3 more sources
