Results 181 to 190 of about 1,257,085
Some of the following articles may not be open access.
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads
Trans. Mach. Learn. Res.
Scaling the input context length of a large language model (LLM) incurs a significant increase in computation cost and memory footprint to maintain the attention key-value (KV) cache.
Yuxiang Huang +4 more
semanticscholar +1 more source
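The abstract's claim that the KV cache dominates memory at long context can be made concrete with a quick back-of-the-envelope calculation. The sketch below is illustrative only, assuming Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dim 128, fp16); none of these numbers come from the Locret paper.

```python
# Back-of-the-envelope KV-cache footprint for a decoder-only transformer.
# Model dimensions are assumptions for illustration (roughly Llama-2-7B-like).

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache K and V for one sequence."""
    # Factor 2: one tensor for keys and one for values, each of shape
    # [num_layers, num_kv_heads, seq_len, head_dim].
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(32, 32, 128, seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:5.1f} GiB (fp16)")  # 2.0, 16.0, 64.0
```

The cache grows linearly with sequence length, which is why eviction methods such as Locret's retaining heads target it.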
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
International Conference on Learning Representations
Large language models (LLMs) encounter computational challenges during long-sequence inference, especially in the attention pre-filling phase, where the complexity grows quadratically with the prompt length.
Xunhao Lai +4 more
semanticscholar +1 more source
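To see why pre-filling dominates as the abstract describes, the sketch below counts the FLOPs of the attention-score matmuls, the term that is quadratic in prompt length. The dimensions are the same assumed illustrative ones as above, not figures from the FlexPrefill paper.

```python
# Rough FLOP count for the quadratic part of attention prefill.
# Dimensions are illustrative assumptions, not from the paper.

def prefill_attention_flops(seq_len: int, num_layers: int,
                            num_heads: int, head_dim: int) -> int:
    # Per layer and head: QK^T ([seq_len, d] x [d, seq_len]) and
    # scores x V ([seq_len, seq_len] x [seq_len, d]) each cost
    # about 2 * seq_len**2 * head_dim FLOPs.
    per_head = 2 * (2 * seq_len * seq_len * head_dim)
    return num_layers * num_heads * per_head

for n in (4_096, 32_768, 131_072):
    tflops = prefill_attention_flops(n, 32, 32, 128) / 1e12
    print(f"{n:>7} tokens -> {tflops:8.1f} TFLOPs")  # ~8.8, ~562.9, ~9007.2
```

Doubling the prompt quadruples this term; sparse-attention schemes like FlexPrefill cut it by computing only a selected subset of the score matrix.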
Conference on Machine Learning and Systems
Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional …
Qianchao Zhu +11 more
semanticscholar +1 more source
Generative replay underlies compositional inference in the hippocampal-prefrontal circuit
Cell, 2023
Philipp Schwartenbeck
exaly
Advantages and limitations of current network inference methods
Nature Reviews Microbiology, 2010
Riet De Smet, Kathleen Marchal
exaly