Results 181 to 190 of about 1,257,085
Some of the following articles may not be open access.
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads
Trans. Mach. Learn. Res.
Scaling the input context length of a large language model (LLM) incurs a significant increase in computation cost and memory footprint to maintain the attention key-value (KV) cache.
Yuxiang Huang +4 more
semanticscholar +1 more source
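The abstract's claim that the KV cache dominates memory at long context can be made concrete with a quick back-of-the-envelope calculation. The sketch below is illustrative only, assuming Llama-2-7B-like dimensions (32 layers, 32 KV heads, head dim 128, fp16); none of these numbers come from the Locret paper.

```python
# Back-of-the-envelope KV-cache footprint for a decoder-only transformer.
# Model dimensions are assumptions for illustration (roughly Llama-2-7B-like).

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache K and V for one sequence."""
    # Factor 2: one tensor for keys and one for values, each of shape
    # [num_layers, num_kv_heads, seq_len, head_dim].
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

for seq_len in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(32, 32, 128, seq_len) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:5.1f} GiB (fp16)")  # 2.0, 16.0, 64.0
```

The cache grows linearly with sequence length, which is why eviction methods such as Locret's retaining heads target it.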
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
International Conference on Learning Representations
Large language models (LLMs) encounter computational challenges during long-sequence inference, especially in the attention pre-filling phase, where the complexity grows quadratically with the prompt length.
Xunhao Lai +4 more
semanticscholar +1 more source
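To see why pre-filling dominates as the abstract describes, the sketch below counts the FLOPs of the attention-score matmuls, the term that is quadratic in prompt length. The dimensions are the same assumed illustrative ones as above, not figures from the FlexPrefill paper.

```python
# Rough FLOP count for the quadratic part of attention prefill.
# Dimensions are illustrative assumptions, not from the paper.

def prefill_attention_flops(seq_len: int, num_layers: int,
                            num_heads: int, head_dim: int) -> int:
    # Per layer and head: QK^T ([seq_len, d] x [d, seq_len]) and
    # scores x V ([seq_len, seq_len] x [seq_len, d]) each cost
    # about 2 * seq_len**2 * head_dim FLOPs.
    per_head = 2 * (2 * seq_len * seq_len * head_dim)
    return num_layers * num_heads * per_head

for n in (4_096, 32_768, 131_072):
    tflops = prefill_attention_flops(n, 32, 32, 128) / 1e12
    print(f"{n:>7} tokens -> {tflops:8.1f} TFLOPs")  # ~8.8, ~562.9, ~9007.2
```

Doubling the prompt quadruples this term; sparse-attention schemes like FlexPrefill cut it by computing only a selected subset of the score matrix.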
Conference on Machine Learning and Systems
Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional …
Qianchao Zhu +11 more
semanticscholar +1 more source
Generative replay underlies compositional inference in the hippocampal-prefrontal circuit
Cell, 2023
Philipp Schwartenbeck
exaly
Advantages and limitations of current network inference methods
Nature Reviews Microbiology, 2010
Riet De Smet, Kathleen Marchal
exaly