XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference

open access: yes
Recently the generative Large Language Model (LLM) has achieved remarkable success in numerous applications. Notably its inference generates output tokens one-by-one, leading to many redundant computations. The widely-used KV-Cache framework makes a compromise between time and space complexities. However, caching data generates the increasingly growing
Li, Weizhuo   +3 more
openaire   +2 more sources

LLM-Augmented Prototype Representation for Few-Shot Named-Entity Recognition

open access: yes | IEEE Access
Named Entity Recognition (NER) models face challenges in adapting to data distribution shifts, especially with unseen entity types and limited data. Few-shot learning is used to address long-tailed distributions and unseen classes, but struggles with few
Weerayut Buaphet   +4 more
doaj   +1 more source

How reliable are the statistics for the Stability and Growth Pact? [PDF]

open access: yes
The aim of this paper is to assess the reliability of the government deficit and debt figures reported to the European Commission by Member States. Reliability is one of the several dimensions of quality in statistics; it refers to the magnitudes of data
João Nogueira Martins   +1 more
core  

DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration

open access: yes | Findings of the Association for Computational Linguistics: ACL 2025
Long-context understanding is crucial for many NLP applications, yet transformers struggle with efficiency due to the quadratic complexity of self-attention. Sparse attention methods alleviate this cost but often impose static, predefined masks, failing to capture heterogeneous attention patterns. This results in suboptimal token interactions, limiting
Zhang, Hanzhi   +4 more
openaire   +2 more sources

Microbial network inference for longitudinal microbiome studies with LUPINE

open access: yes | Microbiome
Background: The microbiome is a complex ecosystem of interdependent taxa that has traditionally been studied through cross-sectional studies. However, longitudinal microbiome studies are becoming increasingly popular.
Saritha Kodikara, Kim-Anh Lê Cao
doaj   +1 more source

Asymmetric-Convolution-Guided Multipath Fusion for Real-Time Semantic Segmentation Networks

open access: yes | Mathematics
Aiming to provide solutions for problems posed by the inaccurate segmentation of long objects and information loss of small objects in real-time semantic segmentation algorithms, this paper proposes a lightweight multi-branch real-time semantic ...
Jie Liu, Bing Zhao, Ming Tian
doaj   +1 more source

Paged Attention Meets FlexAttention: Unlocking Long-Context Efficiency in Deployed Inference

open access: yes
Large Language Models (LLMs) encounter severe memory inefficiencies during long-context inference due to conventional handling of key-value (KV) caches. In this work, we introduce a novel integration of PagedAttention with PyTorch's FlexAttention, addressing internal fragmentation and inefficiencies associated with monolithic KV cache allocations ...
Joshi, Thomas   +4 more
openaire   +2 more sources

Identification and functional characterization of lncRNAs involved in human monocyte-to-macrophage differentiation

open access: yes | RNA Biology
Although long noncoding RNAs (lncRNAs) constitute the majority of the human transcriptome, the functional roles of most remain elusive. While protein-coding genes in macrophage biology have been extensively studied, the contribution of lncRNAs in this ...
Christy Montano   +4 more
doaj   +1 more source

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference

open access: yes
LLMs now form the backbone of AI agents for a diverse array of applications, including tool use, command-line agents, and web or computer use agents. These agentic LLM inference tasks are fundamentally different from chatbot-focused inference -- they often have much larger context lengths to capture complex, prolonged inputs, such as entire webpage ...
Wu, Haoran   +15 more
openaire   +2 more sources

Learned adaptive properties for mitigation of weight perturbations in embedded spiking networks

open access: yes | Frontiers in Neuroscience
Recent years have seen an increased importance of neural network inference in edge-based scenarios, which impose size and power constraints requiring novel computing devices.
Sarah Luca   +6 more
doaj   +1 more source