Mamba-Reg: Vision Mamba Also Needs Registers
Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba. These artifacts, corresponding to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba -- they exist prevalently even with the tiny-sized model and activate extensively across ...
Feng Wang +8 more
openaire +5 more sources
Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval
In the information retrieval (IR) area, dense retrieval (DR) models use deep learning techniques to encode queries and passages into embedding space to compute their semantic relations. It is important for DR models to balance both efficiency and effectiveness.
Hanqi Zhang +4 more
openaire +3 more sources
Irregular and asynchronous event sequences are prevalent in many domains, such as social media, finance, and healthcare. Traditional temporal point processes (TPPs), like Hawkes processes, often struggle to model mutual inhibition and nonlinearity effectively.
Anningzhe Gao, Shan Dai, Yan Hu
openaire +3 more sources
Bi-Mamba+: Bidirectional Mamba for Time Series Forecasting
New Mamba-based architecture.
Aobo Liang +4 more
openaire +3 more sources
Online in-context reinforcement learning enhances offline-trained policies through online fine-tuning. We introduce Online Decision Mamba (ODM), an architecture that replaces the attention mechanism in Online Decision Transformers (ODT) with the Mamba ...
Trenton W. Ruf, Banafsheh Rekabdar
openaire +2 more sources
idoiagamiz/SCALE-MAMBA: v1.0.0
Repository for the SCALE-MAMBA MPC ...
NigelSmart +4 more
core +1 more source
Mamba Modulation: On the Length Generalization of Mamba
The quadratic complexity of the attention mechanism in Transformer models has motivated the development of alternative architectures with sub-quadratic scaling, such as state-space models. Among these, Mamba has emerged as a leading architecture, achieving state-of-the-art results across a range of language modeling tasks.
Peng Lu +6 more
openaire +2 more sources
As one of the most representative DL techniques, Transformer architecture has empowered numerous advanced models, especially the large language models (LLMs) that comprise billions of parameters, becoming a cornerstone in deep learning. Despite the impressive achievements, Transformers still face inherent limitations, particularly the time-consuming ...
Haohao Qu +7 more
openaire +2 more sources
Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations. This degrades LLM capabilities by promoting hallucinations, weakening long-range and retrieval abilities, and reducing robustness.
Nadav Schneider +2 more
openaire +3 more sources
This study introduces a foundation model‐based biomarker for risk stratification of pathological response in non‐small cell lung cancer. A Vision Mamba super‐resolution model standardizes heterogeneous CT images. A multi‐task Swin Transformer then fine‐tunes a pre‐trained lung foundation model to jointly optimize tumor segmentation and response ...
Yanglan Xu +10 more
wiley +1 more source

