Results 91 to 100 of about 132,225 (206)
Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention [PDF]
Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and significantly lowers memory bandwidth demands, particularly in the autoregressive decode phase.
arxiv
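For readers who want a concrete picture of the mechanism this abstract describes, here is a minimal PyTorch sketch of latent KV caching: a single low-rank latent is cached per token and keys/values are reconstructed from it at attention time. The module names, dimensions, and the single shared down-projection are illustrative assumptions; DeepSeek-V2's actual MLA also uses query compression and a decoupled rotary-position key path, both omitted here.

```python
# Minimal sketch of latent KV caching in the spirit of MLA (illustrative only).
# Dimensions and module names are assumptions; causal masking and rotary
# position embeddings are omitted for brevity.
from typing import Optional
import torch
import torch.nn as nn


class LatentKVAttention(nn.Module):
    def __init__(self, d_model: int = 1024, n_heads: int = 8, d_latent: int = 128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress hidden state to a latent
        self.k_up = nn.Linear(d_latent, d_model)      # reconstruct keys from the latent
        self.v_up = nn.Linear(d_latent, d_model)      # reconstruct values from the latent
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, latent_cache: Optional[torch.Tensor] = None):
        b, t, _ = x.shape
        latent = self.kv_down(x)                       # (b, t, d_latent)
        if latent_cache is not None:                   # decode step: extend the cache
            latent = torch.cat([latent_cache, latent], dim=1)
        # Only `latent` needs to persist between decode steps, not full K and V.
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out), latent                # returned latent is the new cache
```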
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression [PDF]
Multi-head latent attention (MLA) is designed to optimize KV cache memory through low-rank key-value joint compression. Rather than caching keys and values separately, MLA stores their compressed latent representations, reducing memory overhead while maintaining performance.
arxiv
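To make the memory argument concrete, the short calculation below compares per-token cache size for conventional per-head key/value caching against caching one compressed latent per layer. All numbers (layer count, head count, head width, latent width) are illustrative assumptions, not the configuration of any model in these results.

```python
# Back-of-the-envelope KV-cache comparison (illustrative numbers only).
n_layers, n_heads, d_head = 32, 32, 128
d_latent = 512          # assumed width of the compressed latent representation
bytes_per_elem = 2      # fp16 / bf16

# Standard attention caches keys AND values for every head in every layer.
kv_cache_per_token = n_layers * 2 * n_heads * d_head * bytes_per_elem

# Latent caching stores one compressed vector per token per layer instead.
latent_cache_per_token = n_layers * d_latent * bytes_per_elem

print(f"per-token cache, standard KV : {kv_cache_per_token / 1024:.1f} KiB")
print(f"per-token cache, latent      : {latent_cache_per_token / 1024:.1f} KiB")
print(f"compression ratio            : {kv_cache_per_token / latent_cache_per_token:.1f}x")
```

With these assumed values the latent cache is 16x smaller per token, which is the kind of saving that matters most in the memory-bandwidth-bound decode phase.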
Student Inquiry and the Rascal Triangle [PDF]
Those of us who teach Mathematics for Liberal Arts (MLA) courses often underestimate the mathematical abilities of the students enrolled in our courses. Despite the fact that many of these students suffer from math anxiety and will admit to hating mathematics, when we give them space to explore mathematics and bring their existing knowledge to the ...
arxiv
Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment [PDF]
Real-world data distributions are often highly skewed. This has spurred a growing body of research on long-tailed recognition, aimed at addressing the imbalance in training classification models. Among the methods studied, multiplicative logit adjustment (MLA) stands out as simple and effective.
arxiv
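As a rough illustration of the technique named in this abstract, the sketch below applies a per-class multiplicative factor to classifier logits at test time so that rare classes are favoured over frequent ones. The specific factor (count / max_count) ** (-tau), the value of tau, and the post-hoc application are assumptions for illustration and may differ from the formulation in the paper.

```python
# Minimal sketch of multiplicative logit adjustment for long-tailed
# recognition, applied post hoc. The per-class factor is an illustrative
# choice, not necessarily the exact formulation of the cited paper.
import numpy as np


def adjust_logits(logits: np.ndarray, class_counts: np.ndarray, tau: float = 0.5) -> np.ndarray:
    """Scale each class logit so rare (tail) classes are boosted relative to
    frequent (head) classes, shifting the decision boundary toward the tail."""
    factors = (class_counts / class_counts.max()) ** (-tau)  # >= 1, larger for rarer classes
    return logits * factors                                   # broadcast over the class axis


# Toy example: a 3-class problem where class 0 dominates the training set.
counts = np.array([9000.0, 900.0, 100.0])
raw_logits = np.array([[2.1, 1.9, 1.8]])           # head class barely wins on raw scores
print(adjust_logits(raw_logits, counts).argmax())  # prints 2: the tail class now wins
```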
The MLA's Poet Presidents [PDF]
Sandra M. Gilbert +2 more
openaire +2 more sources