Results 91 to 100 of about 132,225

Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention [PDF]

open access: yes (arXiv)
Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and significantly lowers memory bandwidth demands, particularly in the autoregressive decode phase.
arxiv  
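
The cache reduction described in this abstract can be made concrete with a back-of-the-envelope size comparison. The sketch below is illustrative only: every dimension (layer count, head count, head size, latent size) is a hypothetical placeholder rather than DeepSeek-V2's actual configuration, and it assumes the cache stores one compressed latent per token per layer in place of full per-head keys and values.

```python
# Illustrative per-token KV-cache comparison: standard multi-head attention
# vs. a latent-compressed cache in the style of MLA. All dimensions are
# hypothetical placeholders, not DeepSeek-V2's actual configuration.

BYTES_PER_ELEM = 2          # fp16/bf16 storage
N_LAYERS = 32               # hypothetical layer count
N_HEADS = 32                # hypothetical head count
D_HEAD = 128                # hypothetical per-head dimension
D_LATENT = 512              # hypothetical compressed KV latent dimension

def mha_cache_bytes_per_token() -> int:
    # Standard MHA caches full keys and values for every head in every layer.
    return 2 * N_LAYERS * N_HEADS * D_HEAD * BYTES_PER_ELEM

def mla_cache_bytes_per_token() -> int:
    # MLA-style caching stores one compressed latent per token per layer,
    # from which keys and values are re-projected at attention time.
    return N_LAYERS * D_LATENT * BYTES_PER_ELEM

if __name__ == "__main__":
    mha, mla = mha_cache_bytes_per_token(), mla_cache_bytes_per_token()
    print(f"MHA cache: {mha / 1024:.1f} KiB per token")
    print(f"MLA cache: {mla / 1024:.1f} KiB per token")
    print(f"compression ratio ~{mha / mla:.1f}x")
```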

X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression [PDF]

open access: yes (arXiv)
Multi-head latent attention (MLA) is designed to optimize KV cache memory through low-rank key-value joint compression. Rather than caching keys and values separately, MLA stores their compressed latent representations, reducing memory overhead while maintaining performance.
arxiv  
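
The low-rank key-value joint compression this abstract refers to can be sketched in a few lines: cache a single latent vector per token and rebuild keys and values with up-projections when attention is computed. The shapes, the plain-NumPy projections, and the absence of RoPE handling are simplifying assumptions for illustration, not the paper's implementation.

```python
import numpy as np

d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64
rng = np.random.default_rng(0)

W_down = rng.standard_normal((d_model, d_latent)) * 0.02            # joint KV down-projection
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # key up-projection
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) * 0.02   # value up-projection

latent_cache = []  # per-token compressed latents; this is all that gets stored

def decode_step(hidden: np.ndarray):
    """Cache the compressed latent for one new token, then rebuild K/V
    for the whole prefix from the cached latents."""
    latent_cache.append(hidden @ W_down)               # (d_latent,)
    latents = np.stack(latent_cache)                   # (seq, d_latent)
    k = (latents @ W_up_k).reshape(-1, n_heads, d_head)
    v = (latents @ W_up_v).reshape(-1, n_heads, d_head)
    return k, v

for _ in range(4):                                      # toy decode loop
    k, v = decode_step(rng.standard_normal(d_model))

print("cached floats per token:", d_latent,
      "vs. uncompressed K+V:", 2 * n_heads * d_head)
```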

Student Inquiry and the Rascal Triangle [PDF]

open access: yes (arXiv, 2019)
Those of us who teach Mathematics for Liberal Arts (MLA) courses often underestimate the mathematical abilities of the students enrolled in our courses. Despite the fact that many of these students suffer from math anxiety and will admit to hating mathematics, when we give them space to explore mathematics and bring their existing knowledge to the ...
arxiv  

Multiplicative Logit Adjustment Approximates Neural-Collapse-Aware Decision Boundary Adjustment [PDF]

open access: yes (arXiv)
Real-world data distributions are often highly skewed. This has spurred a growing body of research on long-tailed recognition, aimed at addressing the imbalance in training classification models. Among the methods studied, multiplicative logit adjustment (MLA) stands out as a simple and effective method.
arxiv  
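
For readers unfamiliar with the technique this abstract names, the sketch below shows one common multiplicative variant of logit adjustment: scaling each class logit by the class prior raised to a negative power so that rare classes are boosted at inference time. This specific rule and the toy numbers are assumptions for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def multiplicative_logit_adjustment(logits: np.ndarray,
                                    class_priors: np.ndarray,
                                    tau: float = 1.0) -> np.ndarray:
    """Scale per-class logits by class_priors**(-tau) at inference time."""
    scale = class_priors ** (-tau)       # larger scale for rarer classes
    return logits * scale

# Toy example: 3 classes with a heavily skewed training distribution.
priors = np.array([0.90, 0.09, 0.01])
logits = np.array([2.0, 1.8, 1.5])       # head class barely wins unadjusted
adjusted = multiplicative_logit_adjustment(logits, priors, tau=0.5)

print("unadjusted prediction:", int(np.argmax(logits)))    # head class
print("adjusted prediction  :", int(np.argmax(adjusted)))  # tail class
```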

The MLA's Poet Presidents [PDF]

open access: yes (PMLA / Publications of the Modern Language Association of America, 1998)
Sandra M. Gilbert +2 more
openaire +2 more sources
