Results 1 to 10 of about 249,885

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models [PDF]

open access: yes; International Conference on Learning Representations, 2023
Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements.
Wenqi Shao   +9 more
semanticscholar   +1 more source

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models [PDF]

open access: yes; International Conference on Machine Learning, 2022
Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time.
Guangxuan Xiao   +4 more
semanticscholar   +1 more source
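
The SmoothQuant snippet only states the problem; the commonly described idea behind it is a per-channel scale migration that divides activation outlier channels by a smoothing factor and multiplies that factor into the weights, so the layer output is unchanged but the activations become easy to quantize. Below is a minimal NumPy sketch of that idea, not the authors' implementation; the function names and the alpha=0.5 default are illustrative assumptions.

```python
import numpy as np

def smooth_scales(act_absmax, weight, alpha=0.5):
    """Per-input-channel smoothing factors: migrate activation outliers into the weights.

    act_absmax: (in_features,) calibration statistic, max |activation| per channel
    weight:     (in_features, out_features) linear-layer weight
    alpha:      migration strength; 0.5 balances activation and weight ranges
    """
    w_absmax = np.abs(weight).max(axis=1)                 # per-input-channel weight range
    s = act_absmax**alpha / (w_absmax**(1 - alpha) + 1e-8)
    return np.clip(s, 1e-5, None)

def apply_smoothing(x, weight, s):
    # y = (x / s) @ (diag(s) @ w) == x @ w: the product is unchanged,
    # but x / s no longer has extreme channels and quantizes well per-tensor.
    return x / s, weight * s[:, None]

# toy check that smoothing preserves the layer output
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)); x[:, 3] *= 50                # one outlier channel
w = rng.normal(size=(8, 16))
s = smooth_scales(np.abs(x).max(axis=0), w)
x_s, w_s = apply_smoothing(x, w, s)
assert np.allclose(x @ w, x_s @ w_s)
```

The migration itself is mathematically lossless; only the subsequent low-bit rounding of x_s and w_s introduces error, which is the point of moving difficulty from activations to weights.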

QuIP: 2-Bit Quantization of Large Language Models With Guarantees [PDF]

open access: yes; Neural Information Processing Systems, 2023
This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices ...
Jerry Chee   +3 more
semanticscholar   +1 more source
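
The incoherence insight in the QuIP snippet can be made concrete with a toy experiment: a random orthogonal rotation spreads a single outlier weight across many entries, so no coordinate dominates and uniform rounding loses less. The sketch below is only an illustration of that insight with made-up helper names; QuIP itself uses structured transforms and an adaptive rounding procedure with guarantees, which are not shown.

```python
import numpy as np

def incoherence(w):
    """One simple incoherence measure: mu = sqrt(n*m) * max|w_ij| / ||w||_F.
    Small mu means no single entry dominates the matrix."""
    n, m = w.shape
    return np.sqrt(n * m) * np.abs(w).max() / np.linalg.norm(w)

def random_orthogonal(n, rng):
    q, r = np.linalg.qr(rng.normal(size=(n, n)))
    return q * np.sign(np.diag(r))               # fix the sign ambiguity of QR

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w[0, 0] = 40.0                                   # a single outlier weight
u, v = random_orthogonal(64, rng), random_orthogonal(64, rng)
w_rot = u @ w @ v.T                              # quantize w_rot, then undo: w = u.T @ w_rot @ v
print(incoherence(w), incoherence(w_rot))        # mu drops sharply after the rotation
```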

SqueezeLLM: Dense-and-Sparse Quantization [PDF]

open access: yes; International Conference on Machine Learning, 2023
Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements.
Sehoon Kim   +7 more
semanticscholar   +1 more source
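
The SqueezeLLM snippet gives only the motivation; the "dense-and-sparse" in the title refers to decomposing a weight matrix into a small sparse set of outliers kept at full precision plus a dense remainder that is quantized to low bit-width. The sketch below illustrates that decomposition under simple assumptions (per-tensor uniform quantization, a 0.5% outlier budget); the paper's sensitivity-based non-uniform codebooks are not reproduced, and all helper names are made up.

```python
import numpy as np

def dense_sparse_split(w, outlier_frac=0.005):
    """Split w into a sparse outlier matrix (kept at full precision) and a dense remainder."""
    k = max(1, int(outlier_frac * w.size))
    thresh = np.partition(np.abs(w).ravel(), -k)[-k]
    mask = np.abs(w) >= thresh
    sparse_part = np.where(mask, w, 0.0)          # stored as COO/CSR in practice
    dense_part = np.where(mask, 0.0, w)
    return dense_part, sparse_part

def quantize_uniform(w, bits=3):
    """Plain per-tensor uniform quantization of the dense remainder (for illustration)."""
    levels = 2 ** bits - 1
    lo, scale = w.min(), (w.max() - w.min()) / levels
    return np.round((w - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)); w[5, 7] = 30.0   # heavy-tailed outlier
dense, sparse = dense_sparse_split(w)
w_hat = quantize_uniform(dense) + sparse           # quantized dense part + exact outliers
print(np.abs(w - w_hat).max())
```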

LLM-QAT: Data-Free Quantization Aware Training for Large Language Models [PDF]

open access: yes; Annual Meeting of the Association for Computational Linguistics, 2023
Several post-training quantization methods have been applied to large language models (LLMs), and have been shown to perform well down to 8-bits. We find that these methods break down at lower bit precision, and investigate quantization aware training ...
Zechun Liu   +8 more
semanticscholar   +1 more source
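
Quantization-aware training, as referred to in the LLM-QAT snippet, inserts "fake" quantization into the forward pass and lets gradients flow through the rounding with a straight-through estimator, so the full-precision weights learn to tolerate the quantization grid. The PyTorch sketch below shows only that generic building block; LLM-QAT's data-free pipeline (distilling from the model's own generations and also quantizing activations and the KV cache) is not shown, and the 4-bit setting is just an example.

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Symmetric per-tensor fake quantization with a straight-through estimator (STE)."""

    @staticmethod
    def forward(ctx, x, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max() / qmax + 1e-8
        return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # STE: treat round() as the identity so gradients reach the full-precision weights
        return grad_out, None

w = torch.randn(16, 16, requires_grad=True)   # full-precision "shadow" weights
x = torch.randn(4, 16)
y = x @ FakeQuant.apply(w, 4)                 # forward pass sees 4-bit weights
y.sum().backward()                            # backward updates the shadow weights via the STE
print(w.grad.abs().mean())
```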

Finite Scalar Quantization: VQ-VAE Made Simple [PDF]

open access: yes; International Conference on Learning Representations, 2023
We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension ...
Fabian Mentzer   +3 more
semanticscholar   +1 more source
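
The FSQ snippet is cut off, but the scheme it describes is simple enough to sketch: each of a handful of latent dimensions is bounded and rounded to a small fixed set of levels, and the implicit codebook is the product of those sets. The sketch below assumes odd level counts to keep the rounding symmetric and omits the straight-through gradient used in training; the level choices and function name are illustrative, not the paper's.

```python
import numpy as np

def fsq(z, levels=(7, 5, 5, 5)):
    """Finite scalar quantization: bound each latent dim, round it to a few fixed levels.

    z: (..., d) latent with d == len(levels); the implied codebook has prod(levels) entries.
    """
    levels = np.asarray(levels)
    half = (levels - 1) / 2.0
    bounded = np.tanh(z) * half                  # squash each dim into (-half, half)
    codes = np.round(bounded)                    # nearest of the levels[i] integer grid points
    # map the per-dimension digits to a single implicit-codebook index (mixed radix)
    digits = (codes + half).astype(int)
    radix = np.cumprod(np.concatenate(([1], levels[:-1])))
    ids = (digits * radix).sum(axis=-1)
    return codes / half, ids                     # normalized quantized latent and its code id

rng = np.random.default_rng(0)
z = rng.normal(size=(2, 4))
z_q, ids = fsq(z)
print(z_q, ids)                                  # ids index an implicit codebook of 7*5*5*5 = 875 entries
```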

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models [PDF]

open access: yes; International Conference on Learning Representations, 2023
Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained ...
Yixiao Li   +6 more
semanticscholar   +1 more source
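
LoftQ's scenario, quantization combined with LoRA fine-tuning, can be sketched as an initialization problem: quantize the backbone, fit a low-rank correction to whatever the quantizer threw away, and alternate. The sketch below substitutes plain uniform quantization and a NumPy SVD for the paper's NF4 quantizer; the function names, rank, and iteration count are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def quantize_uniform(w, bits=2):
    levels = 2 ** bits - 1
    lo, scale = w.min(), (w.max() - w.min()) / levels
    return np.round((w - lo) / scale) * scale + lo

def loftq_init(w, rank=8, bits=2, iters=5):
    """Alternate: quantize the residual, then fit a rank-r correction to the quantization error."""
    a = np.zeros((w.shape[0], rank)); b = np.zeros((rank, w.shape[1]))
    for _ in range(iters):
        q = quantize_uniform(w - a @ b, bits)            # quantized backbone
        u, s, vt = np.linalg.svd(w - q, full_matrices=False)
        a, b = u[:, :rank] * s[:rank], vt[:rank]          # LoRA init absorbs the quantization error
    return q, a, b

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 64))
q, a, b = loftq_init(w)
print(np.linalg.norm(w - quantize_uniform(w)),           # error of 2-bit quantization alone
      np.linalg.norm(w - (q + a @ b)))                   # error after the low-rank correction
```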

Double quantization

open access: yes; Physical Review D, 2022
v2 matches the version accepted for publication in Phys. Rev. D and includes additional clarifications and references; v3 adds some terms that were missing from a couple of equations.
Giulia Gubitosi   +3 more
openaire   +5 more sources

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers [PDF]

open access: yes; Neural Information Processing Systems, 2022
How to efficiently serve ever-larger trained natural language models in practice has become exceptionally challenging even for powerful cloud servers due to their prohibitive memory/computation requirements.
Z. Yao   +5 more
semanticscholar   +1 more source

Autoregressive Image Generation using Residual Quantization [PDF]

open access: yes; Computer Vision and Pattern Recognition, 2022
For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce its computational costs to consider long-range ...
Doyup Lee   +4 more
semanticscholar   +1 more source
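
Residual quantization, as used in the paper above, lets a short code sequence approximate a vector precisely: the first codebook quantizes the vector, the next quantizes what is left over, and so on for D depths. The sketch below shows only this encoding step with randomly initialized per-depth codebooks; RQ-VAE's shared, jointly trained codebook and the autoregressive prior over the code stacks are not shown.

```python
import numpy as np

def residual_quantize(z, codebooks):
    """Represent each vector by a stack of D codes: quantize, subtract, quantize the residual, ...

    z:         (n, dim) vectors to encode
    codebooks: list of D arrays, each (codebook_size, dim)
    Returns the (n, D) code indices and the reconstruction (sum of the chosen entries).
    """
    recon = np.zeros_like(z)
    residual = z.copy()
    codes = []
    for cb in codebooks:                                      # one codebook per depth
        d2 = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d2.argmin(axis=1)                               # nearest code for the current residual
        recon += cb[idx]
        residual = z - recon                                  # what is still unexplained
        codes.append(idx)
    return np.stack(codes, axis=1), recon

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 32)) for _ in range(4)]    # depth D = 4
z = rng.normal(size=(10, 32))
codes, recon = residual_quantize(z, codebooks)
print(codes.shape, np.linalg.norm(z - recon) / np.linalg.norm(z))
```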
