Results 11 to 20 of about 249,885
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [PDF]
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes.
Benoit Jacob+7 more
semanticscholar +1 more source
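The scheme this line of work builds on maps floats to integers through a scale and zero-point. Below is a minimal NumPy sketch of that affine quantize/dequantize round trip, assuming an unsigned 8-bit range; the function names are illustrative, not the paper's reference implementation.

```python
import numpy as np

def affine_quant_params(x, num_bits=8):
    """Pick a scale and zero-point mapping the observed float range onto [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    xmin, xmax = min(x.min(), 0.0), max(x.max(), 0.0)   # range must include 0 so zero is exactly representable
    scale = max((xmax - xmin) / (qmax - qmin), 1e-8)
    zero_point = int(np.clip(round(qmin - xmin / scale), qmin, qmax))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(16).astype(np.float32)
scale, zp = affine_quant_params(x)
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
print(np.abs(x - x_hat).max())   # error is bounded by roughly scale / 2
```

In integer-only pipelines of this kind, the matrix arithmetic itself stays in integers and the float scales are folded into a final rescaling step.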
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning [PDF]
We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of ...
Elias Frantar, Dan Alistarh
semanticscholar +1 more source
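As background for the one-shot setting described above: post-training methods typically work layer by layer, asking how much a layer's output changes on a small calibration batch once its weights are quantized. The sketch below uses plain round-to-nearest to measure that reconstruction error; it is not the paper's Hessian-based update rule, and all names are assumptions.

```python
import numpy as np

def quantize_rtn(W, num_bits=4):
    """Symmetric per-output-channel round-to-nearest quantization of a weight matrix."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

def layer_reconstruction_error(W, X, num_bits=4):
    """Relative change in the layer output W @ X caused by quantizing W, on calibration data X."""
    W_q = quantize_rtn(W, num_bits)
    return np.linalg.norm(W @ X - W_q @ X) / np.linalg.norm(W @ X)

W = np.random.randn(64, 128).astype(np.float32)   # weights of one trained layer
X = np.random.randn(128, 32).astype(np.float32)   # small calibration batch (columns are samples)
print(layer_reconstruction_error(W, X))
```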
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth). In this paper, we propose Activation-aware Weight Quantization (AWQ) ...
Ji Lin+5 more
semanticscholar +1 more source
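A rough sketch of the activation-aware idea in the title: input channels that see large activations have their weights scaled up before weight-only quantization (so they get finer effective precision), and the inverse scale is folded back afterwards. The fixed `alpha` exponent and the round-to-nearest quantizer are simplifying assumptions; the paper searches for its scales on calibration data rather than fixing them.

```python
import numpy as np

def quantize_rtn(W, num_bits=4):
    """Symmetric per-output-channel round-to-nearest weight quantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

def activation_aware_quantize(W, X, alpha=0.5, num_bits=4):
    """Scale input channels by an activation-magnitude statistic before weight-only quantization.

    Channels with large average activations get enlarged weights (finer effective precision);
    dividing the scale back out keeps W @ X unchanged in the float reference.
    """
    s = np.maximum(np.mean(np.abs(X), axis=1) ** alpha, 1e-5)   # per-input-channel saliency
    return quantize_rtn(W * s[None, :], num_bits) / s[None, :]

W = np.random.randn(64, 128).astype(np.float32)   # (out_features, in_features)
X = np.random.randn(128, 32).astype(np.float32)   # calibration activations, (in_features, tokens)
W_q = activation_aware_quantize(W, X)
print(np.linalg.norm(W @ X - W_q @ X) / np.linalg.norm(W @ X))
```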
A Survey of Quantization Methods for Efficient Neural Network Inference [PDF]
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose.
A. Gholami+5 more
semanticscholar +1 more source
PTQD: Accurate Post-Training Quantization for Diffusion Models [PDF]
Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications.
Yefei He+5 more
semanticscholar +1 more source
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization [PDF]
Large language models (LLMs) face the challenges in fine-tuning and deployment due to their high memory demands and computational costs. While parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory usage of the optimizer state during ...
Jeonghoon Kim+6 more
semanticscholar +1 more source
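One memory-efficient recipe consistent with this abstract is to freeze the sub-4-bit integer weights and fine-tune only the small per-channel scale tensors, so the optimizer state shrinks accordingly. The PyTorch sketch below illustrates that pattern under a 3-bit assumption; the class name and training loop are illustrative, not the paper's exact method.

```python
import torch

class FrozenIntLinear(torch.nn.Module):
    """Hypothetical linear layer: frozen sub-4-bit integer weights, trainable per-channel scales."""
    def __init__(self, weight_fp, num_bits=3):
        super().__init__()
        qmax = 2 ** (num_bits - 1) - 1
        scale = weight_fp.abs().amax(dim=1, keepdim=True) / qmax
        w_int = torch.round(weight_fp / scale).clamp(-qmax - 1, qmax)
        self.register_buffer("w_int", w_int.to(torch.int8))   # frozen: carries no optimizer state
        self.scale = torch.nn.Parameter(scale)                # the only trainable tensor

    def forward(self, x):
        return x @ (self.scale * self.w_int.float()).t()

layer = FrozenIntLinear(torch.randn(64, 128))
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)            # optimizer state covers 64 scales, not 8192 weights
x, target = torch.randn(8, 128), torch.randn(8, 64)
loss = torch.nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()
print(sum(p.numel() for p in layer.parameters()), "trainable parameters")
```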
In this paper we continue our study of Groenewold-Van Hove obstructions to quantization. We show that there exists such an obstruction to quantizing the cylinder $T^*S^1$. More precisely, we prove that there is no quantization of the Poisson algebra of $T^*S^1$ which is irreducible on a naturally defined $e(2) \times \mathbb{R}$ subalgebra.
Mark J. Gotay, Hendrik Grundling
openaire +3 more sources
RPTQ: Reorder-based Post-training Quantization for Large Language Models [PDF]
Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be alleviated through quantization.
Zhihang Yuan+9 more
semanticscholar +1 more source
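A minimal sketch of the reorder-then-group idea the title points at: sort activation channels by dynamic range, split the sorted order into groups, and give each group its own quantization parameters so outlier channels no longer dictate a single global scale. The equal-size grouping and all function names here are simplifying assumptions, not the paper's clustering procedure.

```python
import numpy as np

def reorder_group_quantize(X, num_groups=4, num_bits=8):
    """Sort channels by dynamic range, split into groups, quantize each group with its own params."""
    ranges = X.max(axis=1) - X.min(axis=1)        # per-channel dynamic range
    order = np.argsort(ranges)                    # reorder channels so similar ranges are adjacent
    X_q = np.empty_like(X)
    qmax = 2 ** num_bits - 1
    for idx in np.array_split(order, num_groups):
        lo, hi = X[idx].min(), X[idx].max()
        scale = max(float(hi - lo), 1e-8) / qmax
        q = np.clip(np.round((X[idx] - lo) / scale), 0, qmax)
        X_q[idx] = q * scale + lo                 # store dequantized values to measure error
    return X_q, order

X = np.random.randn(128, 64).astype(np.float32) * np.random.rand(128, 1)   # channels with very different ranges
X_q, order = reorder_group_quantize(X)
print(np.abs(X - X_q).max())
```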
We associate to the action of a compact Lie group G on a line bundle over a compact oriented even-dimensional manifold a virtual representation of G using a twisted version of the signature operator. We obtain analogues of various theorems in the more standard theory of geometric quantization.
Victor Guillemin+2 more
openaire +5 more sources
Post-Training Quantization on Diffusion Models [PDF]
Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the ...
Yuzhang Shang+4 more
semanticscholar +1 more source