Results 11 to 20 of about 249,885
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [PDF]
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes.
Benoit Jacob+7 more
semanticscholar +1 more source
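The scheme this line of work builds on maps floats to integers through a scale and zero-point. Below is a minimal NumPy sketch of that affine quantize/dequantize round trip, assuming an unsigned 8-bit range; the function names are illustrative, not the paper's reference implementation.

```python
import numpy as np

def affine_quant_params(x, num_bits=8):
    """Pick a scale and zero-point mapping the observed float range onto [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    xmin, xmax = min(x.min(), 0.0), max(x.max(), 0.0)   # range must include 0 so zero is exactly representable
    scale = max((xmax - xmin) / (qmax - qmin), 1e-8)
    zero_point = int(np.clip(round(qmin - xmin / scale), qmin, qmax))
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** num_bits - 1).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(16).astype(np.float32)
scale, zp = affine_quant_params(x)
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
print(np.abs(x - x_hat).max())   # error is bounded by roughly scale / 2
```

In integer-only pipelines of this kind, the matrix arithmetic itself stays in integers and the float scales are folded into a final rescaling step.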
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning [PDF]
We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of ...
Elias Frantar, Dan Alistarh
semanticscholar +1 more source
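As background for the one-shot setting described above: post-training methods typically work layer by layer, asking how much a layer's output changes on a small calibration batch once its weights are quantized. The sketch below uses plain round-to-nearest to measure that reconstruction error; it is not the paper's Hessian-based update rule, and all names are assumptions.

```python
import numpy as np

def quantize_rtn(W, num_bits=4):
    """Symmetric per-output-channel round-to-nearest quantization of a weight matrix."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

def layer_reconstruction_error(W, X, num_bits=4):
    """Relative change in the layer output W @ X caused by quantizing W, on calibration data X."""
    W_q = quantize_rtn(W, num_bits)
    return np.linalg.norm(W @ X - W_q @ X) / np.linalg.norm(W @ X)

W = np.random.randn(64, 128).astype(np.float32)   # weights of one trained layer
X = np.random.randn(128, 32).astype(np.float32)   # small calibration batch (columns are samples)
print(layer_reconstruction_error(W, X))
```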
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth). In this paper, we propose Activation-aware Weight Quantization (AWQ) ...
Ji Lin+5 more
semanticscholar +1 more source
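A rough sketch of the activation-aware idea in the title: input channels that see large activations have their weights scaled up before weight-only quantization (so they get finer effective precision), and the inverse scale is folded back afterwards. The fixed `alpha` exponent and the round-to-nearest quantizer are simplifying assumptions; the paper searches for its scales on calibration data rather than fixing them.

```python
import numpy as np

def quantize_rtn(W, num_bits=4):
    """Symmetric per-output-channel round-to-nearest weight quantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0
    return np.round(W / scale).clip(-qmax - 1, qmax) * scale

def activation_aware_quantize(W, X, alpha=0.5, num_bits=4):
    """Scale input channels by an activation-magnitude statistic before weight-only quantization.

    Channels with large average activations get enlarged weights (finer effective precision);
    dividing the scale back out keeps W @ X unchanged in the float reference.
    """
    s = np.maximum(np.mean(np.abs(X), axis=1) ** alpha, 1e-5)   # per-input-channel saliency
    return quantize_rtn(W * s[None, :], num_bits) / s[None, :]

W = np.random.randn(64, 128).astype(np.float32)   # (out_features, in_features)
X = np.random.randn(128, 32).astype(np.float32)   # calibration activations, (in_features, tokens)
W_q = activation_aware_quantize(W, X)
print(np.linalg.norm(W @ X - W_q @ X) / np.linalg.norm(W @ X))
```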
A Survey of Quantization Methods for Efficient Neural Network Inference [PDF]
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose.
A. Gholami+5 more
semanticscholar +1 more source
PTQD: Accurate Post-Training Quantization for Diffusion Models [PDF]
Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications.
Yefei He+5 more
semanticscholar +1 more source
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization [PDF]
Large language models (LLMs) face the challenges in fine-tuning and deployment due to their high memory demands and computational costs. While parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory usage of the optimizer state during ...
Jeonghoon Kim+6 more
semanticscholar +1 more source
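One memory-efficient recipe consistent with this abstract is to freeze the sub-4-bit integer weights and fine-tune only the small per-channel scale tensors, so the optimizer state shrinks accordingly. The PyTorch sketch below illustrates that pattern under a 3-bit assumption; the class name and training loop are illustrative, not the paper's exact method.

```python
import torch

class FrozenIntLinear(torch.nn.Module):
    """Hypothetical linear layer: frozen sub-4-bit integer weights, trainable per-channel scales."""
    def __init__(self, weight_fp, num_bits=3):
        super().__init__()
        qmax = 2 ** (num_bits - 1) - 1
        scale = weight_fp.abs().amax(dim=1, keepdim=True) / qmax
        w_int = torch.round(weight_fp / scale).clamp(-qmax - 1, qmax)
        self.register_buffer("w_int", w_int.to(torch.int8))   # frozen: carries no optimizer state
        self.scale = torch.nn.Parameter(scale)                # the only trainable tensor

    def forward(self, x):
        return x @ (self.scale * self.w_int.float()).t()

layer = FrozenIntLinear(torch.randn(64, 128))
opt = torch.optim.SGD(layer.parameters(), lr=1e-2)            # optimizer state covers 64 scales, not 8192 weights
x, target = torch.randn(8, 128), torch.randn(8, 64)
loss = torch.nn.functional.mse_loss(layer(x), target)
loss.backward()
opt.step()
print(sum(p.numel() for p in layer.parameters()), "trainable parameters")
```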
In this paper we continue our study of Groenewold-Van Hove obstructions to quantization. We show that there exists such an obstruction to quantizing the cylinder $T^*S^1$. More precisely, we prove that there is no quantization of the Poisson algebra of $T^*S^1$ which is irreducible on a naturally defined $e(2) \times \mathbb{R}$ subalgebra.
Mark J. Gotay, Hendrik Grundling
openaire +3 more sources
RPTQ: Reorder-based Post-training Quantization for Large Language Models [PDF]
Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be alleviated through quantization.
Zhihang Yuan+9 more
semanticscholar +1 more source
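A minimal sketch of the reorder-then-group idea the title points at: sort activation channels by dynamic range, split the sorted order into groups, and give each group its own quantization parameters so outlier channels no longer dictate a single global scale. The equal-size grouping and all function names here are simplifying assumptions, not the paper's clustering procedure.

```python
import numpy as np

def reorder_group_quantize(X, num_groups=4, num_bits=8):
    """Sort channels by dynamic range, split into groups, quantize each group with its own params."""
    ranges = X.max(axis=1) - X.min(axis=1)        # per-channel dynamic range
    order = np.argsort(ranges)                    # reorder channels so similar ranges are adjacent
    X_q = np.empty_like(X)
    qmax = 2 ** num_bits - 1
    for idx in np.array_split(order, num_groups):
        lo, hi = X[idx].min(), X[idx].max()
        scale = max(float(hi - lo), 1e-8) / qmax
        q = np.clip(np.round((X[idx] - lo) / scale), 0, qmax)
        X_q[idx] = q * scale + lo                 # store dequantized values to measure error
    return X_q, order

X = np.random.randn(128, 64).astype(np.float32) * np.random.rand(128, 1)   # channels with very different ranges
X_q, order = reorder_group_quantize(X)
print(np.abs(X - X_q).max())
```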
We associate to the action of a compact Lie group G on a line bundle over a compact oriented even-dimensional manifold a virtual representation of G using a twisted version of the signature operator. We obtain analogues of various theorems in the more standard theory of geometric quantization.
Victor Guillemin+2 more
openaire +5 more sources
Post-Training Quantization on Diffusion Models [PDF]
Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the ...
Yuzhang Shang+4 more
semanticscholar +1 more source