Results 11 to 20 of about 249,885

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [PDF]

open access: yes. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017
The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes.
Benoit Jacob   +7 more
semanticscholar   +1 more source
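
For context, the affine scheme underlying integer-arithmetic-only inference represents a real value r as r ≈ S(q - Z), with integer q, real scale S, and integer zero-point Z. A minimal sketch follows; the helper names are illustrative, not the paper's reference code, and the paper itself goes further (integer-only matrix multiplication, quantization-aware training):

    import numpy as np

    def affine_quantize(x, num_bits=8):
        # Map a real tensor to unsigned integers via r ~ S * (q - Z).
        qmin, qmax = 0, 2**num_bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(round(qmin - x.min() / scale))
        q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
        return q, scale, zero_point

    def affine_dequantize(q, scale, zero_point):
        # Recover an approximation of the original real values.
        return scale * (q.astype(np.float32) - zero_point)

    x = np.random.randn(4, 4).astype(np.float32)
    q, s, z = affine_quantize(x)
    print(np.abs(x - affine_dequantize(q, s, z)).max())  # small reconstruction error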

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning [PDF]

open access: yes. Neural Information Processing Systems, 2022
We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data.
Elias Frantar, Dan Alistarh
semanticscholar   +1 more source
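
The post-training setting can be phrased per layer: given weights W and calibration inputs X, find quantized weights W_hat minimizing ||WX - W_hat X||^2. A toy sketch of that objective with plain round-to-nearest as the compressor; OBC's actual greedy, second-order solver is more involved, so this is only the baseline it improves on:

    import numpy as np

    def rtn_quantize(W, num_bits=4):
        # Round-to-nearest baseline with a single per-tensor scale.
        qmax = 2**(num_bits - 1) - 1
        scale = np.abs(W).max() / qmax
        return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale

    W = np.random.randn(64, 128)        # one layer's weights
    X = np.random.randn(128, 256)       # a small calibration batch
    W_hat = rtn_quantize(W)
    err = np.linalg.norm(W @ X - W_hat @ X) ** 2   # ||WX - W_hat X||^2
    print(err)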

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

open access: yes. arXiv.org, 2023
Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth). In this paper, we propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach for LLM low-bit weight-only quantization.
Ji Lin   +5 more
semanticscholar   +1 more source
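
AWQ's key trick, in rough outline: scale up the most salient input channels before quantization and fold the inverse scale into the activations, leaving the full-precision product unchanged. A simplified sketch, with the salience heuristic reduced to mean absolute activation (the paper searches the scaling exponent and quantizes group-wise, neither of which is shown here):

    import numpy as np

    def rtn_quantize(W, num_bits=4):
        qmax = 2**(num_bits - 1) - 1
        scale = np.abs(W).max() / qmax
        return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale

    X = np.random.randn(128, 512)          # activations: [in_features, tokens]
    W = np.random.randn(64, 128)           # weights: [out_features, in_features]

    s = np.abs(X).mean(axis=1) ** 0.5      # per-channel salience; AWQ searches this exponent
    s = s / s.mean()                       # keep overall magnitudes stable
    W_q = rtn_quantize(W * s[None, :])     # scale salient input channels up, then quantize
    y_q = W_q @ (X / s[:, None])           # fold the inverse scale into the activations
    print(np.linalg.norm(W @ X - y_q))     # error vs. the full-precision output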

A Survey of Quantization Methods for Efficient Neural Network Inference [PDF]

open access: yes. Low-Power Computer Vision, 2021
As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose.
A. Gholami   +5 more
semanticscholar   +1 more source

PTQD: Accurate Post-Training Quantization for Diffusion Models [PDF]

open access: yes. Neural Information Processing Systems, 2023
Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications.
Yefei He   +5 more
semanticscholar   +1 more source

Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization [PDF]

open access: yes. Neural Information Processing Systems, 2023
Large language models (LLMs) face challenges in fine-tuning and deployment due to their high memory demands and computational costs. While parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory usage of the optimizer state during fine-tuning ...
Jeonghoon Kim   +6 more
semanticscholar   +1 more source
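
Reading the title's approach in its simplest form (my assumption, not the authors' code): keep the sub-4-bit integer weights frozen and train only the per-channel quantization scales, so the trainable parameters and optimizer state shrink by orders of magnitude. A toy sketch:

    import numpy as np

    num_bits = 3
    qmax = 2**(num_bits - 1) - 1                         # 3-bit signed range: [-4, 3]
    W = np.random.randn(64, 128)
    scales = np.abs(W).max(axis=1, keepdims=True) / qmax # one trainable scale per row
    Q = np.clip(np.round(W / scales), -qmax - 1, qmax)   # frozen integer weights
    # Fine-tuning would update `scales` only (64 values, not 64*128),
    # so the optimizer state shrinks accordingly.
    W_eff = scales * Q                                   # effective weight at inference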

On quantizing [PDF]

open access: yes. Reports on Mathematical Physics, 1997
In this paper we continue our study of Groenewold-Van Hove obstructions to quantization. We show that there exists such an obstruction to quantizing the cylinder $T^*S^1$. More precisely, we prove that there is no quantization of the Poisson algebra of $T^*S^1$ which is irreducible on a naturally defined $e(2) \times R$ subalgebra.
Mark J. Gotay, Hendrik Grundling
openaire   +3 more sources

RPTQ: Reorder-based Post-training Quantization for Large Language Models [PDF]

open access: yes. arXiv.org, 2023
Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be alleviated through quantization.
Zhihang Yuan   +9 more
semanticscholar   +1 more source
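
A rough illustration of what reorder-based quantization can mean in the simplest case (my reading of the title; the details are assumptions, not taken from the paper): reorder channels by their value range and cluster channels with similar ranges, so each cluster shares one quantization scale instead of one scale stretching over wildly different channels:

    import numpy as np

    def cluster_and_quantize(X, n_groups=4, num_bits=8):
        # Group activation channels with similar ranges; one scale per group.
        ranges = X.max(axis=1) - X.min(axis=1)
        order = np.argsort(ranges)                 # reorder channels by range
        groups = np.array_split(order, n_groups)
        X_q = np.empty_like(X)
        qmax = 2**(num_bits - 1) - 1
        for g in groups:
            scale = np.abs(X[g]).max() / qmax      # shared scale within the group
            X_q[g] = np.clip(np.round(X[g] / scale), -qmax - 1, qmax) * scale
        return X_q

    # Channels with deliberately uneven ranges to show the benefit of grouping.
    X = np.random.randn(128, 512) * np.random.lognormal(size=(128, 1))
    print(np.abs(X - cluster_and_quantize(X)).mean())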

Signature quantization [PDF]

open access: yes. Proceedings of the National Academy of Sciences, 2003
We associate to the action of a compact Lie group G on a line bundle over a compact oriented even-dimensional manifold a virtual representation of G using a twisted version of the signature operator. We obtain analogues of various theorems in the more standard theory of geometric quantization.
Victor Guillemin   +2 more
openaire   +5 more sources

Post-Training Quantization on Diffusion Models [PDF]

open access: yes. Computer Vision and Pattern Recognition, 2022
Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the ...
Yuzhang Shang   +4 more
semanticscholar   +1 more source
