A Simple and Effective Pruning Approach for Large Language Models
As their size increases, Large Language Models (LLMs) are natural candidates for network pruning methods: approaches that drop a subset of network weights while striving to preserve performance.
Mingjie Sun et al.
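The criterion this paper proposes (Wanda) scores each weight by its magnitude times the norm of the corresponding input activation, then prunes the lowest-scoring weights within each output row, with no retraining or gradient computation. A minimal sketch for a single linear layer, assuming a calibration batch of input activations is available (function and variable names here are illustrative, not the paper's code):

```python
import torch

def wanda_prune_linear(weight: torch.Tensor, calib_inputs: torch.Tensor,
                       sparsity: float = 0.5) -> torch.Tensor:
    """Sketch of Wanda-style pruning for one linear layer.

    weight:       (out_features, in_features)
    calib_inputs: (num_samples, in_features) calibration activations
    Scores each weight by |W_ij| * ||X_j||_2 and zeroes the lowest-scoring
    fraction within each output row.
    """
    # Per-input-feature activation norm over the calibration set.
    act_norm = calib_inputs.norm(p=2, dim=0)             # (in_features,)
    score = weight.abs() * act_norm                      # broadcasts over rows
    # Prune the lowest-scoring weights independently per output row.
    k = int(weight.shape[1] * sparsity)
    _, idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    return weight * mask
```

Because the scores need only a forward pass over a small calibration set, the method stays cheap even at LLM scale, which is the point of the title.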
LLM-Pruner: On the Structural Pruning of Large Language Models
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in both the deployment ...
Xinyin Ma, Gongfan Fang, Xinchao Wang
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
The popularity of LLaMA (Touvron et al., 2023a;b) and other recently emerged moderate-sized large language models (LLMs) highlights the potential of building smaller yet powerful LLMs.
Mengzhou Xia et al.
DepGraph: Towards Any Structural Pruning
Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks. However, the parameter-grouping patterns vary widely across different models, making architecture-specific pruners, which rely on manually ...
Gongfan Fang et al.
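The coupling problem DepGraph addresses shows up in even the smallest example: removing a hidden unit from one layer forces a matching removal in every layer that consumes it. A toy sketch for two stacked linear layers, illustrating the dependency rather than DepGraph's general graph-construction algorithm (names are illustrative):

```python
import torch
import torch.nn as nn

def prune_hidden_units(fc1: nn.Linear, fc2: nn.Linear, keep: torch.Tensor):
    """Toy coupled structural pruning for fc2(act(fc1(x))).

    Removing a hidden unit means deleting the same index from fc1's
    output rows AND fc2's input columns; the two layers are coupled.
    `keep` holds the indices of hidden units to retain.
    """
    new_fc1 = nn.Linear(fc1.in_features, len(keep), bias=fc1.bias is not None)
    new_fc1.weight.data = fc1.weight.data[keep]          # drop output rows
    if fc1.bias is not None:
        new_fc1.bias.data = fc1.bias.data[keep]
    new_fc2 = nn.Linear(len(keep), fc2.out_features, bias=fc2.bias is not None)
    new_fc2.weight.data = fc2.weight.data[:, keep]       # drop matching input cols
    if fc2.bias is not None:
        new_fc2.bias.data = fc2.bias.data.clone()
    return new_fc1, new_fc2

# Keep the 8 hidden units with the largest L1 row norm in fc1 (a common
# magnitude heuristic; DepGraph itself is agnostic to the importance criterion).
fc1, fc2 = nn.Linear(16, 32), nn.Linear(32, 16)
keep = fc1.weight.abs().sum(dim=1).topk(8).indices
fc1_small, fc2_small = prune_hidden_units(fc1, fc2, keep)
```

In real architectures these dependency groups span residual connections, normalization layers, and attention heads, which is why the paper builds an explicit dependency graph instead of hand-coding such pairs.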
Beyond neural scaling laws: beating power law scaling via data pruning
Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning.
Ben Sorscher et al.
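Here data pruning means ranking training examples by a difficulty metric and training on a subset; the paper's key finding is that keeping the hardest examples can beat power-law scaling when data is abundant, while keeping the easiest works better when data is scarce. A minimal sketch assuming per-example difficulty scores are already computed, e.g., by a small proxy model (names are illustrative):

```python
import numpy as np

def prune_dataset(scores: np.ndarray, keep_frac: float,
                  keep_hard: bool = True) -> np.ndarray:
    """Sketch of score-based data pruning.

    scores: per-example difficulty scores (higher = harder).
    With ample data it pays to keep the hardest examples; in the
    scarce-data regime, keeping easier ones works better.
    """
    n_keep = int(len(scores) * keep_frac)
    order = np.argsort(scores)                   # easy -> hard
    kept = order[-n_keep:] if keep_hard else order[:n_keep]
    return kept                                  # indices into the dataset

rng = np.random.default_rng(0)
scores = rng.random(10_000)                      # stand-in difficulty scores
subset = prune_dataset(scores, keep_frac=0.3)    # retain the hardest 30%
```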
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of ...
Elias Frantar, Dan Alistarh
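OBC works layer by layer: given calibration inputs X, it greedily removes the weights that least increase the layer's squared reconstruction error, using second-order information. The sketch below computes only the classic Optimal Brain Surgeon saliency that OBC builds on, omitting OBC's efficient exact update of the remaining weights (names are illustrative):

```python
import torch

def obs_saliency(weight_row: torch.Tensor, calib_inputs: torch.Tensor,
                 damp: float = 1e-2) -> torch.Tensor:
    """Second-order pruning saliency for one output row of a linear layer.

    Classic Optimal Brain Surgeon score: s_q = w_q^2 / [H^-1]_qq, with
    H = X^T X the Hessian of the layer's squared reconstruction error.
    Lower saliency means removing that weight hurts the layer output less.
    """
    X = calib_inputs                                  # (num_samples, in_features)
    H = X.T @ X                                       # (in_features, in_features)
    # Dampen the diagonal for numerical stability before inverting.
    H += damp * H.diagonal().mean() * torch.eye(H.shape[0])
    H_inv = torch.linalg.inv(H)
    return weight_row.pow(2) / H_inv.diagonal()
```

OBC's contribution on top of this is an exact greedy solver that updates the surviving weights after each removal at tractable cost, which is what makes the one-shot, no-retraining setting accurate.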
Structural Pruning for Diffusion Models
Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs).
Gongfan Fang, Xinyin Ma, Xinchao Wang
A Survey on Deep Neural Network Pruning: Taxonomy, Comparison, Analysis, and Recommendations
Modern deep neural networks, particularly recent large language models, come with massive model sizes that require significant computational and storage resources.
Hongrong Cheng et al.
Structured Pruning for Deep Convolutional Neural Networks: A Survey
The remarkable performance of deep convolutional neural networks (CNNs) is generally attributed to their deeper and wider architectures, which can come with significant computational costs.
Yang He, Lingao Xiao
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size.
Lu Yin et al.
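OWL's observation is that layers differ in how many activation-aware outlier weights they contain, so a uniform sparsity level across layers is suboptimal; it instead assigns each layer a sparsity inversely related to its outlier ratio, constrained to stay close to the global target. A hedged sketch of this allocation idea (the linear mapping and the constants below are illustrative, not the paper's exact rule):

```python
import torch

def owl_layer_sparsities(layer_scores, target_sparsity=0.7, m=5.0, lam=0.08):
    """Sketch of OWL-style non-uniform layerwise sparsity allocation.

    layer_scores: list of per-layer importance tensors, e.g. the
    |W| * ||X|| scores used by Wanda. A layer's outlier ratio is the
    fraction of scores exceeding m times that layer's mean; layers with
    more outliers are assigned LOWER sparsity, bounded within
    [target - lam, target + lam].
    """
    outlier_ratio = torch.tensor(
        [(s > m * s.mean()).float().mean().item() for s in layer_scores])
    # Normalize ratios to [0, 1]; the highest-outlier layer maps to the
    # lowest sparsity in the allowed band.
    r = (outlier_ratio - outlier_ratio.min()) / \
        (outlier_ratio.max() - outlier_ratio.min() + 1e-8)
    sparsities = target_sparsity + lam * (1 - 2 * r)
    # Re-center so the average sparsity matches the global target.
    sparsities += target_sparsity - sparsities.mean()
    return sparsities

scores = [torch.rand(256, 256) for _ in range(4)]  # stand-in per-layer scores
print(owl_layer_sparsities(scores))
```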