Structural Pruning
Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge
Structural Pruning - random dataset / random models
- no need to dive into SOTA publications as most of them won't work
- start with more industry-oriented approaches - look at some packages and check whether they can keep as much performance as possible; the theory is straightforward
- she'll share a list of links with me - straightforward methods
- Iterative magnitude pruning (strong effective baseline)
- lottery ticket hypothesis
- Pruning Filters for Efficient ConvNets (arXiv): greedily removes the least effective filters (slow but still SOTA structural pruning; see the sketch after this list)
- magnitude pruning
- “One of the most well-known older works in this area prunes filters from a convnet using the L1 norm of the filter’s weights.”
- non-straightforward stuff won't work
- find a toolbox that can be immediately applied
- stakeholders have no implementations already in place
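A minimal sketch of the structural (filter) pruning idea referenced in the Pruning Filters for Efficient ConvNets bullet above, assuming a PyTorch model and the built-in torch.nn.utils.prune toolbox; the toy model and the 25% ratio are illustrative placeholders, not from the note:

```python
# Sketch only: L1-norm filter pruning with PyTorch's built-in pruning toolbox.
# The toy model and the 25% ratio are placeholders, not from the note.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1),
    nn.ReLU(),
)

# Structured pruning: zero out whole output filters (dim=0) with the
# smallest L1 norm, as in "Pruning Filters for Efficient ConvNets".
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.ln_structured(module, name="weight", amount=0.25, n=1, dim=0)
        prune.remove(module, "weight")  # bake the mask into the weight tensor

# Note: this only zeroes filters in place; for an actual speedup the layers
# would have to be physically resized (e.g., with a dedicated pruning library).
```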
From Large Transformer Model Inference Optimization | Lil’Log
Magnitude pruning is the simplest yet quite effective pruning method - weights with the smallest absolute values are trimmed. In fact, some studies (Gale et al. 2019) found that simple magnitude pruning approaches can achieve comparable or better results than complicated pruning methods, such as variational dropout (Molchanov et al. 2017) and L0 regularization (Louizos et al. 2017). Magnitude pruning is simple to apply to large models and achieves reasonably consistent performance across a wide range of hyperparameters.
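As a rough illustration of plain magnitude pruning described above, PyTorch's built-in pruning utilities can zero out the globally smallest-magnitude weights; the layer sizes and the 80% sparsity below are made-up example values:

```python
# Sketch: global magnitude pruning - trim the weights with the smallest
# absolute values across all listed layers. Sizes and sparsity are illustrative.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

parameters_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
]
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,  # rank weights by |w|
    amount=0.8,                           # zero out 80% of all weights
)
```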
Zhu & Gupta (2017) found that large sparse models were able to achieve better performance than their small but dense counterparts. They proposed the Gradual Magnitude Pruning (GMP) algorithm, which increases the sparsity of a network gradually over the course of training. At each training step, weights with the smallest absolute values are masked to zero to reach a desired sparsity level S, and masked weights do not receive gradient updates during back-propagation. The desired sparsity level S goes up with more training steps. The process of GMP is sensitive to the learning rate schedule, which should be higher than what's used in dense network training, but not so high as to prevent convergence.
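A small sketch of the GMP schedule from Zhu & Gupta (2017): sparsity ramps from an initial to a final level along a cubic curve over the pruning window. The hyperparameters below are illustrative assumptions, and the masking/retraining loop itself is omitted:

```python
# Sketch of Zhu & Gupta's gradual sparsity schedule: sparsity ramps from
# s_initial to s_final following a cubic curve over the pruning window.
def gmp_sparsity(step, s_initial=0.0, s_final=0.9,
                 begin_step=0, end_step=10_000):
    if step < begin_step:
        return s_initial
    if step >= end_step:
        return s_final
    progress = (step - begin_step) / (end_step - begin_step)
    return s_final + (s_initial - s_final) * (1.0 - progress) ** 3

# At each training step: compute the target sparsity, mask the weights with
# the smallest |w| up to that level, and skip gradient updates for masked weights.
for step in (0, 2_500, 5_000, 10_000):
    print(step, round(gmp_sparsity(step), 3))
```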
Iterative pruning (Renda et al. 2020) repeats the prune and retrain steps multiple times: only a small fraction of weights is pruned and the model is retrained in each iteration. The process repeats until the desired sparsity level is reached.
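A hedged sketch of that prune-and-retrain loop, assuming PyTorch; the train_one_epoch callback, the number of rounds, and the per-round fraction are assumptions for illustration, not from the source:

```python
# Sketch: iterative magnitude pruning - prune a small fraction, retrain, repeat.
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_prune(model, train_one_epoch, rounds=5, amount_per_round=0.2):
    """Each round removes 20% of the *remaining* weights, then retrains."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (nn.Linear, nn.Conv2d))]
    for _ in range(rounds):
        prune.global_unstructured(
            params, pruning_method=prune.L1Unstructured, amount=amount_per_round
        )
        train_one_epoch(model)  # retrain so the remaining weights recover accuracy
    for module, name in params:
        prune.remove(module, name)  # bake the final masks into the weights
    return model
```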
Han et al. (2016) popularized magnitude pruning for modern deep neural networks as part of neural network compression for inference. Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv:1510.00149 [cs.CV]