Pruning
Created: 16 Jan 2023, 03:58 PM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, TinyML
Overview
Related fields
Introduction
- Pruning or sparsification is a family of methods to produce sparse neural networks.
Structured Pruning vs Unstructured Pruning
Structured
- changes the structure of neural networks by physically removing grouped parameters (per DepGraph / Torch-Pruning paper)
- only the offsets of blocks or other elements arranged in a fixed structure need to be stored
- removes complete filters and directly leads to efficient deep neural models without requiring specialized hardware for sparse structures
Unstructured
- zeroes out a subset of individual weights without modifying the network structure (per DepGraph / Torch-Pruning paper)
- e.g. by setting weights to zero; see Rachit Singh - Deep learning model compression
- the offset of each individual element needs to be encoded and stored
- mask individual weights in the network
- requires specialized hardware for sparse structures to yield actual speedups (both styles are sketched below)
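As a concrete illustration, `torch.nn.utils.prune` supports both styles: `l1_unstructured` masks individual weights, while `ln_structured` zeroes whole filters along a chosen dimension. A minimal sketch (layer sizes and pruning amounts are arbitrary):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv_u = nn.Conv2d(16, 32, kernel_size=3)  # unstructured example
conv_s = nn.Conv2d(16, 32, kernel_size=3)  # structured example

# Unstructured: zero the 50% of individual weights with the smallest |w|.
# Tensor shape is unchanged; a binary "weight_mask" buffer records the zeros.
prune.l1_unstructured(conv_u, name="weight", amount=0.5)

# Structured: zero entire output filters (dim=0) with the smallest L2 norm.
# Once the zeroed filters are physically removed (what tools like Torch-Pruning
# automate), the layer is genuinely smaller and needs no sparse-matrix support.
prune.ln_structured(conv_s, name="weight", amount=0.25, n=2, dim=0)

# Make the masks permanent (folds mask * weight back into .weight)
prune.remove(conv_u, "weight")
prune.remove(conv_s, "weight")
```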
Model vs Ephemeral Pruning
- Model
- sparsification or pruning permanently applied to the model
- Ephemeral
- sparsification or pruning temporarily applied during computation
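A minimal sketch of the distinction; the names and the dropout-style random masking are illustrative, not from the source:

```python
import torch

w = torch.randn(64, 64)

# Model pruning: a mask is computed once and the zeros persist in the stored weights.
model_mask = (w.abs() > w.abs().median()).float()
w = w * model_mask                      # permanently sparse parameters

# Ephemeral pruning: a fresh mask is drawn for each forward pass (dropout-style),
# so the sparsity exists only during that computation; the dense weights survive.
def forward(x: torch.Tensor, w: torch.Tensor, drop_p: float = 0.5) -> torch.Tensor:
    ephemeral_mask = (torch.rand_like(w) > drop_p).float()
    return x @ (w * ephemeral_mask).T
```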

From Pruning neural networks without any data by iteratively conserving synaptic flow:
Pruning after training
- Conventional pruning algorithms assign scores to parameters in neural networks after training and remove the parameters with the lowest scores [5, 23, 24].
- Popular scoring metrics include weight magnitudes [4, 6], its generalization to multi-layers [25], first- [1, 26, 27, 28] and second-order [2, 3, 28] Taylor coefficients of the training loss with respect to the parameters, and more sophisticated variants [29, 30, 31].
- While these pruning algorithms can indeed compress neural networks at test time, there is no reduction in the cost of training.
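A minimal sketch of the simplest post-training scoring rule mentioned above (weight magnitude), using `torch.nn.utils.prune`; the 80% sparsity level is arbitrary:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def magnitude_prune(model: nn.Module, amount: float = 0.8) -> nn.Module:
    """After training, score every weight by |w| across all Linear/Conv2d layers
    and zero the globally lowest-scoring fraction `amount`."""
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (nn.Linear, nn.Conv2d))]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)
    for m, name in params:          # bake the masks into the weights
        prune.remove(m, name)
    return model
```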
Pruning before training
- Recent works demonstrated that randomly initialized neural networks can be pruned before training with little or no loss in the final test accuracy [10, 13, 32].
- In particular, the Iterative Magnitude Pruning (IMP) algorithm [10, 11] repeats multiple cycles of training, pruning, and weight rewinding to identify extremely sparse neural networks at initialization that can be trained to match the test accuracy of the original network. While IMP is powerful, it requires multiple cycles of expensive training and pruning with very specific sets of hyperparameters.
- Avoiding these difficulties, a different approach uses the gradients of the training loss at initialization to prune the network in a single-shot [13, 14].
- While these single-shot pruning algorithms at initialization are much more efficient, and work as well as IMP at moderate levels of sparsity, they suffer from layer-collapse, or the premature pruning of an entire layer rendering a network untrainable [33, 34].
- Understanding and circumventing this layer-collapse issue is the fundamental motivation for our study.
[10] Jonathan Frankle and Michael Carbin. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In International Conference on Learning Representations (ICLR), 2019.
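A rough sketch of the IMP loop described above (rewind-to-init variant); `train_fn` is an assumed placeholder for one full training run, and the hyperparameters are illustrative:

```python
import copy
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_magnitude_pruning(model: nn.Module, train_fn, rounds: int = 5,
                                prune_frac: float = 0.2) -> nn.Module:
    init_state = copy.deepcopy(model.state_dict())      # weights at initialization
    for _ in range(rounds):
        train_fn(model)                                  # 1. train to convergence
        for m in model.modules():                        # 2. prune 20% of remaining weights per layer
            if isinstance(m, (nn.Linear, nn.Conv2d)):
                prune.l1_unstructured(m, name="weight", amount=prune_frac)
        for name, p in model.named_parameters():         # 3. rewind surviving weights to their init values
            key = name.replace("_orig", "")              # pruned tensors are renamed "<name>_orig"
            if key in init_state:
                p.data.copy_(init_state[key])
    return model   # the "weight_mask" buffers define the sparse "winning ticket"
```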
Theoretical References
Papers
- Sparsity in Deep Learning - Pruning and Growth for Efficient Inference and Training in Neural Networks - 2021 survey paper
- Wanda - A Simple and Effective Pruning Approach for Large Language Models
- the approach prunes weights with the smallest magnitudes multiplied by the corresponding input activation norms, on a per-output basis (see the sketch after this list)
- requires no retraining or weight update, and the pruned LLM can be used as is
- outperforms the established baseline of magnitude pruning and performs competitively against recent methods involving intensive weight updates
- To read!
- Rethinking the Value of Network Pruning
- shows that for structured pruning, training the pruned model from scratch almost always achieves accuracy comparable to or higher than the model obtained by the typical “training, pruning and fine-tuning” procedure (Fig. 1 of the paper)
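A minimal sketch of the Wanda scoring rule for one linear layer, referenced from the Wanda entry above: each weight is scored by |W_ij| times the L2 norm of input feature j over a small calibration set, and the lowest-scoring weights are zeroed per output row, with no retraining. The function name is mine; the official implementation (locuslab/wanda on GitHub) differs in detail.

```python
import torch

def wanda_prune_linear(W: torch.Tensor, X: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """W: (out_features, in_features) weight of a linear layer.
    X: (n_samples, in_features) calibration activations feeding this layer.
    Zeroes the lowest-scoring `sparsity` fraction of weights within each output row."""
    act_norm = X.norm(p=2, dim=0)               # per-input-feature L2 norm, shape (in_features,)
    scores = W.abs() * act_norm                 # Wanda metric: |W_ij| * ||X_j||_2
    k = int(W.shape[1] * sparsity)
    prune_idx = scores.topk(k, dim=1, largest=False).indices   # per-output comparison group
    return W.scatter(1, prune_idx, 0.0)
```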
Articles
- Rachit Singh - Deep learning model compression
- Large Transformer Model Inference Optimization | Lil’Log
- article on optimization including pruning
Courses
Code References
Methods
Tools, Frameworks
- GitHub - VainF/Torch-Pruning: [CVPR-2023] Towards Any Structural Pruning; LLMs / Diffusion / YOLOv8 / CNNs / Transformers
- this is from NUS, by a student of Xinchao Wang! (a minimal usage sketch follows after this list)
- https://openaccess.thecvf.com/content/CVPR2023/papers/Fang_DepGraph_Towards_Any_Structural_Pruning_CVPR_2023_paper.pdf
- GitHub - JJGO/shrinkbench: PyTorch library to facilitate development and standardized evaluation of neural network pruning methods.
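A rough usage sketch of Torch-Pruning's dependency-graph API, adapted from its README; argument names may differ between versions, and the pruned channel indices are arbitrary:

```python
import torch
import torch_pruning as tp
from torchvision.models import resnet18

model = resnet18(weights=None)
example_inputs = torch.randn(1, 3, 224, 224)

# 1. Build the dependency graph by tracing the model with an example input
DG = tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)

# 2. Collect the group of coupled layers affected by removing output channels
#    2, 6 and 9 of conv1 (BN layers, downstream convs, etc. are picked up automatically)
group = DG.get_pruning_group(model.conv1, tp.prune_conv_out_channels, idxs=[2, 6, 9])

# 3. Physically remove those channels everywhere in the group
if DG.check_pruning_group(group):   # avoids pruning a layer down to zero channels
    group.prune()
```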