Pruning


Created: 16 Jan 2023, 03:58 PM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a") | Tags: knowledge, TinyML


Overview

Introduction

  • Pruning, or sparsification, is a family of methods for producing sparse neural networks by removing or zeroing out parameters.

Structured Pruning vs Unstructured Pruning

Structured

  • changes the structure of neural networks by physically removing grouped parameters (per DepGraph / Torch-Pruning paper)
  • only the offsets of blocks or other elements arranged in a fixed structure need to be stored
  • removes complete filters and directly leads to efficient deep neural models without requiring specialized hardware for sparse structures.

Unstructured

  • conducts zeroing on partial weights without modification to the network structure (per DepGraph / Torch-Pruning paper)
  • e.g. setting individual weights to zero; see Rachit Singh - Deep learning model compression
  • the offset of each individual element needs to be encoded and stored
  • masks individual weights in the network
  • requires specialized hardware to exploit sparse structures (see the sketch after this list).
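
A minimal sketch contrasting the two, using PyTorch's built-in torch.nn.utils.prune utilities (an assumption of this note; the layer size and pruning amounts are arbitrary). Note that these utilities only zero out weights behind a mask; libraries such as Torch-Pruning go further and physically remove the pruned filters, shrinking the layer shapes.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# Unstructured: zero the 30% of individual weights with the smallest L1 magnitude.
# The layer keeps its shape; sparsity lives in an element-wise binary mask.
prune.l1_unstructured(conv, name="weight", amount=0.3)

# Structured: zero 50% of whole output filters (dim=0), ranked by L2 norm.
# Entire filters are pruned as a group, which maps to dense, hardware-friendly layers.
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)

# Fold the combined mask into the weight tensor and check overall sparsity.
prune.remove(conv, "weight")
print("sparsity:", (conv.weight == 0).float().mean().item())
```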

Model vs Ephemeral Pruning

  • Model
    • sparsification or pruning permanently applied to the model
  • Ephemeral
    • sparsification or pruning temporarily applied during computation (see the sketch below)
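
A rough sketch of the distinction (the linear layer, the median threshold, and the dropout-style random mask are illustrative assumptions, not a standard API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(128, 64)

# Model pruning: zeros are written into the stored weights, so they persist
# in every later forward pass and in the saved checkpoint.
with torch.no_grad():
    keep = (layer.weight.abs() > layer.weight.abs().median()).float()
    layer.weight.mul_(keep)

# Ephemeral pruning: the mask exists only for this one computation
# (dropout-style); the stored weights are left untouched.
x = torch.randn(1, 128)
ephemeral_mask = (torch.rand_like(layer.weight) > 0.5).float()
y = F.linear(x, layer.weight * ephemeral_mask, layer.bias)
```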

From Pruning neural networks without any data by iteratively conserving synaptic flow:

Pruning after training

  • Conventional pruning algorithms assign scores to parameters in neural networks after training and remove the parameters with the lowest scores [5, 23, 24].
  • Popular scoring metrics include weight magnitudes [4, 6], its generalization to multi-layers [25], first- [1, 26, 27, 28] and second-order [2, 3, 28] Taylor coefficients of the training loss with respect to the parameters, and more sophisticated variants [29, 30, 31].
  • While these pruning algorithms can indeed compress neural networks at test time, there is no reduction in the cost of training.
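
A minimal sketch of this post-training recipe with magnitude scoring, assuming PyTorch's torch.nn.utils.prune and a toy model that stands in for an already trained network:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a model that has already been trained; pruning happens afterwards.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Score every weight by its L1 magnitude and remove the lowest-scoring 80%
# globally across the listed layers (the 80% figure is illustrative).
prune.global_unstructured(
    [(model[0], "weight"), (model[2], "weight")],
    pruning_method=prune.L1Unstructured,
    amount=0.8,
)
```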

Pruning before training

  • Recent works demonstrated that randomly initialized neural networks can be pruned before training with little or no loss in the final test accuracy [10, 13, 32].
  • In particular, the Iterative Magnitude Pruning (IMP) algorithm [10, 11] repeats multiple cycles of training, pruning, and weight rewinding to identify extremely sparse neural networks at initialization that can be trained to match the test accuracy of the original network. While IMP is powerful, it requires multiple cycles of expensive training and pruning with very specific sets of hyperparameters.
  • Avoiding these difficulties, a different approach uses the gradients of the training loss at initialization to prune the network in a single-shot [13, 14].
  • While these single-shot pruning algorithms at initialization are much more efficient, and work as well as IMP at moderate levels of sparsity, they suffer from layer-collapse, or the premature pruning of an entire layer rendering a network untrainable [33, 34].
  • Understanding and circumventing this layer-collapse issue is the fundamental motivation for our study.

[10] Jonathan Frankle and Michael Carbin. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In International Conference on Learning Representations, 2019.
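
A compact sketch of the IMP loop described above (train, prune by magnitude, rewind surviving weights to their values at initialization). The toy model, random data, and hyperparameters (20% pruned per round, 3 rounds, 100 SGD steps) are illustrative assumptions:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
init_state = copy.deepcopy(model.state_dict())   # weights at initialization
x, y = torch.randn(512, 20), torch.randint(0, 2, (512,))

def train(model, steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()

linears = [(n, m) for n, m in model.named_modules() if isinstance(m, nn.Linear)]

for _ in range(3):
    train(model)                                 # 1. train
    prune.global_unstructured(                   # 2. prune lowest-magnitude weights
        [(m, "weight") for _, m in linears],
        pruning_method=prune.L1Unstructured,
        amount=0.2,
    )
    with torch.no_grad():                        # 3. rewind to initial values
        for n, m in linears:
            m.weight_orig.copy_(init_state[f"{n}.weight"])
            m.bias.copy_(init_state[f"{n}.bias"])
```

And a rough sketch of single-shot pruning at initialization using gradient-based saliency (SNIP-style |weight * gradient| scores); the model, single random batch, and 90% sparsity target are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))

# One forward/backward pass at initialization to obtain loss gradients.
nn.functional.cross_entropy(model(x), y).backward()

# Saliency of each weight: |weight * gradient|, scored globally across layers.
weights = [m.weight for m in model if isinstance(m, nn.Linear)]
scores = torch.cat([(w * w.grad).abs().flatten() for w in weights])
threshold = torch.quantile(scores, 0.9)          # keep the top 10% of weights

# Apply the resulting binary masks before any training happens.
with torch.no_grad():
    for w in weights:
        w.mul_(((w * w.grad).abs() > threshold).float())
```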


Theoretical References

Papers

Articles

Courses


Code References

Methods

Tools, Frameworks