Rachit Singh - Deep learning model compression


Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a") Tags:

Annotations


  • One of the most well-known older works in this area prunes filters from a convnet using the L1 norm of the filter’s weights. * show annotation

Pruning Filters for Efficient ConvNets
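
A minimal sketch of the idea in that paper (not their code): rank each convolutional filter by the L1 norm of its weights and keep only the largest ones, which shrinks the tensor itself rather than just zeroing entries. The tensor shape and the number of filters kept are illustrative assumptions.

```python
import numpy as np

# Hypothetical conv weight tensor: (out_channels, in_channels, kH, kW).
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 3, 3, 3))

# L1 norm of each filter = sum of absolute weights over (in, kH, kW).
l1_norms = np.abs(weights).sum(axis=(1, 2, 3))

# Keep the filters with the largest norms; drop the rest entirely.
keep = 4
kept_idx = np.sort(np.argsort(l1_norms)[-keep:])
pruned = weights[kept_idx]

print(pruned.shape)  # (4, 3, 3, 3): a structurally smaller layer
```

Because whole filters are removed, the next layer's input channels shrink correspondingly, which is what makes this *structured* pruning.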

  • Removing neurons or choosing a subnetwork is what I (and others) consider structured pruning * show annotation

  • Focused on sparsifying model weights so that they are more compressible (what some call unstructured pruning) * show annotation

  • means the matrices are the same size, but some values are set to 0. * show annotation

  • The best method I know of is basically to reset the learning rate (learning rate rewinding) and start retraining the network. * show annotation

https://arxiv.org/pdf/2003.02389.pdf
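
A minimal sketch of what learning-rate rewinding means in practice (the step schedule and rewind epoch below are assumptions, not taken from the paper): after pruning, instead of fine-tuning at the final tiny learning rate, rewind the schedule to an earlier epoch and retrain from there.

```python
def lr_schedule(epoch, base_lr=0.1, drops=(30, 60), factor=0.1):
    """Hypothetical step schedule: multiply base_lr by `factor` at each drop epoch."""
    lr = base_lr
    for d in drops:
        if epoch >= d:
            lr *= factor
    return lr

final_epoch = 90
rewind_to = 10  # hypothetical rewind point early in training

finetune_lr = lr_schedule(final_epoch)  # tiny LR (~0.001): standard fine-tuning
rewound_lr = lr_schedule(rewind_to)     # large LR (0.1): rewound retraining
```

The contrast is the whole point: rewinding retrains the pruned network with the large early-schedule learning rate rather than the small final one.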

  • Quantization generally refers to taking a model with parameters trained at high precision (32 or 64 bits) and reducing the number of bits that each weight takes (for example down to 16, 8, or even fewer). * show annotation

Quantization

Quantization Aware Training (QAT)
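
A minimal sketch of the "fake quantization" op at the heart of QAT (assuming a symmetric per-tensor int8 scheme, which is only one common choice): in the forward pass, weights are rounded to the 8-bit grid and dequantized, so training sees the quantization error; a real framework would pass gradients through this op with a straight-through estimator.

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    """Quantize to a symmetric int grid, then dequantize back to float."""
    qmax = 2 ** (num_bits - 1) - 1       # 127 for int8
    scale = np.abs(w).max() / qmax       # per-tensor scale (assumed scheme)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = np.array([0.51, -1.27, 0.003, 0.9])
wq = fake_quantize(w)
print(wq)  # values snapped to the int8 grid; error is at most half a step
```

At inference time the same scale lets the weights be stored as actual int8 values, cutting memory 4x relative to float32.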