Dynamic Quantization


Created: `=dateformat(this.file.ctime, "dd MMM yyyy, hh:mm a")` | Modified: `=dateformat(this.file.mtime, "dd MMM yyyy, hh:mm a")` Tags: knowledge, TinyML


PyTorch Dynamic Quantization - Lei Mao’s Log Book

Dynamic quantization quantizes the weights of a neural network to integers ahead of time, while the activations are quantized dynamically during inference. Compared to a floating-point neural network, a dynamically quantized model is much smaller because the weights are stored as low-bitwidth integers. Compared to other quantization techniques, dynamic quantization does not require any data for calibration or fine-tuning. More details about the mathematical foundations of quantization for neural networks can be found in my article "Quantization for Neural Networks".
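
A minimal sketch of what this looks like with PyTorch's built-in `torch.ao.quantization.quantize_dynamic` API (the model and shapes are just placeholders): the `nn.Linear` weights are converted to int8 once, and activations are quantized on the fly at inference time with no calibration data.

```python
import torch
import torch.nn as nn

# A small float32 model; dynamic quantization mainly targets nn.Linear
# (and nn.LSTM/nn.GRU) layers, whose weights are converted to int8.
model_fp32 = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Post-training dynamic quantization: weights are quantized ahead of time,
# activations are quantized dynamically during inference.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32,
    {nn.Linear},         # layer types to quantize
    dtype=torch.qint8,   # weight dtype
)

x = torch.randn(1, 128)
print(model_int8(x).shape)  # torch.Size([1, 10])
```

Saving `model_int8`'s state dict and comparing it to the float model shows the size reduction, since the linear weights are now stored as 8-bit integers plus per-layer scale and zero-point.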

Post-training dynamic quantization: the range for each activation is computed on the fly at runtime. While this gives great results without too much work, it can be a bit slower than static quantization because of the overhead introduced by computing the range each time. It is also not an option on certain hardware.

From <https://huggingface.co/docs/optimum/concept_guides/quantization#calibration>
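
To make the "range computed on the fly" point concrete, here is an illustrative sketch of per-tensor affine quantization where the scale and zero-point are recomputed for every incoming activation tensor. This is not PyTorch's internal implementation; the helper name `dynamic_quantize_activation` is hypothetical.

```python
import torch

def dynamic_quantize_activation(x: torch.Tensor, num_bits: int = 8):
    """Per-tensor affine quantization with a range computed at runtime."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The min/max of this particular tensor are measured at inference time.
    x_min, x_max = x.min().item(), x.max().item()
    # Include zero in the range and guard against a zero-width range.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    x_q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.uint8)
    return x_q, scale, zero_point

# The range (and hence scale/zero-point) is recomputed for every input,
# which is the runtime overhead mentioned in the quote above.
activation = torch.randn(4, 16)
x_q, scale, zp = dynamic_quantize_activation(activation)
x_dq = (x_q.float() - zp) * scale  # dequantize to check the round trip
print((activation - x_dq).abs().max())
```

Static quantization avoids this per-tensor measurement by fixing the scale and zero-point ahead of time from calibration data, which is why it is faster at inference but needs representative inputs.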