Mixed-Precision Training
Created: `=dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a")` | Modified: `=dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")`
Tags: knowledge
Overview
Related fields
Introduction

- Involves casting the weights to lower precision (FP16) for faster forward/backward computation, calculating the gradients, converting the gradients back to higher precision (FP32) for numerical stability, and updating the original FP32 master weights; the loss is scaled before the backward pass so small FP16 gradients don't vanish (see the sketch below)
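A minimal sketch of this flow using PyTorch's automatic mixed precision utilities (torch.cuda.amp autocast + GradScaler); the toy model, data, and hyperparameters are placeholders, not part of the original note:

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 10).to(device)                 # FP32 master weights (toy model)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()                  # handles loss/gradient scaling

for _ in range(10):                                   # dummy training steps
    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()

    # Forward pass: autocast runs eligible ops in FP16 while the weights stay FP32
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y)

    # Scale the loss so small FP16 gradients don't underflow to zero
    scaler.scale(loss).backward()

    # Unscale the gradients back to FP32, skip the step if Inf/NaN appears,
    # then update the FP32 master weights
    scaler.step(optimizer)
    scaler.update()
```

GradScaler adjusts the loss scale dynamically and unscales the gradients inside step(), so no manual rescaling is needed.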
Automatic Mixed Precision Training
Float types

- Deep learning typically uses 32-bit floats (PyTorch's default dtype is float32)
    - Known as single-precision floating point
- 64-bit double-precision floats are not used in DL ⇒ too compute-expensive and not GPU-optimised
- The fraction component is normally called the significand or mantissa
    - Related to, but not the same as, the digits after the decimal point
- Using float16 can lead to numeric overflow / underflow (demonstrated in the sketch after this list)
    - Overflow: exceeding the maximum representable float16 value (65,504) results in an Inf
    - Underflow: positive values below 5.9604645e-08 (the smallest subnormal float16) round to 0
- Bfloat16 extends the dynamic range compared to the conventional float16 format at the expense of decreased precision
    - Easier to represent very large and very small numbers than float16
    - Originally developed for Google TPUs
    - Supported by many NVIDIA GPUs ⇒ check with torch.cuda.is_bf16_supported() (see the sketch after this list)
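A quick way to see these limits is to query torch.finfo and cast a few values; a minimal sketch, assuming a recent PyTorch build (the last line additionally needs CUDA), with the exact values chosen only for illustration:

```python
import torch

# Bit layouts: float32 = 1 sign + 8 exponent + 23 mantissa bits,
#              float16 = 1 + 5 + 10, bfloat16 = 1 + 8 + 7
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "smallest normal:", info.tiny)

# float16 overflow: anything past 65,504 becomes Inf
print(torch.tensor(60000.0, dtype=torch.float16) * 2)   # inf

# float16 underflow: below the smallest subnormal (~5.96e-08) it rounds to 0
print(torch.tensor(1e-8, dtype=torch.float16))           # 0.0

# bfloat16 keeps the float32 exponent range, so both values survive (at reduced precision)
print(torch.tensor(60000.0, dtype=torch.bfloat16) * 2)   # ~120000, coarsely rounded
print(torch.tensor(1e-8, dtype=torch.bfloat16))           # ~1e-08, coarsely rounded

# check whether the current NVIDIA GPU supports bfloat16
print(torch.cuda.is_bf16_supported())
```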
Theoretical References
Papers
Articles
- Understanding Mixed Precision Training | by Jonathan Davis | Towards Data Science
- Train With Mixed Precision - NVIDIA Docs
- Accelerating Large Language Models with Mixed-Precision Techniques