Mixed-Precision Training
Created: `=dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a")` | Modified: `=dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")`
Tags: knowledge
Overview
Related fields
Introduction

- Involves casting the weights to lower precision (FP16) for faster forward/backward computation, calculating the gradients, converting the gradients back to higher precision (FP32) for numerical stability, and updating the original FP32 master weights; the loss is scaled before the backward pass so small FP16 gradients don't vanish (see the sketch below)
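A minimal sketch of this flow using PyTorch's automatic mixed precision utilities (torch.cuda.amp autocast + GradScaler); the toy model, data, and hyperparameters are placeholders, not part of the original note:

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 10).to(device)                 # FP32 master weights (toy model)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()                  # handles loss/gradient scaling

for _ in range(10):                                   # dummy training steps
    x = torch.randn(32, 128, device=device)
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()

    # Forward pass: autocast runs eligible ops in FP16 while the weights stay FP32
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y)

    # Scale the loss so small FP16 gradients don't underflow to zero
    scaler.scale(loss).backward()

    # Unscale the gradients back to FP32, skip the step if Inf/NaN appears,
    # then update the FP32 master weights
    scaler.step(optimizer)
    scaler.update()
```

GradScaler adjusts the loss scale dynamically and unscales the gradients inside step(), so no manual rescaling is needed.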
Automatic Mixed Precision Training
Float types

- Deep learning typically uses 32-bit floats (PyTorch's default dtype is float32)
    - Known as single-precision floating point
- 64-bit double-precision floats are not used in DL ⇒ too compute-expensive and not GPU-optimised
- The fraction component is normally called the significand or mantissa
    - Related to, but not the same as, the digits after the decimal point
- Using float16 can lead to numeric overflow / underflow (demonstrated in the sketch after this list)
    - Overflow: exceeding the maximum representable float16 value (65,504) results in an Inf
    - Underflow: positive values below 5.9604645e-08 (the smallest subnormal float16) round to 0
- Bfloat16 extends the dynamic range compared to the conventional float16 format at the expense of decreased precision
    - Easier to represent very large and very small numbers than float16
    - Originally developed for Google TPUs
    - Supported by many NVIDIA GPUs ⇒ check with torch.cuda.is_bf16_supported() (see the sketch after this list)
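A quick way to see these limits is to query torch.finfo and cast a few values; a minimal sketch, assuming a recent PyTorch build (the last line additionally needs CUDA), with the exact values chosen only for illustration:

```python
import torch

# Bit layouts: float32 = 1 sign + 8 exponent + 23 mantissa bits,
#              float16 = 1 + 5 + 10, bfloat16 = 1 + 8 + 7
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "smallest normal:", info.tiny)

# float16 overflow: anything past 65,504 becomes Inf
print(torch.tensor(60000.0, dtype=torch.float16) * 2)   # inf

# float16 underflow: below the smallest subnormal (~5.96e-08) it rounds to 0
print(torch.tensor(1e-8, dtype=torch.float16))           # 0.0

# bfloat16 keeps the float32 exponent range, so both values survive (at reduced precision)
print(torch.tensor(60000.0, dtype=torch.bfloat16) * 2)   # ~120000, coarsely rounded
print(torch.tensor(1e-8, dtype=torch.bfloat16))           # ~1e-08, coarsely rounded

# check whether the current NVIDIA GPU supports bfloat16
print(torch.cuda.is_bf16_supported())
```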
Theoretical References
Papers
Articles
- Understanding Mixed Precision Training | by Jonathan Davis | Towards Data Science
- Train With Mixed Precision - NVIDIA Docs
- Accelerating Large Language Models with Mixed-Precision Techniques