FFCV


Created: 15 Dec 2022, 10:58 AM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a") Tags: knowledge, tools


Speed up data loading

https://github.com/libffcv/ffcv

https://ffcv.io/

ffcv is a drop-in data loading system that dramatically increases data throughput in model training.

Replaces PyTorch data loader (torch.utils.data.Dataloader)

Plug-and-play with any existing training code: Rather than changing aspects of model training itself, FFCV focuses on removing data bottlenecks, which turn out to be a problem everywhere from neural network training to linear regression. This means that:

  • FFCV can be introduced into any existing training code in just a few lines of code (e.g., just swapping out the data loader and optionally the augmentation pipeline);
  • You don’t have to change the model itself to make it faster (e.g., feel free to analyze models without CutMix, Dropout, momentum scheduling, etc.);
  • FFCV can speed up a lot more beyond just neural network training---in fact, the more data-bottlenecked the application (e.g., linear regression, bulk inference, etc.), the faster FFCV will make it!