FFCV
Created: 15 Dec 2022, 10:58 AM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, tools
Speed up data loading
https://github.com/libffcv/ffcv
ffcv is a drop-in data loading system that dramatically increases data throughput in model training.
Replaces PyTorch data loader (torch.utils.data.Dataloader)
Plug-and-play with any existing training code: Rather than changing aspects of model training itself, FFCV focuses on removing data bottlenecks, which turn out to be a problem everywhere from neural network training to linear regression. This means that:
- FFCV can be introduced into any existing training code in just a few lines of code (e.g., just swapping out the data loader and optionally the augmentation pipeline);
- You don’t have to change the model itself to make it faster (e.g., feel free to analyze models without CutMix, Dropout, momentum scheduling, etc.);
- FFCV can speed up a lot more beyond just neural network training---in fact, the more data-bottlenecked the application (e.g., linear regression, bulk inference, etc.), the faster FFCV will make it!