Lightweight AI, Embedded AI, Efficient AI
Created: `=dateformat(this.file.ctime, "dd MMM yyyy, hh:mm a")` | Modified: `=dateformat(this.file.mtime, "dd MMM yyyy, hh:mm a")`
Tags: knowledge
Overview
Related fields
- TinyML - efficient deep learning on edge & embedded devices
- Neural Network Compression
- Optimising GPU code
- Machine Learning Compilers (Graph Compilers)
- Efficient Transformers
- Mixture of Experts (MoE)
- TensorFlow Lite vs PyTorch Mobile vs TVM Runtime vs ONNX Runtime vs TensorRT
- Parallelism
- Mixed-Precision Training
- Parameter Efficient Fine-Tuning (PEFT)
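A recurring idea across several of these fields (network compression, mixed precision, TinyML) is trading numeric precision for memory and speed. A minimal sketch of symmetric per-tensor int8 post-training quantization in pure Python - illustrative only, not any particular library's API:

```python
# Hedged sketch: symmetric per-tensor int8 quantization with a single scale.
# Real toolchains use per-channel scales, zero-points, calibration data, etc.
def quantize_int8(weights):
    """Map float weights into int8 range [-127, 127] with one scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.0, -0.98]   # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The round-trip error is bounded by half a quantization step (`scale / 2`), which is why 8-bit weights usually cost little accuracy while quartering fp32 storage.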
Introduction
Theoretical References
Papers
Articles
Courses
- MIT 2022 - TinyML EfficientML Course (Prof Song Han)
- annotation of the pdf file MIT 2022 - TinyML EfficientML Course (Prof Song Han).pdf
- partial
- MIT 2023 - TinyML and Efficient Deep Learning Computing (Prof Song Han)
- annotation of the pdf file MIT 2023 - TinyML EfficientML Course (Prof Song Han).pdf
- in progress
Code References
Methods
Tools, Frameworks
- C++ implementation of DL algos
- C++ implementation of other algos
- fast, header-only C++ machine learning library
	- a machine learning analog to LAPACK
- inference engine for microcontrollers and embedded devices (C99)
	- sklearn- and Keras-compatible
- Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
- Generate C code for microcontrollers from Python's sklearn classifiers
- C++/CUDA neural network framework
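The model-to-code generators above all work on the same principle: compile a fitted model into branch-based native source with no runtime dependency. A hedged illustration of that idea - the tree, thresholds, and `predict` signature here are made up for the demo, not any generator's actual output:

```python
# Hypothetical tiny decision tree; structure and values invented for the demo.
TREE = {"feature": 0, "threshold": 0.5,
        "left": {"leaf": 1},
        "right": {"feature": 1, "threshold": 2.0,
                  "left": {"leaf": 0}, "right": {"leaf": 1}}}

def emit_c(node, indent="    "):
    """Recursively emit nested if/else C code for one tree node."""
    if "leaf" in node:
        return f"{indent}return {node['leaf']};\n"
    out = f"{indent}if (x[{node['feature']}] <= {node['threshold']}f) {{\n"
    out += emit_c(node["left"], indent + "    ")
    out += f"{indent}}} else {{\n"
    out += emit_c(node["right"], indent + "    ")
    out += f"{indent}}}\n"
    return out

# The emitted function needs only the C standard ABI - ideal for MCUs.
c_source = "int predict(const float *x) {\n" + emit_c(TREE) + "}\n"
```

Because the result is plain branches and constants, it runs on bare-metal targets with no ML runtime and lets the C compiler optimize the whole model.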
- TIDL - TI deep learning, used by the Conti AM ADAS team
- FFCV - drop-in data loading system that dramatically increases data throughput in model training
- apple/ml-cvnets: CVNets, a library for training computer vision networks - Apple's CV networks, e.g. MobileViT
- ggerganov/ggml: Tensor library for machine learning
	- ggml = GPT-Generated Model Language
	- file format updated in August 2023 to GGUF = GPT-Generated Unified Format
	- whisper.cpp - local inference for OpenAI's automatic speech recognition model
	- llama.cpp - local inference for Meta's LLaMA LLMs
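ggml's quantized formats (e.g. Q8_0) store weights in fixed-size blocks with one scale per block, which keeps outliers in one block from wrecking precision elsewhere. A simplified pure-Python illustration of that block-quantization idea - not the actual on-disk GGUF layout:

```python
# Hedged sketch of block-wise int8 quantization in the spirit of ggml's Q8_0.
BLOCK = 32  # Q8_0 also uses 32-element blocks; layout details differ.

def quantize_blocks(weights):
    """One float scale + up to 32 int8 values per block."""
    blocks = []
    for i in range(0, len(weights), BLOCK):
        chunk = weights[i:i + BLOCK]
        scale = max(abs(w) for w in chunk) / 127 or 1.0  # avoid 0 scale
        blocks.append((scale, [round(w / scale) for w in chunk]))
    return blocks

def dequantize_blocks(blocks):
    """Flatten blocks back into approximate floats."""
    return [q * scale for scale, qs in blocks for q in qs]

weights = [i / 10 for i in range(-32, 32)]   # 64 made-up values -> 2 blocks
blocks = quantize_blocks(weights)
restored = dequantize_blocks(blocks)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Per-block scales are why llama.cpp can run multi-billion-parameter models in a few GB of RAM with modest quality loss.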
- DeepSpeed - Microsoft Research
	- an open-source deep learning optimization library for PyTorch
	- DeepSpeed-Training
		- ZeRO, 3D-Parallelism, DeepSpeed-MoE, ZeRO-Infinity
	- DeepSpeed-Inference, [Blog]
		- parallelism technologies: tensor, pipeline, expert and ZeRO-parallelism
		- custom inference kernels, communication optimizations and heterogeneous memory technologies
	- DeepSpeed-Compression, [Blog]
		- ZeroQuant, XTC
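The core idea behind ZeRO stage 1 is that plain data parallelism replicates the full optimizer state on every rank, while ZeRO partitions it so each rank owns only a 1/N shard. A back-of-the-envelope sketch - the 12 bytes/parameter figure and function are illustrative assumptions, not DeepSpeed's actual memory model:

```python
# Hedged sketch of ZeRO-1's memory saving. Assumes roughly 12 bytes/param of
# Adam optimizer state (fp32 moments + master weights); real totals vary.
def optimizer_state_bytes(n_params, n_ranks, bytes_per_param=12):
    """Return (replicated bytes per rank, ZeRO-1 sharded bytes per rank)."""
    replicated = n_params * bytes_per_param  # every rank holds a full copy
    sharded = replicated // n_ranks          # each rank owns a 1/N partition
    return replicated, sharded

# Illustrative 7B-parameter model on 64 data-parallel ranks.
replicated, sharded = optimizer_state_bytes(7_000_000_000, 64)
```

Later stages extend the same partitioning to gradients (stage 2) and the parameters themselves (stage 3), and ZeRO-Infinity spills shards to CPU/NVMe.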
- FairScale - PyTorch extension library for high-performance and large-scale training
- facebookresearch/d2go: D2Go is a toolkit for efficient deep learning
	- end-to-end model training and deployment for mobile platforms
	- powered by PyTorch and Detectron2
- hpcaitech/ColossalAI: Making large AI models cheaper, faster and more accessible
- pytorch/executorch: On-device AI across mobile, embedded and edge for PyTorch
- pytorch/ao: PyTorch-native quantization and sparsity for training and inference
- VainF/Torch-Pruning: [CVPR 2023] DepGraph: Towards Any Structural Pruning
- daquexian/onnx-simplifier: Simplify your ONNX model