MIT 2023 - TinyML and Efficient Deep Learning Computing (Prof Song Han)
Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge
Related notes
Lightweight AI, Embedded AI, Efficient AI
Overview
- This course focuses on efficient machine learning and systems.
- This is a crucial area, as deep neural networks demand extraordinary amounts of computation, which hinders their deployment on everyday devices and burdens cloud infrastructure.
- This course introduces efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. (Lightweight AI, Embedded AI, Efficient AI)
- Topics include Neural Network Compression, Pruning, Quantization, Neural Architecture Search, Distributed Training, Data Parallelism / Model Parallelism, Gradient Compression, On-device Fine-tuning, and Parameter-Efficient Fine-Tuning (PEFT).
- It also introduces application-specific acceleration techniques for large language models and diffusion models.
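For intuition on one of the listed topics, here is a minimal sketch of symmetric per-tensor INT8 quantization. This is standalone illustrative code; the function name and setup are mine, not taken from the course labs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor INT8 quantization: map float weights to [-127, 127]
// using a single scale derived from the maximum absolute value.
std::vector<int8_t> quantize_int8(const std::vector<float>& w, float& scale) {
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  // real value ~= scale * q
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i) {
        float r = std::round(w[i] / scale);
        q[i] = static_cast<int8_t>(std::clamp(r, -127.0f, 127.0f));
    }
    return q;
}
```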
Material
- Lecture Notes - MIT 2023 - TinyML EfficientML Course (Prof Song Han).pdf
- YouTube Lecture Recordings
- Course Website
- Course Labs Colab Notebooks:
0. Torch Tutorial
- can try emailing staff for the answers if needed - efficientml-staff[at]mit.edu
- Lab 5 Project - LLM Optimization for Laptops
- fast, efficient inference engine in C++
- parallel programming with multithreading and SIMD; exploit cache locality and use loop unrolling to cut per-iteration branching overhead (system-level, hardware-aware techniques) - see the sketch below
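A minimal sketch of the kind of loop-level optimization the lab targets: a dot product with 4-way unrolling and multiple accumulators. The function name and signature are hypothetical, not the lab's actual API.

```cpp
#include <cstddef>

// Dot product with 4-way loop unrolling: one branch check per 4 multiply-adds,
// and independent accumulators the compiler can keep in SIMD registers.
float dot_unrolled(const float* a, const float* b, std::size_t n) {
    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {              // unrolled body
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) s0 += a[i] * b[i];     // scalar tail for leftover elements
    return s0 + s1 + s2 + s3;
}
```

Real engines go further: explicit SIMD intrinsics (AVX/NEON), loop tiling for cache locality, and splitting weight-matrix rows across threads; the unrolling pattern above is just the core idea.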
Other information
Chapter 0 - Introduction
EfficientDL - 1. Introduction
EfficientDL - 2. Basics of Deep Learning
Chapter I - Efficient Inference
EfficientDL - 3. Pruning and Sparsity (Part I)
EfficientDL - 4. Pruning and Sparsity (Part II)
EfficientDL - 5. Quantization (Part I)
EfficientDL - 6. Quantization (Part II)
7. Neural Architecture Search (Part I)
8. Neural Architecture Search (Part II)
9. Knowledge Distillation
- do this
10. MCUNet: TinyML on Microcontrollers
11. TinyEngine and Parallel Processing
- do this
Chapter II - Domain-Specific Optimization
12. Transformer and LLM (Part I)
13. Transformer and LLM (Part II)
14. Vision Transformer
15. GAN, Video, and Point Cloud
16. Diffusion Model
Chapter III - Efficient Training
17. Distributed Training (Part I)
18. Distributed Training (Part II)
19. On-Device Training and Transfer Learning
20. Efficient Fine-tuning and Prompt Engineering
Parameter-Efficient Fine-Tuning (PEFT)