MIT 2023 - TinyML and Efficient Deep Learning Computing (Prof Song Han)
Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge
Related notes
Lightweight AI, Embedded AI, Efficient AI
Overview
- This course focuses on efficient machine learning and systems.
- This is a crucial area, as deep neural networks demand extraordinary amounts of computation, which hinders their deployment on everyday devices and burdens cloud infrastructure.
- This course introduces efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. (Lightweight AI, Embedded AI, Efficient AI)
- Topics include Neural Network Compression, Pruning, Quantization, Neural Architecture Search, Distributed Training, Data Parallelism / Model Parallelism, Gradient Compression, On-device Fine-tuning, and Parameter-Efficient Fine-Tuning (PEFT).
- It also introduces application-specific acceleration techniques for large language models and diffusion models.
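For intuition on one of the listed topics, here is a minimal sketch of symmetric per-tensor INT8 quantization. This is standalone illustrative code; the function name and setup are mine, not taken from the course labs.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor INT8 quantization: map float weights to [-127, 127]
// using a single scale derived from the maximum absolute value.
std::vector<int8_t> quantize_int8(const std::vector<float>& w, float& scale) {
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  // real value ~= scale * q
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i) {
        float r = std::round(w[i] / scale);
        q[i] = static_cast<int8_t>(std::clamp(r, -127.0f, 127.0f));
    }
    return q;
}
```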
Material
- Lecture Notes - MIT 2023 - TinyML EfficientML Course (Prof Song Han).pdf
- YouTube Lecture Recordings
- Course Website
- Course Labs Colab Notebooks:
0. Torch Tutorial
- can try emailing staff for the answers if needed - efficientml-staff[at]mit.edu
- Lab 5 Project - LLM Optimization for Laptops
- fast, efficient inference engine in C++
- parallel programming with multithreading and SIMD; exploit cache locality and use loop unrolling to cut per-iteration branching overhead (system-level, hardware-aware techniques) - see the sketch below
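A minimal sketch of the kind of loop-level optimization the lab targets: a dot product with 4-way unrolling and multiple accumulators. The function name and signature are hypothetical, not the lab's actual API.

```cpp
#include <cstddef>

// Dot product with 4-way loop unrolling: one branch check per 4 multiply-adds,
// and independent accumulators the compiler can keep in SIMD registers.
float dot_unrolled(const float* a, const float* b, std::size_t n) {
    float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {              // unrolled body
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) s0 += a[i] * b[i];     // scalar tail for leftover elements
    return s0 + s1 + s2 + s3;
}
```

Real engines go further: explicit SIMD intrinsics (AVX/NEON), loop tiling for cache locality, and splitting weight-matrix rows across threads; the unrolling pattern above is just the core idea.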
Other information
Chapter 0 - Introduction
EfficientDL - 1. Introduction
EfficientDL - 2. Basics of Deep Learning
Chapter I - Efficient Inference
EfficientDL - 3. Pruning and Sparsity (Part I)
EfficientDL - 4. Pruning and Sparsity (Part II)
EfficientDL - 5. Quantization (Part I)
EfficientDL - 6. Quantization (Part II)
7. Neural Architecture Search (Part I)
8. Neural Architecture Search (Part II)
9. Knowledge Distillation
- do this
10. MCUNet: TinyML on Microcontrollers
11. TinyEngine and Parallel Processing
- do this
Chapter II - Domain-Specific Optimization
12. Transformer and LLM (Part I)
13. Transformer and LLM (Part II)
14. Vision Transformer
15. GAN, Video, and Point Cloud
16. Diffusion Model
Chapter III - Efficient Training
17. Distributed Training (Part I)
18. Distributed Training (Part II)
19. On-Device Training and Transfer Learning
20. Efficient Fine-tuning and Prompt Engineering
Parameter-Efficient Fine-Tuning (PEFT)