Machine Learning Compilers (Graph Compilers)
Created: 03 Jan 2023, 12:49 PM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, tools, TinyML
Overview
Related fields
Introduction
Transclude of Machine-Learning-Compilers-(Graph-Compilers)-2025-10-30-11.23.21.excalidraw
As more companies bring ML to the edge, and more hardware is built to accelerate ML models, a growing number of compilers have emerged to bridge the gap between ML models and hardware accelerators: MLIR dialects, Apache TVM, XLA, PyTorch Glow, cuDNN, etc.
Intermediate representation (IR) as middleman

Compilers typically use several levels of IR: high-level IRs that describe the computation as a graph of operators, and low-level IRs that are closer to the hardware (loops, memory, instructions).

High-level IRs include those of MLIR and TVM; a toy sketch of the high-level vs. low-level distinction follows below.
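A minimal, invented sketch of this distinction, with Python standing in for both IR levels. Nothing here is any real compiler's IR; the op names and tuple layout are hypothetical, purely for illustration:

```python
# Hypothetical high-level IR: a graph of coarse-grained operators,
# encoded as (op, input_a, input_b, output) tuples.
high_level = [
    ("matmul", "x", "w", "t0"),   # t0 = x @ w
    ("relu",   "t0", None, "y"),  # y = max(t0, 0)
]

def lower(graph):
    """'Lowering': rewrite each coarse op into explicit loop-level pseudocode,
    the kind of form a low-level IR exposes to optimizations like tiling."""
    low_level = []
    for op, a, b, out in graph:
        if op == "matmul":
            low_level.append(f"for i, j: {out}[i,j] = sum over k of {a}[i,k] * {b}[k,j]")
        elif op == "relu":
            low_level.append(f"for i, j: {out}[i,j] = max({a}[i,j], 0)")
    return low_level

for line in lower(high_level):
    print(line)
```

Real compilers repeat this over many such levels, applying different optimizations at each.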
MLIR
- Modular: What about the MLIR compiler infrastructure? (Democratizing AI Compute, Part 8)
- MLIR dialects: a way to cleanly separate domain-specific concerns from the core infrastructure of a compiler (toy analogy after this list)
- MLIR lets compiler engineers define their own representations, with custom ops, types, and semantics, tailored to their domain
- MLIR is more of a “compiler infrastructure” than a compiler
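A loose Python analogy for the dialect idea, nothing like MLIR's actual C++/TableGen machinery: each "dialect" registers its own ops in a namespace, while the generic core only walks the registry. All names below are invented:

```python
# Invented toy "dialect" registry, as an analogy only. In real MLIR, dialects
# define ops/types/semantics; the core provides generic passes over them.
DIALECTS = {}

def register_op(dialect, name):
    """Attach an op implementation to a dialect namespace, e.g. 'toy_nn.relu'."""
    def wrap(fn):
        DIALECTS.setdefault(dialect, {})[name] = fn
        return fn
    return wrap

@register_op("toy_nn", "scale")
def scale_op(values, factor=2.0):
    return [v * factor for v in values]

@register_op("toy_nn", "relu")
def relu_op(values):
    return [max(v, 0.0) for v in values]

def run(program, values):
    """The 'core infrastructure': executes ops purely via the registry,
    knowing nothing about what any dialect's ops mean."""
    for dialect, op in program:
        values = DIALECTS[dialect][op](values)
    return values

print(run([("toy_nn", "scale"), ("toy_nn", "relu")], [-1.0, 2.0]))  # [0.0, 4.0]
```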
TVM

- TVM was designed for “TradAI”: a set of relatively simple operators that needed fusion. GenAI, by contrast, involves large, complex algorithms that are deeply integrated with the hardware, such as FlashAttention-3 (a hedged Relay example of the fusion-era workflow follows below). Modular: Democratizing AI Compute, Part 6: What about AI compilers (TVM and XLA)?
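A hedged sketch of that "TradAI" fusion workflow using TVM's Relay API (Relay is being superseded by Relax in newer TVM releases, so treat this as illustrative of the flow rather than current best practice):

```python
import tvm
from tvm import relay

# A tiny conv2d + relu graph in Relay, TVM's high-level graph IR.
x = relay.var("x", shape=(1, 3, 32, 32), dtype="float32")
w = relay.var("w", shape=(8, 3, 3, 3), dtype="float32")
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# opt_level=3 enables TVM's standard pass pipeline, including FuseOps,
# which merges the conv2d and relu into a single fused kernel.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")
```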
“lowering”
Lowering means progressively translating a higher-level representation into a lower-level one until it can run on the target hardware.
Python can’t run on a GPU. To bridge this gap, researchers build embedded domain-specific languages (eDSLs), e.g. OpenAI Triton.
- An “eDSL” is a DSL that reuses an existing language’s syntax but changes how the code executes, using compiler techniques.
- eDSLs work their magic by capturing Python code before it runs and transforming it into a form they can process. They typically leverage decorators, a Python feature that intercepts a function at definition time, before it ever runs; see the minimal sketch after this list.
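A minimal sketch of that capture trick (hypothetical, not Triton's actual machinery): the decorator grabs the function's source and AST at definition time, which is the point where a real eDSL would hand the code to its compiler instead of executing it. The names `edsl_kernel` and `axpy` are invented:

```python
import ast
import inspect

def edsl_kernel(fn):
    """Invented decorator: capture a function's code before it runs.
    Run this as a script; inspect.getsource needs a source file."""
    source = inspect.getsource(fn)   # the raw Python text of the function
    tree = ast.parse(source)         # its AST, ready to be rewritten/compiled
    print(ast.dump(tree.body[0], indent=2))  # a real eDSL would compile this
    return fn                        # here we just return the function unchanged

@edsl_kernel  # capture happens at definition time, right here
def axpy(a, x, y):
    return a * x + y
```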
IREE
Optimization through IREE Compiler
IREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end AI/ML compiler and runtime. The input model is lowered to MLIR, passed through successive levels of optimization (such as kernel fusion, tiling, and loop unrolling), and finally translated to target-dependent VM bytecode, which executes on the IREE runtime; a hedged sketch of this flow through IREE's Python bindings follows below.
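A hedged sketch of that compile-then-run flow using IREE's Python bindings (the iree-compiler and iree-runtime pip packages). Exact binding names have moved around across IREE releases, so check the current docs; the flow itself (MLIR in, VM flatbuffer out, loaded by the runtime) is the point:

```python
import numpy as np
from iree import compiler as ireec
from iree import runtime as ireert

# A tiny MLIR module: elementwise multiply of two 4-element float tensors.
MLIR = """
func.func @mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.mulf %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

# Compile the MLIR input down to target-dependent VM bytecode for the CPU.
vmfb = ireec.compile_str(MLIR, target_backends=["llvm-cpu"])

# Load the bytecode into the IREE runtime and invoke the compiled function.
config = ireert.Config("local-task")  # CPU driver
ctx = ireert.SystemContext(config=config)
ctx.add_vm_module(ireert.VmModule.copy_buffer(ctx.instance, vmfb))

a = np.array([1, 2, 3, 4], dtype=np.float32)
b = np.array([5, 6, 7, 8], dtype=np.float32)
print(ctx.modules.module.mul(a, b))  # expect [5, 12, 21, 32]
```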
Theoretical References
Papers
Articles
- Chip Huyen - A friendly introduction to machine learning compilers and optimizers
- DeciAI - Graph Compilers for Deep Learning: Definition, Pros & Cons, and Popular Examples
- Pete Warden - Why are ML Compilers so Hard?
- Jan 2024 - Compilers: Talking to The Hardware
- Modular: DeepSeek’s Impact on AI (Democratizing AI Compute, Part 1)