Machine Learning Compilers (Graph Compilers)
Created: 03 Jan 2023, 12:49 PM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, tools, TinyML
Overview
Related fields
Introduction
Transclude of Machine-Learning-Compilers-(Graph-Compilers)-2025-10-30-11.23.21.excalidraw
As more companies bring ML to the edge, and more hardware is built to accelerate ML models, a growing number of compilers have emerged to bridge the gap between ML models and hardware accelerators: MLIR dialects, Apache TVM, XLA, PyTorch Glow, cuDNN, etc.
Intermediate representation (IR) as middleman

Compilers typically use several levels of IR: high-level IRs that describe the computation as a graph of operators, and low-level IRs that are closer to the hardware (loops, memory, instructions).

High-level IRs include those of MLIR and TVM; a toy sketch of the high-level vs. low-level distinction follows below.
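A minimal, invented sketch of this distinction, with Python standing in for both IR levels. Nothing here is any real compiler's IR; the op names and tuple layout are hypothetical, purely for illustration:

```python
# Hypothetical high-level IR: a graph of coarse-grained operators,
# encoded as (op, input_a, input_b, output) tuples.
high_level = [
    ("matmul", "x", "w", "t0"),   # t0 = x @ w
    ("relu",   "t0", None, "y"),  # y = max(t0, 0)
]

def lower(graph):
    """'Lowering': rewrite each coarse op into explicit loop-level pseudocode,
    the kind of form a low-level IR exposes to optimizations like tiling."""
    low_level = []
    for op, a, b, out in graph:
        if op == "matmul":
            low_level.append(f"for i, j: {out}[i,j] = sum over k of {a}[i,k] * {b}[k,j]")
        elif op == "relu":
            low_level.append(f"for i, j: {out}[i,j] = max({a}[i,j], 0)")
    return low_level

for line in lower(high_level):
    print(line)
```

Real compilers repeat this over many such levels, applying different optimizations at each.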
MLIR
- Modular: What about the MLIR compiler infrastructure? (Democratizing AI Compute, Part 8)
- MLIR dialects: a way to cleanly separate domain-specific concerns from the core infrastructure of a compiler (toy analogy after this list)
- MLIR lets compiler engineers define their own representations, with custom ops, types, and semantics, tailored to their domain
- MLIR is more of a “compiler infrastructure” than a compiler
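A loose Python analogy for the dialect idea, nothing like MLIR's actual C++/TableGen machinery: each "dialect" registers its own ops in a namespace, while the generic core only walks the registry. All names below are invented:

```python
# Invented toy "dialect" registry, as an analogy only. In real MLIR, dialects
# define ops/types/semantics; the core provides generic passes over them.
DIALECTS = {}

def register_op(dialect, name):
    """Attach an op implementation to a dialect namespace, e.g. 'toy_nn.relu'."""
    def wrap(fn):
        DIALECTS.setdefault(dialect, {})[name] = fn
        return fn
    return wrap

@register_op("toy_nn", "scale")
def scale_op(values, factor=2.0):
    return [v * factor for v in values]

@register_op("toy_nn", "relu")
def relu_op(values):
    return [max(v, 0.0) for v in values]

def run(program, values):
    """The 'core infrastructure': executes ops purely via the registry,
    knowing nothing about what any dialect's ops mean."""
    for dialect, op in program:
        values = DIALECTS[dialect][op](values)
    return values

print(run([("toy_nn", "scale"), ("toy_nn", "relu")], [-1.0, 2.0]))  # [0.0, 4.0]
```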
TVM

- TVM was designed for “TradAI”: a set of relatively simple operators that needed fusion. GenAI, by contrast, involves large, complex algorithms that are deeply integrated with the hardware, such as FlashAttention-3 (a hedged Relay example of the fusion-era workflow follows below). Modular: Democratizing AI Compute, Part 6: What about AI compilers (TVM and XLA)?
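A hedged sketch of that "TradAI" fusion workflow using TVM's Relay API (Relay is being superseded by Relax in newer TVM releases, so treat this as illustrative of the flow rather than current best practice):

```python
import tvm
from tvm import relay

# A tiny conv2d + relu graph in Relay, TVM's high-level graph IR.
x = relay.var("x", shape=(1, 3, 32, 32), dtype="float32")
w = relay.var("w", shape=(8, 3, 3, 3), dtype="float32")
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# opt_level=3 enables TVM's standard pass pipeline, including FuseOps,
# which merges the conv2d and relu into a single fused kernel.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm")
```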
“lowering”
Lowering means progressively translating a higher-level representation into a lower-level one until it can run on the target hardware.
Python can’t run on a GPU. To bridge this gap, researchers build embedded domain-specific languages (eDSLs), e.g. OpenAI Triton.
- An “eDSL” is a DSL that reuses an existing language’s syntax but changes how the code executes, using compiler techniques.
- eDSLs work their magic by capturing Python code before it runs and transforming it into a form they can process. They typically leverage decorators, a Python feature that intercepts a function at definition time, before it ever runs; see the minimal sketch after this list.
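A minimal sketch of that capture trick (hypothetical, not Triton's actual machinery): the decorator grabs the function's source and AST at definition time, which is the point where a real eDSL would hand the code to its compiler instead of executing it. The names `edsl_kernel` and `axpy` are invented:

```python
import ast
import inspect

def edsl_kernel(fn):
    """Invented decorator: capture a function's code before it runs.
    Run this as a script; inspect.getsource needs a source file."""
    source = inspect.getsource(fn)   # the raw Python text of the function
    tree = ast.parse(source)         # its AST, ready to be rewritten/compiled
    print(ast.dump(tree.body[0], indent=2))  # a real eDSL would compile this
    return fn                        # here we just return the function unchanged

@edsl_kernel  # capture happens at definition time, right here
def axpy(a, x, y):
    return a * x + y
```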
IREE
Optimization through IREE Compiler
IREE (Intermediate Representation Execution Environment) is an MLIR-based end-to-end AI/ML compiler and runtime. The input model is lowered to MLIR, passed through successive levels of optimization (such as kernel fusion, tiling, and loop unrolling), and finally translated to target-dependent VM bytecode, which executes on the IREE runtime; a hedged sketch of this flow through IREE's Python bindings follows below.
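A hedged sketch of that compile-then-run flow using IREE's Python bindings (the iree-compiler and iree-runtime pip packages). Exact binding names have moved around across IREE releases, so check the current docs; the flow itself (MLIR in, VM flatbuffer out, loaded by the runtime) is the point:

```python
import numpy as np
from iree import compiler as ireec
from iree import runtime as ireert

# A tiny MLIR module: elementwise multiply of two 4-element float tensors.
MLIR = """
func.func @mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.mulf %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""

# Compile the MLIR input down to target-dependent VM bytecode for the CPU.
vmfb = ireec.compile_str(MLIR, target_backends=["llvm-cpu"])

# Load the bytecode into the IREE runtime and invoke the compiled function.
config = ireert.Config("local-task")  # CPU driver
ctx = ireert.SystemContext(config=config)
ctx.add_vm_module(ireert.VmModule.copy_buffer(ctx.instance, vmfb))

a = np.array([1, 2, 3, 4], dtype=np.float32)
b = np.array([5, 6, 7, 8], dtype=np.float32)
print(ctx.modules.module.mul(a, b))  # expect [5, 12, 21, 32]
```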
Theoretical References
Papers
Articles
- Chip Huyen - A friendly introduction to machine learning compilers and optimizers
- DeciAI - Graph Compilers for Deep Learning: Definition, Pros & Cons, and Popular Examples
- Pete Warden - Why are ML Compilers so Hard?
- Jan 2024 - Compilers: Talking to The Hardware
- Modular: DeepSeek’s Impact on AI (Democratizing AI Compute, Part 1)