Apache TVM
Created: 30 Dec 2022, 02:02 PM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a")
Tags: knowledge, tools, TinyML

Build anywhere, run anywhere
Optimising deep learning compiler
Deployment of TF/PyTorch/ONNX models onto any hardware platform
arXiv paper - TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
An open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend.
From <https://tvm.apache.org/>
Is the backbone of octoml.ai

- Import the model from a framework like TensorFlow, PyTorch, or ONNX. The importer layer is where TVM can ingest models from other frameworks. The level of support that TVM offers for each frontend varies as the open source project is constantly improving. If you're having issues importing your model into TVM, you may want to try converting it to ONNX.
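A minimal sketch of the import step, assuming an ONNX file named model.onnx with a single input called "input" (the file name, input name, and shape are placeholders):

```python
import onnx
import tvm
from tvm import relay

# Load an ONNX model and convert it into a Relay module
onnx_model = onnx.load("model.onnx")          # hypothetical file
shape_dict = {"input": (1, 3, 224, 224)}      # assumed input name and shape
mod, params = relay.frontend.from_onnx(onnx_model, shape=shape_dict)
```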
- Translate to Relay, TVM's high-level model language. A model that has been imported into TVM is represented in Relay. Relay is a functional language and intermediate representation (IR) for neural networks (a hand-built sketch follows this list). It has support for:
  - Traditional dataflow-style representations
  - Functional-style scoping and let-binding, which makes it a fully featured differentiable language
  - The ability to mix the two programming styles

  Relay applies graph-level optimization passes to optimize the model.
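A minimal sketch of building a small Relay function by hand; the dense + ReLU graph and the shapes are illustrative assumptions, not taken from the TVM docs:

```python
import tvm
from tvm import relay

# y = relu(dense(x, w)), a tiny two-operator dataflow graph
x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(32, 64), dtype="float32")
y = relay.nn.relu(relay.nn.dense(x, w))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

# Graph-level passes, e.g. the FuseOps pass mentioned in the next step
seq = tvm.transform.Sequential(
    [relay.transform.InferType(), relay.transform.FuseOps()]
)
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)
```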
- Lower to Tensor Expression (TE) representation. Lowering is when a higher-level representation is transformed into a lower-level representation. After applying the high-level optimizations, Relay runs the FuseOps pass to partition the model into many small subgraphs and lowers the subgraphs to TE representation. Tensor Expression (TE) is a domain-specific language for describing tensor computations. TE also provides several schedule primitives to specify low-level loop optimizations, such as tiling, vectorization, parallelization, unrolling, and fusion. To aid in the process of converting Relay representation into TE representation, TVM includes a Tensor Operator Inventory (TOPI) that has pre-defined templates of common tensor operators (e.g., conv2d, transpose).
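A minimal TE sketch (vector addition is an assumed toy workload) showing a few of the schedule primitives named above:

```python
import tvm
from tvm import te

# Declare the computation C[i] = A[i] + B[i]
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# The schedule specifies low-level loop optimizations
s = te.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=4)  # loop splitting/tiling
s[C].vectorize(inner)                              # vectorize the inner loop
s[C].parallel(outer)                               # parallelize the outer loop

# Inspect the resulting loop nest
print(tvm.lower(s, [A, B, C], simple_mode=True))
```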
- Search for the best schedule using an auto-tuning module, AutoTVM or AutoScheduler. A schedule specifies the low-level loop optimizations for an operator or subgraph defined in TE. The auto-tuning modules search for the best schedule, evaluating candidates with cost models and on-device measurements. There are two auto-tuning modules in TVM (a tuning sketch follows this list):
  - AutoTVM: A template-based auto-tuning module. It runs search algorithms to find the best values for the tunable knobs in a user-defined template. For common operators, their templates are already provided in TOPI.
  - AutoScheduler (a.k.a. Ansor): A template-free auto-tuning module. It does not require pre-defined schedule templates. Instead, it generates the search space automatically by analyzing the computation definition. It then searches for the best schedule in the generated search space.
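A sketch of tuning with AutoScheduler, reusing the toy vector-add workload from above; the trial count and log file name are arbitrary choices:

```python
import tvm
from tvm import te, auto_scheduler

# Register the computation to tune (hypothetical toy workload)
@auto_scheduler.register_workload
def vector_add(n):
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute((n,), lambda i: A[i] + B[i], name="C")
    return [A, B, C]

target = tvm.target.Target("llvm")
task = auto_scheduler.SearchTask(func=vector_add, args=(1024,), target=target)

# Run the search; measured results are appended to a JSON log of tuning records
log_file = "vector_add.json"
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=10,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
task.tune(tune_option)

# Apply the best schedule found in the log
sch, args = task.apply_best(log_file)
```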
- Choose the optimal configurations for model compilation. After tuning, the auto-tuning module generates tuning records in JSON format. This step picks the best schedule for each subgraph.
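A sketch of feeding those JSON tuning records back into compilation, assuming an AutoScheduler log named tuning_records.json and the mod/params from the import step:

```python
import tvm
from tvm import relay, auto_scheduler

# Pick the best recorded schedule for each subgraph during the build
with auto_scheduler.ApplyHistoryBest("tuning_records.json"):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target="llvm", params=params)
```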
- Lower to Tensor Intermediate Representation (TIR), TVM's low-level intermediate representation. After selecting the optimal configurations based on the tuning step, each TE subgraph is lowered to TIR and optimized by low-level optimization passes. Next, the optimized TIR is lowered to the target compiler of the hardware platform. This is the final code generation phase, producing an optimized model that can be deployed into production. TVM supports several different compiler backends (a build sketch follows this list), including:
  - LLVM, which can target arbitrary microprocessor architectures, including standard x86 and ARM processors, AMDGPU and NVPTX code generation, and any other platform supported by LLVM.
  - Specialized compilers, such as NVCC, NVIDIA's compiler.
  - Embedded and specialized targets, which are implemented through TVM's Bring Your Own Codegen (BYOC) framework.
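A minimal sketch of choosing a backend at build time; the target strings are standard TVM targets, while mod and params come from the earlier illustrative sketches:

```python
import tvm
from tvm import relay

# LLVM backend targeting the host CPU
with tvm.transform.PassContext(opt_level=3):
    cpu_lib = relay.build(mod, target="llvm", params=params)

# CUDA backend (NVCC) -- requires a CUDA-enabled TVM build
# with tvm.transform.PassContext(opt_level=3):
#     gpu_lib = relay.build(mod, target="cuda", params=params)
```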
- Compile down to machine code. At the end of this process, the compiler-specific generated code can be lowered to machine code. TVM can compile models down to a linkable object module, which can then be run with a lightweight TVM runtime that provides C APIs to dynamically load the model, and entry points for other languages such as Python and Rust. TVM can also build a bundled deployment in which the runtime is combined with the model in a single package.
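A sketch of exporting the compiled module and running it with the lightweight runtime; the file name and the input name "x" follow the earlier illustrative sketches:

```python
import numpy as np
import tvm
from tvm.contrib import graph_executor

# Assumes `cpu_lib` from the previous sketch
cpu_lib.export_library("model.so")            # linkable shared object

# Later / elsewhere: load the module dynamically and run it
loaded = tvm.runtime.load_module("model.so")
dev = tvm.cpu(0)
module = graph_executor.GraphModule(loaded["default"](dev))
module.set_input("x", np.random.rand(1, 64).astype("float32"))  # assumed input
module.run()
out = module.get_output(0).numpy()
```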
From <https://tvm.apache.org/docs/tutorial/introduction.html#sphx-glr-tutorial-introduction-py>
Tip: If you decide to try out TVM, use their unofficial conda/pip command for fast installation instead of the instructions found on the Apache site (here and here). They only have a Discord server if you need help!
Michael from robotics side uses TVM!