Pipeline: Relay IR → Annotation & Partitioning → ncnn Codegen (C++) → ncnn Runtime → Raspberry Pi 4B
Integrating ncnn with Apache TVM
ML Compilation
Edge AI
C++
Integrated Tencent’s ncnn inference engine into Apache TVM via BYOC, achieving a ~31% speedup over Arm Compute Library on a Raspberry Pi 4B.
In this project course, I explored machine learning compilation by integrating ncnn, a high-performance inference engine for mobile, into Apache TVM using its BYOC (Bring Your Own Codegen) framework.
Repository: digital-nomad-cheng/Integrate_ncnn_with_TVM
📄 Report: Project Course Report (PDF)
Key Achievements
- Studied ML compilation theory — Relay IR, graph-level optimisations, and the TVM compilation pipeline.
- Deep-dived into TVM BYOC — understood the annotation, partitioning, codegen, and runtime interfaces for plugging in external backends.
- Generated efficient CUDA kernels using TVM’s auto-scheduling and tuning infrastructure.
- Benchmarked multiple TVM codegens — compared execution time, memory footprint, and power consumption across backends.
- Integrated ncnn as a new TVM backend via BYOC, implementing codegen and runtime modules in C++ for Relay pattern matching, layer fusion, and memory-optimised dispatch.
- Achieved ~31% speedup over Arm Compute Library on Raspberry Pi 4B (AlexNet, 227×227, 100 runs: ncnn 8.5s vs ACL 12.5s).
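
To make the benchmark figure concrete, the short calculation below derives per-run latency, relative time reduction, and throughput ratio from the rounded timings quoted above. It assumes the two times are totals over all 100 runs, which the text does not state explicitly. With these rounded inputs the reduction comes out at about 32%, in line with the ~31% reported; the small gap presumably reflects rounding of the quoted times.

```python
# Rounded timings quoted above; ASSUMED to be totals over all runs.
NCNN_TOTAL_S = 8.5
ACL_TOTAL_S = 12.5
RUNS = 100


def summarize(fast_total_s, slow_total_s, runs):
    """Return per-run latencies (ms), time reduction, and throughput ratio."""
    return {
        "fast_ms_per_run": fast_total_s / runs * 1000.0,
        "slow_ms_per_run": slow_total_s / runs * 1000.0,
        # Fraction of execution time saved by the faster backend.
        "time_reduction": 1.0 - fast_total_s / slow_total_s,
        # How many times more inferences per second the faster backend does.
        "throughput_ratio": slow_total_s / fast_total_s,
    }


stats = summarize(NCNN_TOTAL_S, ACL_TOTAL_S, RUNS)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Reading the result as a time reduction (about a third less time per inference) or as a throughput ratio (roughly 1.47x) describes the same measurement; the two numbers differ only in which backend is used as the baseline.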
Implementation Details
The integration follows TVM’s BYOC pattern:
- Pattern matching: Relay graph patterns (e.g. `conv2d + bias_add + relu`) are annotated and partitioned for offloading to ncnn.
- Codegen: A custom C++ codegen translates partitioned Relay subgraphs into ncnn layer configurations.
- Runtime: A TVM-compatible runtime module allocates tensors at init time (not per-run) and dispatches computation to ncnn, reducing memory traffic.
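
The init-time allocation idea from the runtime step can be illustrated with a toy model. This is a minimal sketch in plain Python, not the project's C++ runtime and not the ncnn or TVM API: `ToyRuntime`, its `preallocate` flag, and the placeholder ReLU "layer" are all hypothetical names introduced only for illustration. It counts how many intermediate buffers are created when they are allocated once at construction versus on every call.

```python
class ToyRuntime:
    """Toy stand-in for a BYOC runtime module (hypothetical, for illustration)."""

    def __init__(self, num_layers, tensor_size, preallocate=True):
        self.num_layers = num_layers
        self.tensor_size = tensor_size
        self.preallocate = preallocate
        self.allocations = 0
        # Init-time strategy: create one output buffer per layer up front.
        self.buffers = (
            [self._alloc() for _ in range(num_layers)] if preallocate else None
        )

    def _alloc(self):
        """Create a fresh buffer and count the allocation."""
        self.allocations += 1
        return [0.0] * self.tensor_size

    def run(self, inputs):
        """Run all layers; inputs must have length tensor_size."""
        src = inputs
        for i in range(self.num_layers):
            # Per-run strategy allocates here instead of reusing a buffer.
            dst = self.buffers[i] if self.preallocate else self._alloc()
            for j, x in enumerate(src):
                dst[j] = max(x, 0.0)  # placeholder layer: element-wise ReLU
            src = dst
        return list(src)


init_time = ToyRuntime(num_layers=3, tensor_size=4, preallocate=True)
per_run = ToyRuntime(num_layers=3, tensor_size=4, preallocate=False)
for _ in range(100):
    out_a = init_time.run([-1.0, 2.0, -3.0, 4.0])
    out_b = per_run.run([-1.0, 2.0, -3.0, 4.0])

print(init_time.allocations, per_run.allocations)  # 3 300
```

Both variants produce the same output, but the per-run variant allocates on every inference. On a memory-constrained board such as the Raspberry Pi, avoiding that repeated work is one plausible reason the init-time strategy helps.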
Supported Layer Fusions
| Composite Pattern | Status |
|---|---|
| `nn.dense + bias_add` | ✅ |
| `nn.dense + bias_add + relu` | ✅ |
| `nn.conv2d + bias_add + relu` | ✅ |
| `reshape` | ✅ |
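
Because `nn.dense + bias_add` is a prefix of `nn.dense + bias_add + relu`, the order in which patterns are tried matters. The sketch below shows this with a greedy matcher over a linear list of op names. It is a simplification under stated assumptions: real Relay graphs are DAGs and TVM matches them with its own dataflow-pattern machinery, which this code does not use. The composite names such as `ncnn.dense_bias_relu` and the `partition` helper are hypothetical.

```python
# Composite patterns from the table, written as op-name sequences.
# Longer patterns are listed first so a three-op fusion wins over its prefix.
# The composite names on the left are hypothetical labels.
PATTERNS = [
    ("ncnn.dense_bias_relu", ["nn.dense", "bias_add", "relu"]),
    ("ncnn.conv2d_bias_relu", ["nn.conv2d", "bias_add", "relu"]),
    ("ncnn.dense_bias", ["nn.dense", "bias_add"]),
    ("ncnn.reshape", ["reshape"]),
]


def partition(ops):
    """Greedily split a linear op list into ncnn composites and TVM fallbacks.

    Returns a list of (target, composite_name, ops) tuples.
    """
    regions = []
    i = 0
    while i < len(ops):
        for name, pattern in PATTERNS:
            if ops[i:i + len(pattern)] == pattern:
                regions.append(("ncnn", name, list(pattern)))
                i += len(pattern)
                break
        else:
            # No pattern matched: this op stays with TVM's own codegen.
            regions.append(("tvm", None, [ops[i]]))
            i += 1
    return regions


ops = [
    "nn.conv2d", "bias_add", "relu",
    "nn.max_pool2d",
    "reshape",
    "nn.dense", "bias_add", "relu",
    "nn.dense", "bias_add",
    "nn.softmax",
]
regions = partition(ops)
for target, name, matched in regions:
    print(target, name, matched)
```

In this example the pooling and softmax ops match no pattern and fall back to TVM, while the remaining ops are grouped into four ncnn composites. If the two-op dense pattern were listed before the three-op one, the trailing `relu` would be left unfused.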
References
| Resource | Description |
|---|---|
| Apache TVM | Open deep learning compiler stack |
| TVM BYOC Documentation | Official guide for integrating external codegens |
| ncnn | Tencent’s high-performance neural network inference engine for mobile |
Tech Stack
C++ · Python · Apache TVM (v0.13.0) · ncnn · Docker · Raspberry Pi 4B