Getting Started

Tiny ONNC is an MLIR-based compiler that exports deep neural networks (DNNs) into function calls to various neural network libraries, such as ARM CMSIS-NN and Andes LibNN. MLIR is a high-quality compiler framework that addresses software fragmentation issues. By supporting multiple Intermediate Representations in a single infrastructure, compilers can transform a variety of input languages into a common output form. Tiny ONNC leverages this unique power of MLIR to support a rich set of neural network frameworks, including PyTorch, TensorFlow, TensorFlow Lite, the Open Neural Network Exchange format (ONNX), and even TVM Relay. Tiny ONNC transforms all of these input DNN formats into a function composed of a series of function calls to neural network libraries. One fits all: MLIR makes it possible.

In this tutorial, we introduce a new web service, ONNC Bench, that operates Tiny ONNC and simplifies the deployment of neural network models. The main features of ONNC Bench are:

  • Framework Integration: ONNC Bench supports the following deep learning frameworks:

    • PyTorch

    • TensorFlow, Keras

    • ONNX

    • … and more; please check Supported Frameworks for the full list of supported frameworks.

  • Model Optimization: Tiny ONNC applies effective quantization techniques that reduce the size of deep learning models while preserving accuracy.

  • Intuitive Web Interface: A web interface lets you track, debug, and manage your experiments and deployments.

Ok, now it is time to build a new model!

Installation

ONNC Bench requires Python 3.6 or above. Using a virtual environment manager, such as Conda or virtualenv, is highly recommended. Please check Installation for instructions on installing a virtual environment manager.

To install ONNC Bench, simply type:

pip install onnc-bench

If you already have onnc-bench installed, use the command below to upgrade to the latest version:

pip install onnc-bench --upgrade

Now you should be able to import ONNC Bench in a Python shell:

>>> from onnc.bench import __version__
>>> print(__version__)
4.1.5

Acquire Pre-trained Models

Our first task in this tutorial is to compile a pre-trained model for visual wake words from TinyMLPerf.

The Visual Wake Words model identifies whether a person is present in an image. It is commonly deployed on microcontrollers because of its compact memory footprint and modest computation requirements.

The original Visual Wake Words model is available in the TinyMLPerf repository at:

https://github.com/mlcommons/tiny/tree/master/benchmark/training

Alternatively, you can download the model file and calibration samples for running this tutorial.

model.h5 is a Keras model in H5 format. We need calibration samples for hardware-aware quantization. Please check Quantization and Calibration for more information about calibration samples.
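
Before compiling, it can be useful to sanity-check the calibration file. The snippet below is only an illustrative check and assumes the coco_11x96x96x3.npy file downloaded for this tutorial; the file name suggests 11 samples of shape 96x96x3.

import numpy as np

# Inspect the calibration samples used later in this tutorial (illustrative check).
samples = np.load('coco_11x96x96x3.npy')
print(samples.shape)
print(samples.dtype)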

Register An Account

Before we start the compilation, we need an ONNC API key. We can get a FREE key from: https://app.onnc.skymizer.com. Your API key can be found in the Profile section.

Compile the Model

Here is our first code snippet:

from onnc.bench import login, Project

login("YOUR_EMAIL", "YOUR_PASSWD")

project = Project(name='experiment-1')
project.add_model(model='model.h5', samples='coco_11x96x96x3.npy')
project.compile(target='CMSIS-NN-DEFAULT')
deployment = project.save('./output')

Where,

Line  Description
3     Log in to the ONNC service with your account credentials (this sets your ONNC API key).
5     Create a Project object. name is used as an ID of the Project to track and manage your experiments and deployments.
6     Add a model and its corresponding calibration samples.
7     Compile the model for the given backend.
8     Save the compiled model to the output folder. The deployment consolidates the compilation results and logs.

See Python APIs for more details about the APIs.

Supported backends are listed in Supported Devices.

Please check Examples for more examples.

Review the Compilation Results

Once the model is downloaded and we have a Deployment object, we can show the memory footprint (in bytes) of the compiled model.

# ...
# deployment = project.save('./output')

print(deployment.report)
{
    "sram_size": 64514,
    "flash_size": 212890
}

The compiled model is stored at output/build/model

$ ls output
build report.json

$ ls output/src
cortexm_runtime.cpp  cortexm_runtime.h  inference.cpp  inference.h  main.cpp  main_model.cpp  main_model.h  main_weight.h  onnc_internal.h

Integrate with your Application

There are two types of compiled models, depending on the backend we choose:

  1. C Library

    • CMSIS-NN-DEFAULT

    • ANDES-LIBNN-DEFAULT

  2. Loadable

    • NVDLA

      • NVDLA-NV-SMALL-DEFAULT

      • NVDLA-NV-LARGE-DEFAULT

      • NVDLA-NV-FULL-DEFAULT

    • Intel

      • INTEL-OPENVINO-CPU-FP32

Given a compiled model in C library format, we can import the library into our application and feed inputs (usually from the sensors of an IoT device) to the model. Please check C API for more information.

For models in Loadable format, we can use ONNC Forest Runtime to make integration much easier.

ONNC Forest Runtime

ONNC Forest Runtime is a powerful deep learning runtime framework with the following features:

  • Supporting heterogeneous multi-cores and multi-cards

  • Optimizing memory movement to improve inference performance

  • Using the same application code on different hardware architectures

Inference Workflow

There are five steps to run inference with ONNC Forest Runtime, plus a pre-requisite compilation step; a compact skeleton of the runtime calls follows this list:

  • Pre-requisites: Compile the model(s)

  1. Launching Devices

  2. Loading Models

  3. Binding Inputs and Outputs

  4. Materializing Models

  5. Running Inference
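
Put together, the runtime calls from these steps look roughly like the skeleton below. This is only a sketch: it assumes options has already been constructed and sample is a preprocessed NumPy array, both of which are covered in the sections that follow.

runtime = Runtime()

# Step 1: launch the device(s) described by `options`
runtime.launch(options)

# Step 2: load the compiled model (loadable) and resolve its symbols
runtime.load(options)

# Step 3: bind all inputs and outputs
runtime.bind_all_inputs()
runtime.bind_all_outputs()

# Step 4: materialize (fully load and relocate) the model
runtime.materialize()

# Step 5: run inference
runtime.write([sample])
runtime.run()
result = runtime.read()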

Pre-requisites: Compile the model(s)

We choose the OpenVino backend to demonstrate ONNC Forest Runtime. Let's compile a new model as an example.

import tensorflow as tf
import numpy as np

from onnc.bench import login, Project, Deployment
from onnc.forest.core.runtime import Runtime
from onnc.forest import OpenvinoOptions, Options

# Prepare model and dataset

## Load a pretrained MobileNet
model = tf.keras.applications.MobileNet(weights="imagenet")

## Train or finetune the pretrained model to fit your task
## ... 
## YOUR TRAINING CODE
## ...

# Compile and optimize the given model

## Log in ONNC OASIS
login("YOUR_EMAIL", "YOUR_PASSWD")

## Upload and compile the model
project = Project('project-1')
project.add_model(model)
project.compile(target='INTEL-OPENVINO-CPU-FP32')

## Download and save the compiled model
deployment = project.save('output/')

The compilation procedure is similar to what we did in the section above. A subtle difference is that we do not need calibration samples when we choose the 'INTEL-OPENVINO-CPU-FP32' backend.

Note for PyTorch users: Due to the dynamic-graph nature of PyTorch, the shapes of a model's input tensors are unknown until samples are provided. Therefore, we have to specify the input shapes. To do so, use the model_inputs argument when adding a model. For example,

...
project.add_model(model, model_inputs=[("input", [1, 3, 224, 224], float)])
...
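
For instance, a complete flow for a PyTorch model might look like the sketch below. The torchvision model and input shape are illustrative assumptions, not part of the original example, and login() is assumed to have been called as shown earlier.

# Illustrative sketch: compiling a torchvision model with explicit input shapes.
import torchvision

torch_model = torchvision.models.mobilenet_v2()

project = Project(name='project-pytorch')
project.add_model(torch_model, model_inputs=[("input", [1, 3, 224, 224], float)])
project.compile(target='INTEL-OPENVINO-CPU-FP32')
deployment = project.save('output-pytorch/')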

To run your model with OpenVino, you need to install the OpenVino package; currently we support version 2022.1.0.

pip install openvino==2022.1.0

Step 1: Launching Devices

The first step is to launch and set up the devices for running the model. Each backend requires different parameters and configuration to run properly. ONNC Forest provides default configurations that work in most scenarios; we can also manually specify parameters to improve performance.

OpenVino configurations are defined in the OpenvinoOptions class.

options = OpenvinoOptions(loadable='path_to_a_openvino_model')

Alternatively, we can use the Options class; onnc-bench automatically identifies supported formats.

options = Options(loadable='path_to_a_openvino_model')

The constructor of Options or OpenvinoOptions takes a parameter loadable that specifies the path to a folder that contains weights (.bin) and network (.xml) files.

OpenvinoOptions can also take a Deployment object obtained from the compilation procedure.

deployment = project.save('output/')
options = Options(loadable=deployment.loadable)

or

deployment = Deployment('output/')
options = Options(loadable=deployment.loadable)

Once we have the default configuration, we can use it to create a Runtime object and launch the corresponding devices.

runtime = Runtime()
runtime.launch(options)

Step 2: Loading Models

The load(options: Options) method parses a compiled model (loadable) and resolves the symbols in it. Loading the full code/model is executed in materialize() (Step 4). This is an optimization that avoids parsing and loading unnecessary code from loadables.

runtime.load(options)

Step 3. Binding Inputs and Outputs

To use memory efficiently and reduce unnecessary data movement, we need to allocate memory space for the input and output tensors and bind it to the input and output descriptors in the model. For the OpenVino backend, we recommend using bind_all_inputs and bind_all_outputs to allocate memory and bind all inputs and outputs.

runtime.bind_all_inputs()
runtime.bind_all_outputs()

To manually bind inputs and outputs, we can use bind_input(input, infer_request) and bind_output(output, infer_request).

The parameters input and output can be the names of the input and output tensors, or their indices.

The parameter infer_request is an OpenVino InferRequest object.

# Create an OpenVino InferRequest object using create_infer_request()
# infer_request = openvino.runtime.CompiledModel.create_infer_request()

runtime.bind_input('input_0', infer_request)
runtime.bind_output('output_0', infer_request)

or

# Create an OpenVino InferRequest object using create_infer_request()
# infer_request = openvino.runtime.CompiledModel.create_infer_request()

runtime.bind_input(0, infer_request)
runtime.bind_output(0, infer_request)

Step 4. Materializing Models

The full model is loaded and its symbols are relocated after materialization.

runtime.materialize()

Step 5. Running Inference

After materialization, the runtime is ready to run inference.

Note: We have to preprocess the image(s) before writing them to the device. The preprocessing and its parameters should be kept the same as in the training stage.

runtime.write([img])
runtime.run()
res = runtime.read()

print(res)

Here we use write(samples: List[numpy.ndarray]) to write NumPy tensors into the bound memory space, trigger inference with run(), and retrieve the inference result with read().
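
For the Keras MobileNet compiled above, the preprocessing might look like the sketch below. The image path is a placeholder you supply, and whether a batch dimension is required depends on the compiled model's input shape, so treat this as an assumption rather than a definitive recipe.

import numpy as np
import tensorflow as tf

# Load and preprocess an image the way the Keras MobileNet expects (illustrative).
img = tf.keras.preprocessing.image.load_img('your_image.jpg', target_size=(224, 224))
img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.keras.applications.mobilenet.preprocess_input(img)
img = np.expand_dims(img, axis=0)  # add a batch dimension if the model expects one

runtime.write([img])
runtime.run()
print(runtime.read())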

A full example can be found here.


That’s it! We have successfully compiled a model and run inference. These are all the steps we need to integrate ONNC with AI applications. Enjoy your coding!