# Getting Started

[Tiny ONNC](https://skymizer.com/tinyonnc/) is an [MLIR-based](https://mlir.llvm.org/) compiler that exports deep neural networks (DNNs) into function calls to various neural network libraries, such as ARM CMSIS-NN and Andes LibNN. MLIR is a high-quality compiler framework that addresses software fragmentation: by supporting multiple intermediate representations in a single infrastructure, compilers can transform many input languages into a common output form. Tiny ONNC leverages this unique power of MLIR to support a rich set of neural network frameworks, including PyTorch, TensorFlow, TensorFlow Lite, the Open Neural Network Exchange format (ONNX), and even TVM Relay. Tiny ONNC transforms every input DNN format into a function composed of a series of function calls to neural network libraries. **One fits all**: MLIR makes it possible.

In this tutorial, we introduce a new web service, **ONNC Bench**, that operates Tiny ONNC and simplifies the deployment of a neural network model. **ONNC Bench** offers the following features:

* Framework integration: **ONNC Bench** supports the deep learning frameworks below:
  * PyTorch
  * TensorFlow, Keras
  * ONNX
  * ... and more; please check [Supported Frameworks](https://docs-tinyonnc.skymizer.com/manual-supported-frameworks.html) for the full list of supported frameworks.
* Model optimization: Tiny ONNC uses effective quantization technologies that reduce the size of deep learning models while keeping them accurate.
* Intuitive web interface: the web interface lets you track, debug, and manage your experiments and deployments.

OK, now it is time to build a new model!

## Installation

**ONNC Bench** requires **Python 3.6** or above. Using a virtual environment manager, such as Conda or virtualenv, is highly recommended. Please check [Installation](https://docs-tinyonnc.skymizer.com/tutorial-installation.html) for how to install a virtual environment manager.

To install **ONNC Bench**, simply type:

```
pip install onnc-bench
```

If you already have onnc-bench installed, use the following command to upgrade to the latest version:

```
pip install onnc-bench --upgrade
```

Now you should be able to import **ONNC Bench** in a Python shell:

```python
>>> from onnc.bench import __version__
>>> print(__version__)
4.1.5
```

## Acquire Pre-trained Models

Our first task in this tutorial is to compile a pre-trained Visual Wake Words model from [TinyMLPerf](https://github.com/mlcommons/tiny/tree/master/v0.5). The Visual Wake Words model identifies whether a person is present in an image. It is commonly used on microcontrollers thanks to its compact memory footprint and modest computation requirements.

The original Visual Wake Words model is located in the TinyMLPerf repository at:

```
https://github.com/mlcommons/tiny/tree/master/benchmark/training
```

Alternatively, you can download the [model file](https://docs-tinyonnc.skymizer.com/downloads/vww/model.h5) and [calibration samples](https://docs-tinyonnc.skymizer.com/downloads/vww/coco_11x96x96x3.npy) used in this tutorial. `model.h5` is a Keras model in `H5` format. We need the **calibration samples** for hardware-aware quantization. Please check [Quantization and Calibration](https://docs-tinyonnc.skymizer.com/tutorial-calibration-samples.html) for more information about calibration samples.

## Register An Account

Before we start the compilation, we need an ONNC API key. You can get a **FREE** key from [https://app.onnc.skymizer.com](https://app.onnc.skymizer.com). Your API key can be found in the `Profile` section.
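Before moving on, it can help to verify that the downloaded files are readable. The snippet below is a minimal sketch, not part of the compilation flow, that loads the Keras model and the calibration samples and prints their shapes; the values shown in the comments are only what the file name suggests and may differ in practice.

```python
import numpy as np
import tensorflow as tf

# Load the pre-trained Keras model and the calibration samples downloaded above.
model = tf.keras.models.load_model("model.h5")
samples = np.load("coco_11x96x96x3.npy")

# The model's input shape should be compatible with the calibration samples.
print(model.input_shape)  # e.g. (None, 96, 96, 3)
print(samples.shape)      # e.g. (11, 96, 96, 3), as the file name suggests
```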
## Compile the Model

Here is our first code snippet:

```python
from onnc.bench import login, Project

login("YOUR_EMAIL", "YOUR_PASSWD")

project = Project(name='experiment-1')
project.add_model(model='model.h5', samples='coco_11x96x96x3.npy')
project.compile(target='CMSIS-NN-DEFAULT')
deployment = project.save('./output')
```

Where:

| Line | Description |
|------|-------------|
| **3** | Set the ONNC API key. |
| **5** | Create a `Project` object. `name` is used as the ID of a project to track and manage your experiments and deployments. |
| **6** | Add a model and its corresponding calibration samples. |
| **7** | Compile the model for the given backend. |
| **8** | Save the compiled model to the output folder. The returned `deployment` consolidates the compilation results and logs. |

Please refer to the [Python APIs](https://docs-tinyonnc.skymizer.com/tutorial-python-apis.html) for more details about the APIs. Supported backends are listed in [Supported Devices](https://docs-tinyonnc.skymizer.com/manual-supported-devices.html). Please check [Examples](https://docs-tinyonnc.skymizer.com/tutorial-examples.html) for more examples.

## Review the Compilation Results

Once we save the compiled model and obtain a `Deployment` object, we can show the memory footprint (in bytes) of the compiled model:

```python
# ...
# deployment = project.save('./output')
print(deployment.report)

{
  "sram_size": 64514,
  "flash_size": 212890
}
```

The compiled model is stored at `output/build/model`:

```
$ ls output
build  report.json
$ ls output/src
cortexm_runtime.cpp  cortexm_runtime.h  inference.cpp  inference.h  main.cpp
main_model.cpp  main_model.h  main_weight.h  onnc_internal.h
```

## Integrate with your Application

There are two types of compiled models, depending on the backend we choose:

1. C library
   * CMSIS-NN-DEFAULT
   * ANDES-LIBNN-DEFAULT
2. Loadable
   * NVDLA
     * NVDLA-NV-SMALL-DEFAULT
     * NVDLA-NV-LARGE-DEFAULT
     * NVDLA-NV-FULL-DEFAULT
   * Intel
     * INTEL-OPENVINO-CPU-FP32

Given a compiled model in `C Library` format, we can import the library in our application and feed the inputs (usually from the sensor(s) of an IoT device) to the model. Please check the [C API](https://docs-tinyonnc.skymizer.com/manual-C-API-200.html) for more information.

For models in `Loadable` format, we can use the ONNC Forest Runtime, which makes integration much easier.

## ONNC Forest Runtime

The ONNC Forest Runtime is a powerful deep learning runtime framework with the following features:

* Supporting heterogeneous multi-core and multi-card systems
* Optimizing memory movement to improve inference performance
* Using the same application code on different hardware architectures

### Inference Workflow

There are five steps to run inference with the ONNC Forest Runtime:

0. Pre-requisites: Compile the models
1. Launching Devices
2. Loading Models
3. Binding Inputs and Outputs
4. Materializing Models
5. Running Inference

### Pre-requisites: Compile the Model(s)

We choose the OpenVINO backend to demonstrate the ONNC Forest Runtime. Let's compile a new model as an example.

```python
import tensorflow as tf
import numpy as np

from onnc.bench import login, Project, Deployment
from onnc.forest.core.runtime import Runtime
from onnc.forest import OpenvinoOptions, Options

# Prepare the model and dataset

## Load a pretrained MobileNet
model = tf.keras.applications.MobileNet(weights="imagenet")

## Train or fine-tune the pretrained model to fit your task
## ...
## YOUR TRAINING CODE
## ...
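## For illustration only, a minimal fine-tuning sketch; it assumes `train_ds`
## is a tf.data.Dataset you have prepared for your task (hypothetical name):
# model.compile(optimizer="adam",
#               loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(train_ds, epochs=3)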
# Compile and optimize the given model

## Log in to ONNC OASIS
login("YOUR_EMAIL", "YOUR_PASSWD")

## Upload and compile the model
project = Project('project-1')
project.add_model(model)
project.compile(target='INTEL-OPENVINO-CPU-FP32')

## Download and save the compiled model
deployment = project.save('output/')
```

The compilation procedure is similar to what we did in the section above. A subtle difference is that we do not need calibration samples when we choose the 'INTEL-OPENVINO-CPU-FP32' backend.

Note for PyTorch users: due to the dynamic-graph nature of PyTorch, the shapes of a model's input tensors are unknown until samples are provided. Therefore, we have to specify the input shapes explicitly. To do so, pass the `model_inputs` argument when adding a model. For example:

```
...
project.add_model(model, model_inputs=[("input", [1, 3, 224, 224], float)])
...
```

To run your model with OpenVINO, you need to install the OpenVINO package; we currently support version 2022.1.0:

```
pip install openvino==2022.1.0
```

### Step 1: Launching Devices

The first step is to launch and set up the devices that will run the model. Each backend requires different parameters and configuration to run properly. ONNC Forest provides default configurations that work in most scenarios; we can also manually specify parameters to improve performance.

The OpenVINO configuration is defined in the `OpenvinoOptions` class:

```
options = OpenvinoOptions(loadable='path_to_a_openvino_model')
```

Alternatively, we can use the `Options` class; onnc-bench automatically identifies supported formats:

```
options = Options(loadable='path_to_a_openvino_model')
```

The constructor of `Options` or `OpenvinoOptions` takes a `loadable` parameter that specifies the path to a folder containing the weights (`.bin`) and network (`.xml`) files.

`OpenvinoOptions` can also take a `Deployment` object obtained from the compilation procedure:

```
deployment = project.save('output/')
options = Options(loadable=deployment.loadable)
```

or

```
deployment = Deployment('output/')
options = Options(loadable=deployment.loadable)
```

Once we have the default configuration, we can use it to create a `Runtime` object and launch the corresponding devices:

```
runtime = Runtime()
runtime.launch(options)
```

### Step 2: Loading Models

The `load(options: Options)` method parses a compiled model (loadable) and resolves the symbols in it. Loading the full code/model is executed in `materialize()` (Step 4). This is an optimization that avoids parsing and loading unnecessary code from loadables.

```
runtime.load(options)
```

### Step 3: Binding Inputs and Outputs

To use memory efficiently and reduce unnecessary data movement, we need to allocate memory space to store the input and output tensors and bind it to the input and output descriptors in the model.

For the OpenVINO backend, we **recommend using `bind_all_inputs` and `bind_all_outputs`** to allocate memory and bind all inputs and outputs:

```
runtime.bind_all_inputs()
runtime.bind_all_outputs()
```

To bind inputs and outputs manually, we can use `bind_input(input, infer_request)` and `bind_output(output, infer_request)`. The `input` and `output` parameters can be either the **names** of the input and output tensors or their **indices**. The `infer_request` parameter is an OpenVINO `InferRequest` object.
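For reference, the snippet below sketches one way to obtain such an `InferRequest` with the OpenVINO 2022.1 Python API. The model path is an assumption made for illustration; point it at the `.xml` network file produced by the compilation.

```python
from openvino.runtime import Core

# Read and compile the OpenVINO network produced by ONNC (hypothetical path).
core = Core()
ov_model = core.read_model("output/build/model.xml")
compiled_model = core.compile_model(ov_model, "CPU")

# Each InferRequest holds its own input and output buffers.
infer_request = compiled_model.create_infer_request()
```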
```
# Create an OpenVino InferRequest object using create_infer_request()
# infer_request = openvino.runtime.CompiledModel.create_infer_request()
runtime.bind_input('input_0', infer_request)
runtime.bind_output('output_0', infer_request)
```

or

```
# Create an OpenVino InferRequest object using create_infer_request()
# infer_request = openvino.runtime.CompiledModel.create_infer_request()
runtime.bind_input(0, infer_request)
runtime.bind_output(0, infer_request)
```

### Step 4: Materializing Models

The full model is loaded and its symbols are relocated after materialization:

```
runtime.materialize()
```

### Step 5: Running Inference

After materialization, the runtime is ready to run inference.

Note: we have to preprocess the image(s) before sending them to the device. The preprocessor and its parameters should be kept the same as in the training stage.

```
runtime.write([img])
runtime.run()
res = runtime.read()
print(res)
```

Here we use `write(samples: List[numpy.ndarray])` to write a NumPy tensor into the bound memory space, trigger inference with `run()`, and retrieve the inference result with `read()`.

The full example can be found [here](https://docs-tinyonnc.skymizer.com/tutorial-openvino.html).

---

That's it! We have successfully compiled a model and run inference with it. These are all the steps we need to integrate ONNC into AI applications. Enjoy your coding!