Application Programming Interface

This document describes the application programming interface (API) exported by the generated C source code. With these APIs, applications can:

  • write input into the neural network,

  • read output from the neural network, and

  • start inference of the given neural network on microcontrollers.


Data Structures

This section describes the input and output buffers that applications write to and read from. In Tiny ONNC, the generated C code is responsible for memory space maintenance. That is, the model creates and frees all used tensors automatically. Users do not need to allocate or assign memory space for tensors.

Input Tensor

So far, Tiny ONNC supports models with only a single input tensor.

struct onnc_input_tensor_t {
  int8_t* data;
  size_t size;
};
data

the memory space of the input tensor

size

the size of the input tensor (the number of int8_t elements)
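
For illustration, here is a minimal sketch of copying pre-quantized int8 data into the model-owned input buffer. The header name onnc_model.h and the source array raw_int8_data are assumptions for this sketch, not part of the generated API:

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include "onnc_model.h" /* assumed name of the generated header */

void fill_input(const int8_t* raw_int8_data, size_t count)
{
  const onnc_input_tensor_t int8_input = onnc_get_input_tensor();
  /* Copy at most int8_input.size elements into the model-owned buffer. */
  const size_t n = (count < int8_input.size) ? count : int8_input.size;
  memcpy(int8_input.data, raw_int8_data, n);
}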

Output Tensor

So far, Tiny ONNC supports models with only a single output tensor.

struct onnc_output_tensor_t {
  int8_t* data;
  size_t size;
};
data

the memory space of the output tensor

size

the size of the output tensor (the number of int8_t elements)
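
A minimal sketch of reading the raw int8 output after inference follows (cortexm_main is described under Standard Procedures below; the header name onnc_model.h is an assumption):

#include <stddef.h>
#include <stdio.h>
#include "onnc_model.h" /* assumed name of the generated header */

void print_raw_output(void)
{
  /* cortexm_main blocks until inference finishes and returns the output tensor. */
  const onnc_output_tensor_t int8_output = cortexm_main();
  for (size_t i = 0; i < int8_output.size; ++i) {
    printf("%d ", (int)int8_output.data[i]);
  }
  printf("\n");
}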


Standard Procedures

This section describes all the API functions open to applications.

Getting the Internal Input Tensor

onnc_get_input_tensor returns the internal input tensor of the model.

example
const onnc_input_tensor_t int8_input = onnc_get_input_tensor();
return

The input tensor in the model.

Getting I/O Scaling Factors

onnc_get_io_scaling_factors(float* quant, float* dequant) writes the scaling factors calculated by the calibrator for quantization and de-quantization into the given parameters.

To minimize degradation in model accuracy, the ONNC Calibrator defines a scaling factor to adjust the input distribution (the quantized factor) and a scaling factor to adjust the output distribution (the de-quantized factor).

You can get the scaling factors by calling onnc_get_io_scaling_factors:

example
float quantized_factor, dequantized_factor;
onnc_get_io_scaling_factors(&quantized_factor, &dequantized_factor);

The application should transform the input from 32-bit floating-point values to 8-bit integers using the quantized factor.

for (unsigned int i = 0; i < int8_input.size; ++i) {
  int8_input.data[i] = (int8_t)(model_input[i] * quantized_factor);
}

After inference, the application should transform the output from 8-bit integers to 32-bit floating-point values using the de-quantized factor.

for (unsigned int i = 0; i < int8_output.size; ++i) {
  model_output[i] = (float)int8_output.data[i] * dequantized_factor;
}
quant: float*

A pointer to a float that receives the scaling factor used in quantization.

dequant: float*

A pointer to a float that receives the scaling factor used in de-quantization.

Blocking Inference and Getting the Output Tensor

cortexm_main runs the inference, blocks until it completes, and returns the output tensor.

example
const onnc_output_tensor_t int8_output = cortexm_main();
return

The output tensor in the model.
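

Complete Example

Putting these functions together, the following sketch shows one complete inference pass. The header name onnc_model.h and the application-owned buffers model_input and model_output (assumed to match the tensor sizes) are illustrative assumptions:

#include <stddef.h>
#include <stdint.h>
#include "onnc_model.h" /* assumed name of the generated header */

void run_inference(const float* model_input, float* model_output)
{
  /* Fetch the scaling factors produced by the calibrator. */
  float quantized_factor, dequantized_factor;
  onnc_get_io_scaling_factors(&quantized_factor, &dequantized_factor);

  /* Quantize the 32-bit float input into the model-owned int8 buffer. */
  const onnc_input_tensor_t int8_input = onnc_get_input_tensor();
  for (size_t i = 0; i < int8_input.size; ++i) {
    int8_input.data[i] = (int8_t)(model_input[i] * quantized_factor);
  }

  /* Run inference; cortexm_main blocks until the output is ready. */
  const onnc_output_tensor_t int8_output = cortexm_main();

  /* De-quantize the int8 output back to 32-bit floats. */
  for (size_t i = 0; i < int8_output.size; ++i) {
    model_output[i] = (float)int8_output.data[i] * dequantized_factor;
  }
}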