# Application Programming Interface¶

This document describes the application programming interface (API) that the generated C source code exports. With these APIs, applications can:

- write input into the neural network,

- read output from the neural network, and

- start inference of the given neural network on microcontrollers.

## Data Structure¶

This section describes the input and output buffers that applications write to and read from. In Tiny ONNC, the generated C code is responsible for memory space maintenance. That is, the model creates and frees all used tensors automatically. Users do not need to create or assign memory space for tensors.

### Input Tensor¶

So far, Tiny ONNC supports only models with a single input tensor.

```
struct onnc_input_tensor_t {
  int8_t* data;
  size_t size;
};
```

- data
the memory space of the input tensor

- size
the size of the input tensor
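
For illustration, here is a minimal sketch of preparing this buffer before writing samples into it. It uses `onnc_get_input_tensor` described under Standard Procedures; the header name `onnc_generated.h` is a placeholder, as the actual name depends on your Tiny ONNC output.

```
#include <string.h>
/* #include "onnc_generated.h" -- placeholder name for the generated header */

/* Sketch: zero the model's internal input buffer before writing samples into it.
   The buffer is owned by the generated code, so no allocation is needed here. */
void clear_model_input(void)
{
    const onnc_input_tensor_t int8_input = onnc_get_input_tensor();
    memset(int8_input.data, 0, int8_input.size);
}
```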

### Output Tensor¶

So far, Tiny ONNC supports only models with a single output tensor.

```
struct onnc_output_tensor_t {
  int8_t* data;
  size_t size;
};
```

- data
the memory space of the output tensor

- size
the size of the output tensor
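
For illustration only, here is a sketch of reading a prediction directly from the int8 output, assuming a classification model; the helper name `argmax_output` is hypothetical.

```
#include <stddef.h>
/* #include "onnc_generated.h" -- placeholder name for the generated header */

/* Sketch: return the index of the largest int8 output value (argmax),
   e.g. the predicted class of a classifier. */
size_t argmax_output(const onnc_output_tensor_t out)
{
    size_t best = 0;
    for (size_t i = 1; i < out.size; ++i) {
        if (out.data[i] > out.data[best])
            best = i;
    }
    return best;
}
```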

## Standard Procedures¶

This section describes all the API functions open to applications.

### Getting the Internal Input Tensor¶

**onnc_get_input_tensor** returns the internal input tensor of the model.

- example
const onnc_input_tensor_t int8_input = onnc_get_input_tensor();

- return
The input tensor in the model.
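
As a quick sanity check, an application might verify that the returned buffer has the size it expects before writing into it; `EXPECTED_INPUT_SIZE` is a hypothetical application-defined constant.

```
/* EXPECTED_INPUT_SIZE is a hypothetical application-defined constant. */
#define EXPECTED_INPUT_SIZE 1024u

/* Sketch: confirm the internal input buffer has the size the application expects. */
int input_size_is_valid(void)
{
    const onnc_input_tensor_t int8_input = onnc_get_input_tensor();
    return int8_input.size == EXPECTED_INPUT_SIZE;
}
```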

### Getting I/O Scaling Factors¶

**onnc_get_io_scaling_factors(float* quant, float* dequant)** writes the
scaling factors calculated by the calibrator for quantization and
de-quantization into the given parameters.

To minimize degradation in model accuracy, ONNC Calibrator defines a scaling factor to adjust the input distribution (the quantized factor) and a scaling factor to adjust the output distribution (the de-quantized factor).

You can get the scaling factors by calling *onnc_get_io_scaling_factors*:

- example
float quantized_factor, dequantized_factor;
onnc_get_io_scaling_factors(&quantized_factor, &dequantized_factor);

The application should transform the input from 32-bit floating-point values to 8-bit integers with the quantized factor.

for (unsigned int i = 0; i < int8_input.size; ++i) {
  int8_input.data[i] = (int8_t)(model_input[i] * quantized_factor);
}

After inference, the application should transform the output from 8-bit integers back to 32-bit floating-point values with the de-quantized factor.

for (unsigned int i = 0; i < int8_output.size; ++i) {
  model_output[i] = (float)int8_output.data[i] * dequantized_factor;
}

- quant: float*
The pointer to a float that receives the scaling factor used in quantization.

- dequant: float*
The pointer to a float that receives the scaling factor used in de-quantization.

### Blocking Inference and Getting the Output Tensor¶

**cortexm_main** performs the inference and returns the output tensor.

- example
const onnc_output_tensor_t int8_output = cortexm_main();

- return
The output tensor in the model.
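
Putting the pieces together, here is a hedged end-to-end sketch of one inference pass built only from the calls above; `model_input`, `model_output`, and the header name are hypothetical, and their sizes are assumed to match the model.

```
#include <stdint.h>
/* #include "onnc_generated.h" -- placeholder name for the generated header */

/* Hypothetical application buffers; their sizes are assumed to match the model. */
extern float model_input[];
extern float model_output[];

void run_inference_once(void)
{
    float quantized_factor, dequantized_factor;
    onnc_get_io_scaling_factors(&quantized_factor, &dequantized_factor);

    /* Quantize the 32-bit floating-point input into the internal int8 buffer. */
    const onnc_input_tensor_t int8_input = onnc_get_input_tensor();
    for (unsigned int i = 0; i < int8_input.size; ++i) {
        int8_input.data[i] = (int8_t)(model_input[i] * quantized_factor);
    }

    /* Run blocking inference and obtain the internal output tensor. */
    const onnc_output_tensor_t int8_output = cortexm_main();

    /* De-quantize the int8 output back to 32-bit floating point. */
    for (unsigned int i = 0; i < int8_output.size; ++i) {
        model_output[i] = (float)int8_output.data[i] * dequantized_factor;
    }
}
```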