C API
#####

This document lists the application programming interface (API) exported by the generated C source code. With these APIs, applications can:

- write input into the neural network,
- start inference of the given neural network on microcontrollers, and
- read output from the neural network.

Including the ``inference.h`` header makes the API declarations visible.

.. code-block:: c

   #include "inference.h"

----

Data Structure
==============

Neural Network Model
--------------------

An ``onnc_model_t`` object represents a neural network model generated by Tiny ONNC. You can obtain a model with the ``onnc_open_<model name>_model()`` function (see `Getting Neural Network Model`_).

interface
    .. code-block:: c

        struct onnc_model_t;

Raw Tensor
----------

An ``onnc_raw_tensor_t`` object represents a tensor inside a model descriptor, such as feature maps, weights, and biases.

interface
    .. code-block:: c

        struct onnc_raw_tensor_t {
            int8_t* data;
            size_t size;
        };

data
    the memory space of the tensor

size
    the size of the tensor

----

Standard Procedures
===================

This section describes all the API functions open to applications.

Getting Neural Network Model
----------------------------

**onnc_open_<model name>_model** returns a pointer to the neural network.

interface
    .. code-block:: c

        struct onnc_model_t* onnc_open_<model name>_model();

return
    The generated `neural network model`_.

To support running multiple models in one application, Tiny ONNC distinguishes models by giving each one its own open function. The model name is embedded in the function name. For example, if you set the model name to **my_net**, Tiny ONNC produces a function named **onnc_open_my_net_model**.

Developers can use either the command-line tool or the Python API to set the model name. In the command-line tool, the `-o` option sets the model name.

.. code-block:: sh

   onnc.cortexm -o <model name>

In the Python API, the name is set in `onnc.bench.launch`.

.. code-block:: python

   onnc.bench.launch(name="<model name>", device="m487")

Writing Inputs
--------------

**onnc_write_input** copies the given input array into a model and quantizes the values of the array.

interface
    .. code-block:: c

        /**
         * Write the input data into the NN model
         * @retval -1 failure
         * @return the number of data written into the NN's input array. A given
         *         @ref pInput array may be larger than the NN model can afford.
         *         The function returns the number of elements of @ref pInput
         *         moved into the NN model.
         * @param[in,out] pModel the NN model
         * @param[in] pInput the array of inputs
         * @param[in] pSize the size of the input array
         */
        int onnc_write_input(onnc_model_t* pModel, float pInput[], unsigned int pSize);

pModel: onnc_model_t*
    The neural network model, obtained from *onnc_open_<model name>_model*.

pInput: float*
    The input array. This function copies the values of the array into its internal raw tensors and quantizes them.

pSize: unsigned int
    The size of the input array.

return
    The number of elements written into the raw input tensor. Returns -1 when an error occurs.
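As a quick illustration of the naming convention and of ``onnc_write_input``, here is a minimal sketch. It assumes a model named **my_net**, so the generated header name ``my_net_model.h``, the open function ``onnc_open_my_net_model``, and the input length ``NUM_INPUTS`` are placeholders; check the generated sources for the actual names and sizes.

.. code-block:: c

    #include "inference.h"
    #include "my_net_model.h"   /* hypothetical header generated for a model named my_net */
    #include <stdlib.h>

    #define NUM_INPUTS 10       /* hypothetical input length; use your model's real input size */

    int main(void)
    {
        /* The model name (my_net) is embedded in the generated open function. */
        struct onnc_model_t* model = onnc_open_my_net_model();

        float input[NUM_INPUTS] = {0.0f};  /* fill with application data before writing */

        /* onnc_write_input quantizes the floats and copies them into the model. */
        if (-1 == onnc_write_input(model, input, NUM_INPUTS)) {
            onnc_close(model);
            return EXIT_FAILURE;
        }

        /* ... run onnc_inference() and read outputs here ... */

        onnc_close(model);
        return EXIT_SUCCESS;
    }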
Model Inference
---------------

**onnc_inference** runs the neural network inference and returns the raw output tensor. If you do not need to de-quantize the output values, you can operate on the raw output tensor directly. For example, if the last layer is monotonic and all you want is the largest element, you can search the raw tensor directly.

interface
    .. code-block:: c

        /**
         * Do inference on the @ref pModel
         * @retval NULL failure
         * @return The raw output tensor, for users who do not need to de-quantize
         *         the output values.
         * @param[in] pModel The NN model.
         */
        onnc_raw_tensor_t* onnc_inference(onnc_model_t* pModel);

pModel: onnc_model_t*
    The neural network model, obtained from *onnc_open_<model name>_model*.

return
    The raw tensor of the output. Returns a null pointer when an error occurs.

Reading Outputs
---------------

**onnc_read_output** de-quantizes the output data and copies it into the given output array.

interface
    .. code-block:: c

        /**
         * Read the output values from the NN model @ref pModel into the output
         * array @ref pOutput
         * @retval -1 failure
         * @return The number of data moved into the @ref pOutput array. A given
         *         @ref pOutput array may be larger than the NN model's output.
         *         The function returns the number of elements moved into the
         *         @ref pOutput array.
         * @param[in] pModel the NN model.
         * @param[out] pOutput the output array.
         * @param[in] pSize the size of the @ref pOutput array.
         */
        int onnc_read_output(onnc_model_t* pModel, float pOutput[], unsigned int pSize);

pModel: onnc_model_t*
    The neural network model, obtained from *onnc_open_<model name>_model*.

pOutput: float*
    The output array. This function de-quantizes the raw output tensor and copies it into the given output array.

pSize: unsigned int
    The size of the output array.

return
    The number of elements written into the output array. Returns -1 when an error occurs.

Release The Model
-----------------

**onnc_close** releases the memory used by a model.

interface
    .. code-block:: c

        /**
         * Release the memory of the NN model @ref pModel
         * @retval -1 failure
         * @retval 0 success
         * @param[in,out] pModel The NN model.
         */
        int onnc_close(onnc_model_t* pModel);

pModel: onnc_model_t*
    The neural network model, obtained from *onnc_open_<model name>_model*.

return
    Returns 0 when the closing succeeds. Otherwise, returns -1 when an error occurs.

Example
=======

Here is an example of using the C API.

.. code-block:: c

    #include "inference.h"
    #include "XYZ_model.h"
    #include <stdlib.h>

    int main(int pArgc, char* pArgv[])
    {
        struct onnc_model_t* model = onnc_open_XYZ_model();

        unsigned int num_inputs = 10; // magic number
        float* input_array = (float*)malloc(sizeof(float)*num_inputs);
        // Fill input_array with application data before writing it into the model.
        if (-1 == onnc_write_input(model, input_array, num_inputs)) {
            return EXIT_FAILURE;
        }

        onnc_raw_tensor_t* output = onnc_inference(model); // raw data of the tensor
        if (NULL == output) {
            return EXIT_FAILURE;
        }

        unsigned int num_outputs = 5; // magic number
        float* output_array = (float*)malloc(sizeof(float)*num_outputs);
        if (-1 == onnc_read_output(model, output_array, num_outputs)) {
            return EXIT_FAILURE;
        }

        if (-1 == onnc_close(model)) {
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }
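The `Model Inference`_ section notes that when the last layer is monotonic you can work on the raw output tensor without de-quantizing it. The following sketch shows one way to do that. It reuses the **XYZ** model from the example above and assumes that ``size`` counts the ``int8_t`` elements in ``data``; treat it as an illustration rather than part of the generated API.

.. code-block:: c

    #include "inference.h"
    #include "XYZ_model.h"
    #include <stdio.h>
    #include <stdlib.h>

    /* Return the index of the largest raw output element, or -1 on error.
     * Assumes output->size is the number of int8_t elements in output->data. */
    static int argmax_raw_output(struct onnc_model_t* pModel)
    {
        onnc_raw_tensor_t* output = onnc_inference(pModel);
        if (NULL == output || 0 == output->size) {
            return -1;
        }

        size_t best = 0;
        for (size_t i = 1; i < output->size; ++i) {
            if (output->data[i] > output->data[best]) {
                best = i;
            }
        }
        return (int)best;
    }

    int main(void)
    {
        struct onnc_model_t* model = onnc_open_XYZ_model();

        /* ... write inputs with onnc_write_input() as in the example above ... */

        int label = argmax_raw_output(model);
        if (-1 == label) {
            return EXIT_FAILURE;
        }
        printf("predicted class: %d\n", label);

        onnc_close(model);
        return EXIT_SUCCESS;
    }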