Simd Library Documentation.


Functions that accelerate tensor addition in the Synet Framework.

Functions

SIMD_API void * SimdSynetAdd16bInit (const size_t *aShape, size_t aCount, SimdTensorDataType aType, const size_t *bShape, size_t bCount, SimdTensorDataType bType, SimdTensorDataType dstType, SimdTensorFormatType format)
 Initializes element-wise addition of two tensors in FP32 or BF16 format.
 
SIMD_API void SimdSynetAdd16bForward (void *context, const uint8_t *a, const uint8_t *b, uint8_t *dst)
 Performs element-wise addition of two FP32/BF16 tensors.
 
SIMD_API void SimdSynetAddBias (const float *bias, size_t channels, size_t spatial, float *dst, SimdTensorFormatType format)
 Adds per-channel bias to an FP32 tensor in place.
 
SIMD_API void SimdSynetAdd8i (const uint8_t *aData, const float *aScale, const float *aShift, const uint8_t *bData, const float *bScale, const float *bShift, uint8_t *cData, const float *cScale, const float *cShift, size_t batch, size_t channels, size_t spatial, SimdTensorFormatType format, SimdSynetCompatibilityType compatibility)
 Dequantizes, adds and requantizes two UINT8 tensors.
 

Detailed Description

Functions that accelerate tensor addition in the Synet Framework.

Function Documentation

◆ SimdSynetAdd16bInit()

void * SimdSynetAdd16bInit ( const size_t *  aShape,
size_t  aCount,
SimdTensorDataType  aType,
const size_t *  bShape,
size_t  bCount,
SimdTensorDataType  bType,
SimdTensorDataType  dstType,
SimdTensorFormatType  format 
)

Initializes element-wise addition of two tensors in FP32 or BF16 format.

The created context adds two tensors with equal shapes:

for(i = 0; i < shapeSize; ++i)
{
    A = aType == SimdTensorData16b ? BFloat16ToFloat32(a[i]) : a[i];
    B = bType == SimdTensorData16b ? BFloat16ToFloat32(b[i]) : b[i];
    D = A + B;
    dst[i] = dstType == SimdTensorData16b ? Float32ToBFloat16(D) : D;
}

The current implementation creates a context only for equal input shapes, FP32/BF16 input and output tensor types, and SimdTensorFormatUnknown, SimdTensorFormatNchw or SimdTensorFormatNhwc tensor format.

Parameters
[in] aShape - a pointer to the shape of input tensor A.
[in] aCount - the number of dimensions of input tensor A.
[in] aType - the data type of input tensor A. Can be FP32 or BF16.
[in] bShape - a pointer to the shape of input tensor B.
[in] bCount - the number of dimensions of input tensor B.
[in] bType - the data type of input tensor B. Can be FP32 or BF16.
[in] dstType - the data type of the output tensor. Can be FP32 or BF16.
[in] format - the format of the input and output tensors.
Returns
a pointer to the add context, or NULL on error. The context must be released with function SimdRelease. This pointer is passed to function SimdSynetAdd16bForward.
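
The per-element rule above can be sketched in portable C for one type combination (BF16 input A, FP32 input B, BF16 output). The names Bf16ToF32, F32ToBf16 and AddBf16F32ToBf16 are hypothetical and not part of the library. The round-to-nearest-even narrowing is an assumption, since the text does not state how Float32ToBFloat16 rounds, and NaN inputs are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen BF16 to FP32 by placing its bits in the upper half. */
static float Bf16ToF32(uint16_t value)
{
    uint32_t bits = (uint32_t)value << 16;
    float result;
    memcpy(&result, &bits, sizeof(result));
    return result;
}

/* Hypothetical helper: narrow FP32 to BF16.
   Assumption: round to nearest, ties to even; NaN is not handled. */
static uint16_t F32ToBf16(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* Scalar reference for aType = BF16, bType = FP32, dstType = BF16. */
static void AddBf16F32ToBf16(const uint16_t* a, const float* b, uint16_t* dst, size_t size)
{
    assert(a && b && dst);
    for (size_t i = 0; i < size; ++i)
        dst[i] = F32ToBf16(Bf16ToF32(a[i]) + b[i]);
}
```

The sum is always formed in FP32, so the only precision loss in this sketch comes from the final narrowing to BF16.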

◆ SimdSynetAdd16bForward()

void SimdSynetAdd16bForward ( void *  context,
const uint8_t *  a,
const uint8_t *  b,
uint8_t *  dst 
)

Performs element-wise addition of two FP32/BF16 tensors.

The function adds corresponding elements of input tensors A and B using a context created by SimdSynetAdd16bInit. The actual data types, tensor shape and output type are stored in the context. BF16 input values are converted to FP32 before addition, and BF16 output values are converted from FP32 after addition.

Parameters
[in] context - a pointer to the add context. It must be created by function SimdSynetAdd16bInit and released by function SimdRelease.
[in] a - a pointer to input tensor A.
[in] b - a pointer to input tensor B.
[out] dst - a pointer to the output tensor.
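
Because a, b and dst are passed as byte pointers, the caller has to size each buffer from the tensor shape and its element type. The following is a minimal sketch of that calculation, assuming densely packed tensors with 4-byte FP32 and 2-byte BF16 elements. ElemType, ElemSize and TensorBytes are hypothetical names used only for illustration; the library's own type enumeration is SimdTensorDataType.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the element types discussed above. */
typedef enum { ElemF32, ElemBf16 } ElemType;

/* Size of one element in bytes: FP32 is 4 bytes, BF16 is 2 bytes. */
static size_t ElemSize(ElemType type)
{
    return type == ElemBf16 ? 2 : 4;
}

/* Byte size of a densely packed tensor (assumption: no padding between elements). */
static size_t TensorBytes(const size_t* shape, size_t count, ElemType type)
{
    assert(shape || count == 0);
    size_t elements = 1;
    for (size_t i = 0; i < count; ++i)
        elements *= shape[i];
    return elements * ElemSize(type);
}
```

Under these assumptions, a tensor of shape 1 x 3 x 4 x 5 occupies 240 bytes as FP32 and 120 bytes as BF16.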

◆ SimdSynetAddBias()

void SimdSynetAddBias ( const float *  bias,
size_t  channels,
size_t  spatial,
float *  dst,
SimdTensorFormatType  format 
)

Adds per-channel bias to an FP32 tensor in place.

Algorithm details (example for NCHW tensor format):

for(c = 0; c < channels; ++c)
    for(s = 0; s < spatial; ++s)
        dst[c*spatial + s] += bias[c];

Algorithm details (example for NHWC tensor format):

for(s = 0; s < spatial; ++s)
    for(c = 0; c < channels; ++c)
        dst[s*channels + c] += bias[c];
Note
This function is used in Synet Framework.
Parameters
[in] bias - a pointer to the 32-bit float array with bias coefficients. The size of the array is equal to channels.
[in] channels - the number of channels in the tensor.
[in] spatial - the spatial size (height * width) of the tensor.
[in,out] dst - a pointer to the FP32 tensor updated in place. The size of the array is equal to channels * spatial.
[in] format - the format of the tensor.
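
The two loop orders above can be combined into one scalar reference routine, which may be useful for checking results from the accelerated function. This is only a sketch: Layout and AddBiasRef are hypothetical names, and the two-value enum stands in for the NCHW and NHWC cases of SimdTensorFormatType.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the NCHW and NHWC tensor formats. */
typedef enum { LayoutNchw, LayoutNhwc } Layout;

/* Adds bias[c] to every element of channel c, in place. */
static void AddBiasRef(const float* bias, size_t channels, size_t spatial, float* dst, Layout layout)
{
    assert(bias && dst);
    if (layout == LayoutNchw)
    {
        /* Channel-major: each channel is a contiguous run of 'spatial' values. */
        for (size_t c = 0; c < channels; ++c)
            for (size_t s = 0; s < spatial; ++s)
                dst[c * spatial + s] += bias[c];
    }
    else
    {
        /* Channel-minor: channels are interleaved at every spatial position. */
        for (size_t s = 0; s < spatial; ++s)
            for (size_t c = 0; c < channels; ++c)
                dst[s * channels + c] += bias[c];
    }
}
```

With channels = 2, spatial = 3 and bias = {10, 20}, a zero-filled tensor becomes {10, 10, 10, 20, 20, 20} in NCHW and {10, 20, 10, 20, 10, 20} in NHWC.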

◆ SimdSynetAdd8i()

void SimdSynetAdd8i ( const uint8_t *  aData,
const float *  aScale,
const float *  aShift,
const uint8_t *  bData,
const float *  bScale,
const float *  bShift,
uint8_t *  cData,
const float *  cScale,
const float *  cShift,
size_t  batch,
size_t  channels,
size_t  spatial,
SimdTensorFormatType  format,
SimdSynetCompatibilityType  compatibility 
)

Dequantizes, adds and requantizes two UINT8 tensors.

Algorithm details (example for NCHW tensor format):

upper = isNarrowed(compatibility) ? 180 : 255;
for(b = 0; b < batch; ++b)
    for(c = 0; c < channels; ++c)
        for(s = 0; s < spatial; ++s)
        {
            offs = (b*channels + c)*spatial + s;
            A = aData[offs]*aScale[c] + aShift[c];
            B = bData[offs]*bScale[c] + bShift[c];
            C = round((A + B)*cScale[c] + cShift[c]);
            cData[offs] = restrict(C, 0, upper);
        }

For NHWC tensor format the same calculation uses offset (b*spatial + s)*channels + c.

Note
This function is used in Synet Framework.
Parameters
[in] aData - a pointer to the first input UINT8 tensor.
[in] aScale - a pointer to the 32-bit float array with per-channel scale coefficients of the first input tensor.
[in] aShift - a pointer to the 32-bit float array with per-channel shift coefficients of the first input tensor.
[in] bData - a pointer to the second input UINT8 tensor.
[in] bScale - a pointer to the 32-bit float array with per-channel scale coefficients of the second input tensor.
[in] bShift - a pointer to the 32-bit float array with per-channel shift coefficients of the second input tensor.
[out] cData - a pointer to the output UINT8 tensor.
[in] cScale - a pointer to the 32-bit float array with per-channel scale coefficients of the output tensor.
[in] cShift - a pointer to the 32-bit float array with per-channel shift coefficients of the output tensor.
[in] batch - the batch size of the input and output tensors.
[in] channels - the number of channels in the input and output tensors.
[in] spatial - the spatial size (height * width) of the input and output tensors.
[in] format - the format of the input and output tensors. Can be NCHW or NHWC.
[in] compatibility - calculation compatibility flags. When narrowed 8-bit mode is active, the output is limited to [0, 180], otherwise to [0, 255].
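
The NCHW calculation described for this function can be written as a scalar C reference, for example to validate output of the accelerated path. This is a sketch under stated assumptions: RoundNearest and Add8iNchwRef are hypothetical names, ties are rounded away from zero like C round() (the library's exact tie handling is not stated in the text), and the upper bound is passed directly as 255 or 180 instead of being derived from the compatibility flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: round half away from zero, matching C round(). */
static long RoundNearest(float value)
{
    return value >= 0.0f ? (long)(value + 0.5f) : (long)(value - 0.5f);
}

/* Hypothetical scalar reference for NCHW layout.
   'upper' is 255 normally, or 180 when narrowed 8-bit mode is active. */
static void Add8iNchwRef(const uint8_t* aData, const float* aScale, const float* aShift,
    const uint8_t* bData, const float* bScale, const float* bShift,
    uint8_t* cData, const float* cScale, const float* cShift,
    size_t batch, size_t channels, size_t spatial, long upper)
{
    assert(aData && bData && cData);
    for (size_t b = 0; b < batch; ++b)
        for (size_t c = 0; c < channels; ++c)
            for (size_t s = 0; s < spatial; ++s)
            {
                size_t offs = (b * channels + c) * spatial + s;
                /* Dequantize both inputs with their per-channel coefficients. */
                float A = aData[offs] * aScale[c] + aShift[c];
                float B = bData[offs] * bScale[c] + bShift[c];
                /* Requantize the sum and clamp it to [0, upper]. */
                long C = RoundNearest((A + B) * cScale[c] + cShift[c]);
                cData[offs] = (uint8_t)(C < 0 ? 0 : (C > upper ? upper : C));
            }
}
```

For NHWC layout only the offset changes, to (b * spatial + s) * channels + c.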