Accelerated addition functions used in the Synet Framework.

Functions
SIMD_API void *	SimdSynetAdd16bInit (const size_t *aShape, size_t aCount, SimdTensorDataType aType, const size_t *bShape, size_t bCount, SimdTensorDataType bType, SimdTensorDataType dstType, SimdTensorFormatType format)
	Initializes element-wise addition of two tensors in FP32 or BF16 format.

SIMD_API void	SimdSynetAdd16bForward (void *context, const uint8_t *a, const uint8_t *b, uint8_t *dst)
	Performs element-wise addition of two FP32/BF16 tensors.

SIMD_API void	SimdSynetAddBias (const float *bias, size_t channels, size_t spatial, float *dst, SimdTensorFormatType format)
	Adds per-channel bias to an FP32 tensor in place.

SIMD_API void	SimdSynetAdd8i (const uint8_t *aData, const float *aScale, const float *aShift, const uint8_t *bData, const float *bScale, const float *bShift, uint8_t *cData, const float *cScale, const float *cShift, size_t batch, size_t channels, size_t spatial, SimdTensorFormatType format, SimdSynetCompatibilityType compatibility)
	Dequantizes, adds and requantizes two UINT8 tensors.

Detailed Description

Accelerated addition functions used in the Synet Framework.

Function Documentation

◆ SimdSynetAdd16bInit()

void * SimdSynetAdd16bInit	(	const size_t *	aShape,
		size_t	aCount,
		SimdTensorDataType	aType,
		const size_t *	bShape,
		size_t	bCount,
		SimdTensorDataType	bType,
		SimdTensorDataType	dstType,
		SimdTensorFormatType	format
	)

Initializes element-wise addition of two tensors in FP32 or BF16 format.

The created context adds two tensors with equal shapes:

for(i = 0; i < shapeSize; ++i)
{
    A = aType == SimdTensorData16b ? BFloat16ToFloat32(a[i]) : a[i];
    B = bType == SimdTensorData16b ? BFloat16ToFloat32(b[i]) : b[i];
    D = A + B;
    dst[i] = dstType == SimdTensorData16b ? Float32ToBFloat16(D) : D;
}

The current implementation creates a context only for equal input shapes, FP32/BF16 input and output tensor types, and SimdTensorFormatUnknown, SimdTensorFormatNchw or SimdTensorFormatNhwc tensor format.

Parameters

[in]	aShape	- a pointer to shape of input A tensor.
[in]	aCount	- a count of dimensions of input A tensor.
[in]	aType	- a type of input A tensor. Can be FP32 or BF16.
[in]	bShape	- a pointer to shape of input B tensor.
[in]	bCount	- a count of dimensions of input B tensor.
[in]	bType	- a type of input B tensor. Can be FP32 or BF16.
[in]	dstType	- a type of output tensor. Can be FP32 or BF16.
[in]	format	- a format of input / output tensors.

Returns: a pointer to the add context, or NULL on error. The context must be released with function SimdRelease. The pointer is passed to function SimdSynetAdd16bForward.
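The loop above can be sketched as a scalar reference in plain C. This is only an illustration, not the library's implementation: the names bf16_to_f32, f32_to_bf16, load_elem and add16b_reference are hypothetical, the int flags stand in for the aType/bType/dstType checks, and the BF16 rounding (add 0x8000, then truncate) is an assumption that ignores NaN handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: BF16 is the upper 16 bits of an FP32 value, so widening is a shift. */
static float bf16_to_f32(uint16_t v)
{
    uint32_t bits = (uint32_t)v << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: narrow FP32 to BF16.
   Assumption: round by adding half of the dropped range; NaN is not treated specially. */
static uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)((bits + 0x8000u) >> 16);
}

/* Reads element i from a raw byte buffer as BF16 (is_bf16 != 0) or FP32. */
static float load_elem(const uint8_t *p, int is_bf16, size_t i)
{
    if (is_bf16)
    {
        uint16_t v;
        memcpy(&v, p + i * sizeof v, sizeof v);
        return bf16_to_f32(v);
    }
    float f;
    memcpy(&f, p + i * sizeof f, sizeof f);
    return f;
}

/* Scalar reference of the documented loop: convert inputs to FP32, add, convert the result. */
static void add16b_reference(const uint8_t *a, int a_is_bf16,
                             const uint8_t *b, int b_is_bf16,
                             uint8_t *dst, int dst_is_bf16, size_t size)
{
    for (size_t i = 0; i < size; ++i)
    {
        float d = load_elem(a, a_is_bf16, i) + load_elem(b, b_is_bf16, i);
        if (dst_is_bf16)
        {
            uint16_t v = f32_to_bf16(d);
            memcpy(dst + i * sizeof v, &v, sizeof v);
        }
        else
            memcpy(dst + i * sizeof d, &d, sizeof d);
    }
}
```

The memcpy-based loads avoid alignment and aliasing problems when a byte pointer actually refers to 16-bit or 32-bit data.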

◆ SimdSynetAdd16bForward()

void SimdSynetAdd16bForward	(	void *	context,
		const uint8_t *	a,
		const uint8_t *	b,
		uint8_t *	dst
	)

Performs element-wise addition of two FP32/BF16 tensors.

The function adds corresponding elements of input tensors A and B using a context created by SimdSynetAdd16bInit. The actual data types, tensor shape and output type are stored in the context. BF16 input values are converted to FP32 before addition, and BF16 output values are converted from FP32 after addition.

Parameters

[in]	context	- a pointer to add context. It must be created by function SimdSynetAdd16bInit and released by function SimdRelease.
[in]	a	- a pointer to input A tensor.
[in]	b	- a pointer to input B tensor.
[out]	dst	- a pointer to output tensor.
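Because all three buffers are passed as uint8_t pointers, the caller has to size them according to the element type chosen at initialization. A minimal sketch of that arithmetic follows; shape_size and tensor_bytes are hypothetical helpers (not part of the library), and the is_bf16 flag stands in for the tensor data type.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of elements described by a shape array. */
static size_t shape_size(const size_t *shape, size_t count)
{
    size_t size = 1;
    for (size_t i = 0; i < count; ++i)
        size *= shape[i];
    return size;
}

/* Hypothetical helper: bytes needed for a tensor buffer.
   BF16 elements occupy 2 bytes, FP32 elements occupy 4 bytes. */
static size_t tensor_bytes(const size_t *shape, size_t count, int is_bf16)
{
    return shape_size(shape, count) * (is_bf16 ? sizeof(uint16_t) : sizeof(float));
}
```

For a 1 x 3 x 4 x 4 tensor this gives 48 elements, that is 96 bytes in BF16 and 192 bytes in FP32.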

◆ SimdSynetAddBias()

void SimdSynetAddBias	(	const float *	bias,
		size_t	channels,
		size_t	spatial,
		float *	dst,
		SimdTensorFormatType	format
	)

Adds per-channel bias to an FP32 tensor in place.

Algorithm's details (example for NCHW tensor format):

for(c = 0; c < channels; ++c)
    for(s = 0; s < spatial; ++s)
         dst[c*spatial + s] += bias[c];

Algorithm's details (example for NHWC tensor format):

for(s = 0; s < spatial; ++s)
    for(c = 0; c < channels; ++c)
         dst[s*channels + c] += bias[c];

Note: This function is used in Synet Framework.

Parameters

[in]	bias	- a pointer to the 32-bit float array with bias coefficients. The size of the array is equal to channels.
[in]	channels	- a number of channels in the tensor.
[in]	spatial	- a spatial size (height * width) of the tensor.
[in,out]	dst	- a pointer to FP32 tensor updated in place. The size of the array is equal to channels * spatial.
[in]	format	- a format of the tensor.
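The two layouts described above can be combined into one scalar reference function. This is a sketch for checking results, not the accelerated implementation; add_bias_reference is a hypothetical name and the nhwc flag is an assumed stand-in for the SimdTensorFormatType argument.

```c
#include <stddef.h>

/* Scalar reference: nhwc != 0 selects the channels-last layout, 0 selects NCHW. */
static void add_bias_reference(const float *bias, size_t channels, size_t spatial,
                               float *dst, int nhwc)
{
    if (nhwc)
    {
        /* NHWC: channels of one spatial position are contiguous. */
        for (size_t s = 0; s < spatial; ++s)
            for (size_t c = 0; c < channels; ++c)
                dst[s * channels + c] += bias[c];
    }
    else
    {
        /* NCHW: spatial positions of one channel are contiguous. */
        for (size_t c = 0; c < channels; ++c)
            for (size_t s = 0; s < spatial; ++s)
                dst[c * spatial + s] += bias[c];
    }
}
```

With bias {10, 20}, two channels and three spatial positions, a zero-filled tensor becomes {10, 10, 10, 20, 20, 20} in NCHW and {10, 20, 10, 20, 10, 20} in NHWC.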

◆ SimdSynetAdd8i()

void SimdSynetAdd8i	(	const uint8_t *	aData,
		const float *	aScale,
		const float *	aShift,
		const uint8_t *	bData,
		const float *	bScale,
		const float *	bShift,
		uint8_t *	cData,
		const float *	cScale,
		const float *	cShift,
		size_t	batch,
		size_t	channels,
		size_t	spatial,
		SimdTensorFormatType	format,
		SimdSynetCompatibilityType	compatibility
	)

Dequantizes, adds and requantizes two UINT8 tensors.

Algorithm's details (example for NCHW tensor format):

upper = isNarrowed(compatibility) ? 180 : 255;
for(b = 0; b < batch; ++b)
    for(c = 0; c < channels; ++c)
        for(s = 0; s < spatial; ++s)
        {
             offs = (b*channels + c)*spatial + s;
             A = aData[offs]*aScale[c] + aShift[c];
             B = bData[offs]*bScale[c] + bShift[c];
             C = round((A + B)*cScale[c] + cShift[c]);
             cData[offs] = restrict(C, 0, upper);
        }

For NHWC tensor format the same calculation uses offset (b*spatial + s)*channels + c.

Note: This function is used in Synet Framework.

Parameters

[in]	aData	- a pointer to the first input UINT8 tensor.
[in]	aScale	- a pointer to the 32-bit float array with per-channel scale coefficients of the first input tensor.
[in]	aShift	- a pointer to the 32-bit float array with per-channel shift coefficients of the first input tensor.
[in]	bData	- a pointer to the second input UINT8 tensor.
[in]	bScale	- a pointer to the 32-bit float array with per-channel scale coefficients of the second input tensor.
[in]	bShift	- a pointer to the 32-bit float array with per-channel shift coefficients of the second input tensor.
[out]	cData	- a pointer to the output UINT8 tensor.
[in]	cScale	- a pointer to the 32-bit float array with per-channel scale coefficients of the output tensor.
[in]	cShift	- a pointer to the 32-bit float array with per-channel shift coefficients of the output tensor.
[in]	batch	- a batch size of input and output tensors.
[in]	channels	- a number of channels in input and output tensors.
[in]	spatial	- a spatial size (height * width) of input and output tensors.
[in]	format	- a format of input and output tensors. Can be NCHW or NHWC.
[in]	compatibility	- calculation compatibility flags. When narrowed 8-bit mode is active, output is limited to [0, 180], otherwise to [0, 255].
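The dequantize, add and requantize steps above can be written as a scalar reference in plain C. This is a sketch under stated assumptions rather than the library's code: add8i_reference is a hypothetical name, the nhwc and narrowed flags stand in for the format and compatibility arguments, and rounding is done as round-half-up after clamping, which may differ from the rounding mode the library uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference of the documented algorithm for both tensor layouts. */
static void add8i_reference(const uint8_t *aData, const float *aScale, const float *aShift,
                            const uint8_t *bData, const float *bScale, const float *bShift,
                            uint8_t *cData, const float *cScale, const float *cShift,
                            size_t batch, size_t channels, size_t spatial,
                            int nhwc, int narrowed)
{
    /* Upper output limit: 180 in narrowed 8-bit mode, 255 otherwise. */
    const float upper = narrowed ? 180.0f : 255.0f;
    for (size_t b = 0; b < batch; ++b)
        for (size_t c = 0; c < channels; ++c)
            for (size_t s = 0; s < spatial; ++s)
            {
                size_t offs = nhwc ? (b * spatial + s) * channels + c
                                   : (b * channels + c) * spatial + s;
                float A = aData[offs] * aScale[c] + aShift[c]; /* dequantize A */
                float B = bData[offs] * bScale[c] + bShift[c]; /* dequantize B */
                float C = (A + B) * cScale[c] + cShift[c];     /* requantize the sum */
                /* Clamp first so the cast below cannot overflow. */
                if (C < 0.0f) C = 0.0f;
                if (C > upper) C = upper;
                /* Assumption: round half up. */
                cData[offs] = (uint8_t)(C + 0.5f);
            }
}
```

Clamping before rounding gives the same result as rounding and then restricting to [0, upper] for this rounding rule, while keeping the float-to-integer cast within range.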

Simd Library Documentation.
