Functions to accelerate NormalizeLayer in Synet Framework.

Functions
SIMD_API void	SimdSynetNormalizeLayerForward (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *eps, SimdBool acrossSpatial, SimdTensorFormatType format, float *buf, float *dst)
	Performs FP32 L2 normalization with per-channel scale.

SIMD_API void	SimdSynetNormalizeLayerForwardV2 (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, float *dst)
	Performs FP32 layer normalization across channels with per-channel scale and shift.

SIMD_API void	SimdSynetNormalizeLayerForwardV3 (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, float *dst)
	Performs FP32 layer normalization across spatial positions with per-channel scale and shift.

SIMD_API void	SimdSynetNormalizeLayerForwardV4 (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, float *dst)
	Performs FP32 channel-norm based normalization with per-channel scale and shift.

SIMD_API void	SimdSynetNormalizeLayerForward16bV2 (const uint16_t *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, uint16_t *dst)
	Performs BF16 layer normalization across channels with per-channel scale and shift.

Detailed Description

Functions to accelerate NormalizeLayer in Synet Framework.

Function Documentation

◆ SimdSynetNormalizeLayerForward()

void SimdSynetNormalizeLayerForward	(	const float *	src,
		size_t	batch,
		size_t	channels,
		size_t	spatial,
		const float *	scale,
		const float *	eps,
		SimdBool	acrossSpatial,
		SimdTensorFormatType	format,
		float *	buf,
		float *	dst
	)

Performs FP32 L2 normalization with per-channel scale.

If acrossSpatial is SimdTrue, each batch item is normalized by one norm computed across all channels and spatial positions. Otherwise each spatial position is normalized across channels.

Algorithm's details (NHWC format, acrossSpatial is false):

for(b = 0; b < batch; ++b)
    for(s = 0; s < spatial; ++s)
    {
        sum = 0;
        for(c = 0; c < channels; ++c)
            sum += Square(src[b, s, c]);
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = src[b, s, c] * scale[c] / Sqrt(sum + eps[0]);
    }

Note: This function is used in Synet Framework.

Parameters

[in]	src	- a pointer to the input FP32 tensor.
[in]	batch	- a batch size of input and output tensor.
[in]	channels	- a number of channels in input and output tensor.
[in]	spatial	- a spatial size (height*width) of input and output tensor.
[in]	scale	- an array with per-channel scale parameters. The size of the array is equal to channels.
[in]	eps	- a pointer to epsilon parameter. It is used to prevent division by zero.
[in]	acrossSpatial	- a flag that selects normalization across channels*spatial for each batch item. Otherwise normalization is performed across channels for each spatial position.
[in]	format	- a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
[out]	buf	- a pointer to external temporary FP32 buffer used for NCHW non-across-spatial mode. The size of the buffer must be equal to spatial. Can be NULL (it causes usage of internal buffer).
[out]	dst	- a pointer to the output FP32 tensor.

◆ SimdSynetNormalizeLayerForwardV2()

void SimdSynetNormalizeLayerForwardV2	(	const float *	src,
		size_t	batch,
		size_t	channels,
		size_t	spatial,
		const float *	scale,
		const float *	shift,
		const float *	eps,
		SimdTensorFormatType	format,
		float *	buf,
		float *	dst
	)

Performs FP32 layer normalization across channels with per-channel scale and shift.

For every batch item and spatial position the function subtracts mean over channels, divides by standard deviation over channels and then applies per-channel scale and shift.

Algorithm's details (NHWC tensor format):

for(b = 0; b < batch; ++b)
    for(s = 0; s < spatial; ++s)
    {
        sum = 0;
        for(c = 0; c < channels; ++c)
            sum += src[b, s, c];
        mean = sum / channels;
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = src[b, s, c] - mean;

        sqsum = 0;
        for(c = 0; c < channels; ++c)
            sqsum += Square(dst[b, s, c]);
        norm = 1 / Sqrt(sqsum / channels + eps[0]);
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = dst[b, s, c] * norm * scale[c] + shift[c];
    }

Note: This function is used in Synet Framework.

Parameters

[in]	src	- a pointer to the input FP32 tensor.
[in]	batch	- a batch size of input and output tensor.
[in]	channels	- a number of channels in input and output tensor.
[in]	spatial	- a spatial size (height*width) of input and output tensor.
[in]	scale	- an array with per-channel scale parameters. The size of the array is equal to channels.
[in]	shift	- an array with per-channel shift parameters. The size of the array is equal to channels.
[in]	eps	- a pointer to epsilon parameter. It is used to prevent division by zero.
[in]	format	- a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
[out]	buf	- a pointer to external temporary FP32 buffer used for NCHW layout. The size of the buffer must be equal to spatial. Can be NULL (it causes usage of internal buffer).
[out]	dst	- a pointer to the output FP32 tensor.

◆ SimdSynetNormalizeLayerForwardV3()

void SimdSynetNormalizeLayerForwardV3	(	const float *	src,
		size_t	batch,
		size_t	channels,
		size_t	spatial,
		const float *	scale,
		const float *	shift,
		const float *	eps,
		SimdTensorFormatType	format,
		float *	buf,
		float *	dst
	)

Performs FP32 layer normalization across spatial positions with per-channel scale and shift.

For every batch item and channel the function subtracts mean over spatial positions, divides by standard deviation over spatial positions and then applies the channel scale and shift.

Algorithm's details (NCHW tensor format):

for(b = 0; b < batch; ++b)
    for(c = 0; c < channels; ++c)
    {
        sum = 0;
        for(s = 0; s < spatial; ++s)
            sum += src[b, c, s];
        mean = sum / spatial;
        for(s = 0; s < spatial; ++s)
            dst[b, c, s] = src[b, c, s] - mean;

        sqsum = 0;
        for(s = 0; s < spatial; ++s)
            sqsum += Square(dst[b, c, s]);
        norm = 1 / Sqrt(sqsum / spatial + eps[0]);
        for(s = 0; s < spatial; ++s)
            dst[b, c, s] = dst[b, c, s] * norm * scale[c] + shift[c];
    }

Note: This function is used in Synet Framework.

Parameters

[in]	src	- a pointer to the input FP32 tensor.
[in]	batch	- a batch size of input and output tensor.
[in]	channels	- a number of channels in input and output tensor.
[in]	spatial	- a spatial size (height*width) of input and output tensor.
[in]	scale	- an array with per-channel scale parameters. The size of the array is equal to channels.
[in]	shift	- an array with per-channel shift parameters. The size of the array is equal to channels.
[in]	eps	- a pointer to epsilon parameter. It is used to prevent division by zero.
[in]	format	- a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
[out]	buf	- a pointer to external temporary FP32 buffer used for NHWC layout. The size of the buffer must be equal to channels. Can be NULL (it causes usage of internal buffer).
[out]	dst	- a pointer to the output FP32 tensor.

◆ SimdSynetNormalizeLayerForwardV4()

void SimdSynetNormalizeLayerForwardV4	(	const float *	src,
		size_t	batch,
		size_t	channels,
		size_t	spatial,
		const float *	scale,
		const float *	shift,
		const float *	eps,
		SimdTensorFormatType	format,
		float *	buf,
		float *	dst
	)

Performs FP32 channel-norm based normalization with per-channel scale and shift.

For every batch item the function computes an L2 norm over spatial positions for each channel and divides each norm by the average of these norms over channels (plus eps). Every channel is then multiplied by 1 + scale[c] times this ratio, and shift[c] is added.

Algorithm's details (NCHW tensor format):

for(b = 0; b < batch; ++b)
{
    sum = 0;
    for(c = 0; c < channels; ++c)
    {
        sqsum = 0;
        for(s = 0; s < spatial; ++s)
            sqsum += Square(src[b, c, s]);
        buf[c] = Sqrt(sqsum);
        sum += buf[c];
    }
    norm = 1 / (sum / channels + eps[0]);
    for(c = 0; c < channels; ++c)
    {
        buf[c] = 1 + scale[c] * buf[c] * norm;
        for(s = 0; s < spatial; ++s)
            dst[b, c, s] = src[b, c, s] * buf[c] + shift[c];
    }
}

Note: This function is used in Synet Framework.

Parameters

[in]	src	- a pointer to the input FP32 tensor.
[in]	batch	- a batch size of input and output tensor.
[in]	channels	- a number of channels in input and output tensor.
[in]	spatial	- a spatial size (height*width) of input and output tensor.
[in]	scale	- an array with per-channel scale parameters. The size of the array is equal to channels.
[in]	shift	- an array with per-channel shift parameters. The size of the array is equal to channels.
[in]	eps	- a pointer to epsilon parameter. It is used to prevent division by zero.
[in]	format	- a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
[out]	buf	- a pointer to external temporary FP32 buffer. The size of the buffer must be equal to channels. Can be NULL (it causes usage of internal buffer).
[out]	dst	- a pointer to the output FP32 tensor.

◆ SimdSynetNormalizeLayerForward16bV2()

void SimdSynetNormalizeLayerForward16bV2	(	const uint16_t *	src,
		size_t	batch,
		size_t	channels,
		size_t	spatial,
		const float *	scale,
		const float *	shift,
		const float *	eps,
		SimdTensorFormatType	format,
		float *	buf,
		uint16_t *	dst
	)

Performs BF16 layer normalization across channels with per-channel scale and shift.

This BF16 variant supports only SimdTensorFormatNhwc. Source values are converted to FP32 for mean, variance, scale and shift calculation, and the result is converted back to BF16.

Algorithm's details (NHWC tensor format):

for(b = 0; b < batch; ++b)
    for(s = 0; s < spatial; ++s)
    {
        for(c = 0; c < channels; ++c)
            buf[c] = Bf16ToFp32(src[b, s, c]);

        sum = 0;
        for(c = 0; c < channels; ++c)
            sum += buf[c];
        mean = sum / channels;
        for(c = 0; c < channels; ++c)
            buf[c] = buf[c] - mean;

        sqsum = 0;
        for(c = 0; c < channels; ++c)
            sqsum += Square(buf[c]);
        norm = 1 / Sqrt(sqsum / channels + eps[0]);
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = Fp32ToBf16(buf[c] * norm * scale[c] + shift[c]);
    }

Note: This function is used in Synet Framework.

Parameters

[in]	src	- a pointer to the input BF16 tensor.
[in]	batch	- a batch size of input and output tensor.
[in]	channels	- a number of channels in input and output tensor.
[in]	spatial	- a spatial size (height*width) of input and output tensor.
[in]	scale	- an array with per-channel scale parameters. The size of the array is equal to channels.
[in]	shift	- an array with per-channel shift parameters. The size of the array is equal to channels.
[in]	eps	- a pointer to epsilon parameter. It is used to prevent division by zero.
[in]	format	- a format of input and output tensor. It must be SimdTensorFormatNhwc.
[out]	buf	- a pointer to external temporary FP32 buffer. The size of the buffer must be equal to channels. Can be NULL (it causes usage of internal buffer).
[out]	dst	- a pointer to the output BF16 tensor.

Simd Library Documentation.
