Functions to accelerate NormalizeLayer in Synet Framework.
Functions
| SIMD_API void | SimdSynetNormalizeLayerForward (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *eps, SimdBool acrossSpatial, SimdTensorFormatType format, float *buf, float *dst) |
| Performs FP32 L2 normalization with per-channel scale. |
| SIMD_API void | SimdSynetNormalizeLayerForwardV2 (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, float *dst) |
| Performs FP32 layer normalization across channels with per-channel scale and shift. |
| SIMD_API void | SimdSynetNormalizeLayerForwardV3 (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, float *dst) |
| Performs FP32 layer normalization across spatial positions with per-channel scale and shift. |
| SIMD_API void | SimdSynetNormalizeLayerForwardV4 (const float *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, float *dst) |
| Performs FP32 channel-norm based normalization with per-channel scale and shift. |
| SIMD_API void | SimdSynetNormalizeLayerForward16bV2 (const uint16_t *src, size_t batch, size_t channels, size_t spatial, const float *scale, const float *shift, const float *eps, SimdTensorFormatType format, float *buf, uint16_t *dst) |
| Performs BF16 layer normalization across channels with per-channel scale and shift. |
Function Documentation
◆ SimdSynetNormalizeLayerForward()
void SimdSynetNormalizeLayerForward(
    const float *src,
    size_t batch,
    size_t channels,
    size_t spatial,
    const float *scale,
    const float *eps,
    SimdBool acrossSpatial,
    SimdTensorFormatType format,
    float *buf,
    float *dst)
Performs FP32 L2 normalization with per-channel scale.
If acrossSpatial is SimdTrue, each batch item is normalized by one norm computed across all channels and spatial positions. Otherwise each spatial position is normalized across channels.
Algorithm's details (NHWC tensor format, acrossSpatial is SimdFalse):
for(b = 0; b < batch; ++b)
    for(s = 0; s < spatial; ++s)
    {
        sum = 0;
        for(c = 0; c < channels; ++c)
            sum += Square(src[b, s, c]);
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = src[b, s, c] * scale[c] / Sqrt(sum + eps[0]);
    }
- Note
- This function is used in Synet Framework.
- Parameters
- [in] src - a pointer to the input FP32 tensor.
- [in] batch - a batch size of input and output tensor.
- [in] channels - a number of channels in input and output tensor.
- [in] spatial - a spatial size (height*width) of input and output tensor.
- [in] scale - an array with per-channel scale parameters. The size of the array is equal to channels.
- [in] eps - a pointer to epsilon parameter. It is used to prevent division by zero.
- [in] acrossSpatial - a flag that selects normalization across channels*spatial for each batch item. Otherwise normalization is performed across channels for each spatial position.
- [in] format - a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
- [out] buf - a pointer to external temporary FP32 buffer used for NCHW non-across-spatial mode. The size of the buffer must be equal to spatial. Can be NULL (it causes usage of internal buffer).
- [out] dst - a pointer to the output FP32 tensor.
◆ SimdSynetNormalizeLayerForwardV2()
void SimdSynetNormalizeLayerForwardV2(
    const float *src,
    size_t batch,
    size_t channels,
    size_t spatial,
    const float *scale,
    const float *shift,
    const float *eps,
    SimdTensorFormatType format,
    float *buf,
    float *dst)
Performs FP32 layer normalization across channels with per-channel scale and shift.
For every batch item and spatial position the function subtracts mean over channels, divides by standard deviation over channels and then applies per-channel scale and shift.
Algorithm's details (NHWC tensor format):
for(b = 0; b < batch; ++b)
    for(s = 0; s < spatial; ++s)
    {
        sum = 0;
        for(c = 0; c < channels; ++c)
            sum += src[b, s, c];
        mean = sum / channels;
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = src[b, s, c] - mean;
        sqsum = 0;
        for(c = 0; c < channels; ++c)
            sqsum += Square(dst[b, s, c]);
        norm = 1 / Sqrt(sqsum / channels + eps[0]);
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = dst[b, s, c] * norm * scale[c] + shift[c];
    }
- Note
- This function is used in Synet Framework.
- Parameters
- [in] src - a pointer to the input FP32 tensor.
- [in] batch - a batch size of input and output tensor.
- [in] channels - a number of channels in input and output tensor.
- [in] spatial - a spatial size (height*width) of input and output tensor.
- [in] scale - an array with per-channel scale parameters. The size of the array is equal to channels.
- [in] shift - an array with per-channel shift parameters. The size of the array is equal to channels.
- [in] eps - a pointer to epsilon parameter. It is used to prevent division by zero.
- [in] format - a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
- [out] buf - a pointer to external temporary FP32 buffer used for NCHW layout. The size of the buffer must be equal to spatial. Can be NULL (it causes usage of internal buffer).
- [out] dst - a pointer to the output FP32 tensor.
◆ SimdSynetNormalizeLayerForwardV3()
void SimdSynetNormalizeLayerForwardV3(
    const float *src,
    size_t batch,
    size_t channels,
    size_t spatial,
    const float *scale,
    const float *shift,
    const float *eps,
    SimdTensorFormatType format,
    float *buf,
    float *dst)
Performs FP32 layer normalization across spatial positions with per-channel scale and shift.
For every batch item and channel the function subtracts mean over spatial positions, divides by standard deviation over spatial positions and then applies the channel scale and shift.
Algorithm's details (NCHW tensor format):
for(b = 0; b < batch; ++b)
    for(c = 0; c < channels; ++c)
    {
        sum = 0;
        for(s = 0; s < spatial; ++s)
            sum += src[b, c, s];
        mean = sum / spatial;
        for(s = 0; s < spatial; ++s)
            dst[b, c, s] = src[b, c, s] - mean;
        sqsum = 0;
        for(s = 0; s < spatial; ++s)
            sqsum += Square(dst[b, c, s]);
        norm = 1 / Sqrt(sqsum / spatial + eps[0]);
        for(s = 0; s < spatial; ++s)
            dst[b, c, s] = dst[b, c, s] * norm * scale[c] + shift[c];
    }
- Note
- This function is used in Synet Framework.
- Parameters
- [in] src - a pointer to the input FP32 tensor.
- [in] batch - a batch size of input and output tensor.
- [in] channels - a number of channels in input and output tensor.
- [in] spatial - a spatial size (height*width) of input and output tensor.
- [in] scale - an array with per-channel scale parameters. The size of the array is equal to channels.
- [in] shift - an array with per-channel shift parameters. The size of the array is equal to channels.
- [in] eps - a pointer to epsilon parameter. It is used to prevent division by zero.
- [in] format - a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
- [out] buf - a pointer to external temporary FP32 buffer used for NHWC layout. The size of the buffer must be equal to channels. Can be NULL (it causes usage of internal buffer).
- [out] dst - a pointer to the output FP32 tensor.
◆ SimdSynetNormalizeLayerForwardV4()
void SimdSynetNormalizeLayerForwardV4(
    const float *src,
    size_t batch,
    size_t channels,
    size_t spatial,
    const float *scale,
    const float *shift,
    const float *eps,
    SimdTensorFormatType format,
    float *buf,
    float *dst)
Performs FP32 channel-norm based normalization with per-channel scale and shift.
For every batch item the function computes an L2 norm for each channel, normalizes these norms by their channel average and uses the result as a per-channel multiplier.
Algorithm's details (NCHW tensor format):
for(b = 0; b < batch; ++b)
{
    sum = 0;
    for(c = 0; c < channels; ++c)
    {
        sqsum = 0;
        for(s = 0; s < spatial; ++s)
            sqsum += Square(src[b, c, s]);
        buf[c] = Sqrt(sqsum);
        sum += buf[c];
    }
    norm = 1 / (sum / channels + eps[0]);
    for(c = 0; c < channels; ++c)
    {
        buf[c] = 1 + scale[c] * buf[c] * norm;
        for(s = 0; s < spatial; ++s)
            dst[b, c, s] = src[b, c, s] * buf[c] + shift[c];
    }
}
- Note
- This function is used in Synet Framework.
- Parameters
- [in] src - a pointer to the input FP32 tensor.
- [in] batch - a batch size of input and output tensor.
- [in] channels - a number of channels in input and output tensor.
- [in] spatial - a spatial size (height*width) of input and output tensor.
- [in] scale - an array with per-channel scale parameters. The size of the array is equal to channels.
- [in] shift - an array with per-channel shift parameters. The size of the array is equal to channels.
- [in] eps - a pointer to epsilon parameter. It is used to prevent division by zero.
- [in] format - a format of input and output tensor. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
- [out] buf - a pointer to external temporary FP32 buffer. The size of the buffer must be equal to channels. Can be NULL (it causes usage of internal buffer).
- [out] dst - a pointer to the output FP32 tensor.
◆ SimdSynetNormalizeLayerForward16bV2()
void SimdSynetNormalizeLayerForward16bV2(
    const uint16_t *src,
    size_t batch,
    size_t channels,
    size_t spatial,
    const float *scale,
    const float *shift,
    const float *eps,
    SimdTensorFormatType format,
    float *buf,
    uint16_t *dst)
Performs BF16 layer normalization across channels with per-channel scale and shift.
This BF16 variant supports only SimdTensorFormatNhwc. Source values are converted to FP32 for mean, variance, scale and shift calculation, and the result is converted back to BF16.
Algorithm's details (NHWC tensor format):
for(b = 0; b < batch; ++b)
    for(s = 0; s < spatial; ++s)
    {
        for(c = 0; c < channels; ++c)
            buf[c] = Bf16ToFp32(src[b, s, c]);
        sum = 0;
        for(c = 0; c < channels; ++c)
            sum += buf[c];
        mean = sum / channels;
        for(c = 0; c < channels; ++c)
            buf[c] = buf[c] - mean;
        sqsum = 0;
        for(c = 0; c < channels; ++c)
            sqsum += Square(buf[c]);
        norm = 1 / Sqrt(sqsum / channels + eps[0]);
        for(c = 0; c < channels; ++c)
            dst[b, s, c] = Fp32ToBf16(buf[c] * norm * scale[c] + shift[c]);
    }
- Note
- This function is used in Synet Framework.
- Parameters
- [in] src - a pointer to the input BF16 tensor.
- [in] batch - a batch size of input and output tensor.
- [in] channels - a number of channels in input and output tensor.
- [in] spatial - a spatial size (height*width) of input and output tensor.
- [in] scale - an array with per-channel scale parameters. The size of the array is equal to channels.
- [in] shift - an array with per-channel shift parameters. The size of the array is equal to channels.
- [in] eps - a pointer to epsilon parameter. It is used to prevent division by zero.
- [in] format - a format of input and output tensor. It must be SimdTensorFormatNhwc.
- [out] buf - a pointer to external temporary FP32 buffer. The size of the buffer must be equal to channels. Can be NULL (it causes usage of internal buffer).
- [out] dst - a pointer to the output BF16 tensor.