Simd Library Documentation.

Other functions

Other accelerated functions used in Synet Framework.

Functions

SIMD_API void SimdSynetChannelSum16b (const uint16_t *src, size_t channels, size_t spatial, SimdTensorFormatType format, float *sum)
 Calculates per-channel sums of a BF16 tensor in FP32 format.

SIMD_API void SimdSynetEltwiseLayerForward (float const *const *src, const float *weight, size_t count, size_t size, SimdSynetEltwiseOperationType type, float *dst)
 Performs element-wise product, weighted sum, maximum or minimum over several FP32 tensors.

SIMD_API void SimdSynetLrnLayerCrossChannels (const float *src, size_t half, size_t channels, size_t spatial, const float *k, float *dst, SimdTensorFormatType format)
 Performs local response normalization across channels for a single FP32 tensor.

SIMD_API void SimdSynetShuffleLayerForward (const float *src0, const float *src1, size_t channels0, size_t channels1, size_t spatial, float *dst0, float *dst1, SimdTensorFormatType format, int type)
 Performs forward propagation of FP32 ShuffleLayer.

SIMD_API void SimdSynetSoftmax32f (const float *src, size_t outer, size_t count, size_t inner, float *dst)
 Calculates FP32 softmax along count dimension.

SIMD_API void SimdSynetSoftmax16b (const uint16_t *src, size_t outer, size_t count, size_t inner, uint16_t *dst)
 Calculates BF16 softmax along count dimension.

SIMD_API void SimdSynetTiledScale2D32f (const float *src, size_t channels, size_t height, size_t width, SimdTensorFormatType format, const float *ver, const float *hor, float *dst)
 Multiplies every element of a 3D FP32 tensor by two tiled 2D scale tensors.

SIMD_API void SimdSynetUnaryOperation32f (const float *src, size_t size, SimdSynetUnaryOperation32fType type, float *dst)
 Applies selected unary operation to every element of a 32-bit floating point array.
 

Detailed Description

Other accelerated functions used in Synet Framework.

Function Documentation

◆ SimdSynetChannelSum16b()

void SimdSynetChannelSum16b(const uint16_t *src, size_t channels, size_t spatial, SimdTensorFormatType format, float *sum)

Calculates per-channel sums of a BF16 tensor and stores them as FP32 values.

Algorithm's details (example for NCHW tensor format):

for(c = 0; c < channels; ++c)
{
    sum[c] = 0;
    for(s = 0; s < spatial; ++s)
        sum[c] += BFloat16ToFloat32(src[c*spatial + s]);
}

Algorithm's details (example for NHWC tensor format):

for(c = 0; c < channels; ++c)
    sum[c] = 0;
for(s = 0; s < spatial; ++s)
    for(c = 0; c < channels; ++c)
        sum[c] += BFloat16ToFloat32(src[s*channels + c]);
Note
This function is used in Synet Framework.
Parameters
[in] src - a pointer to the input BF16 tensor.
[in] channels - the number of channels in the input tensor.
[in] spatial - the spatial size (height*width) of the input tensor.
[in] format - the format of the input tensor.
[out] sum - a pointer to the output 32-bit float array with per-channel sums. The size of the array must be equal to channels.
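
The two layouts above can be combined into one scalar reference routine. The following is a minimal sketch, not the library's SIMD implementation: RefTensorFormat, RefBFloat16ToFloat32 and RefChannelSum16b are hypothetical names introduced here for illustration, and the BF16 conversion assumes that a BF16 value is the upper 16 bits of an IEEE-754 binary32 value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SimdTensorFormatType; only two layouts are modeled. */
typedef enum { REF_FORMAT_NCHW, REF_FORMAT_NHWC } RefTensorFormat;

/* Assumption: BF16 is the upper 16 bits of an IEEE-754 binary32 value. */
static float RefBFloat16ToFloat32(uint16_t value)
{
    uint32_t bits = (uint32_t)value << 16;
    float result;
    memcpy(&result, &bits, sizeof(result));
    return result;
}

/* Scalar reference for per-channel sums of a BF16 tensor. */
static void RefChannelSum16b(const uint16_t* src, size_t channels, size_t spatial,
    RefTensorFormat format, float* sum)
{
    assert(src != NULL && sum != NULL);
    for (size_t c = 0; c < channels; ++c)
        sum[c] = 0.0f;
    if (format == REF_FORMAT_NCHW)
    {
        /* Each channel is a contiguous plane of spatial elements. */
        for (size_t c = 0; c < channels; ++c)
            for (size_t s = 0; s < spatial; ++s)
                sum[c] += RefBFloat16ToFloat32(src[c * spatial + s]);
    }
    else
    {
        /* Channels are interleaved at every spatial position. */
        for (size_t s = 0; s < spatial; ++s)
            for (size_t c = 0; c < channels; ++c)
                sum[c] += RefBFloat16ToFloat32(src[s * channels + c]);
    }
}
```

A routine like this can serve as a baseline when checking the output of the accelerated function; small differences are possible because a SIMD implementation may accumulate in a different order.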

◆ SimdSynetEltwiseLayerForward()

void SimdSynetEltwiseLayerForward(float const *const *src, const float *weight, size_t count, size_t size, SimdSynetEltwiseOperationType type, float *dst)

Performs element-wise product, weighted sum, maximum or minimum over several FP32 tensors.

The function reads count input arrays of equal length size and writes one output array. The weight array is used only for SimdSynetEltwiseOperationSum.

Algorithm's details for SimdSynetEltwiseOperationProduct:

for(j = 0; j < size; ++j)
    dst[j] = src[0][j] * src[1][j];
for(i = 2; i < count; ++i)
    for(j = 0; j < size; ++j)
        dst[j] *= src[i][j];

Algorithm's details for SimdSynetEltwiseOperationSum:

for(j = 0; j < size; ++j)
    dst[j] = src[0][j]*weight[0] + src[1][j]*weight[1];
for(i = 2; i < count; ++i)
    for(j = 0; j < size; ++j)
        dst[j] += src[i][j]*weight[i];

Algorithm's details for SimdSynetEltwiseOperationMax:

for(j = 0; j < size; ++j)
    dst[j] = Max(src[0][j], src[1][j]);
for(i = 2; i < count; ++i)
    for(j = 0; j < size; ++j)
        dst[j] = Max(dst[j], src[i][j]);

Algorithm's details for SimdSynetEltwiseOperationMin:

for(j = 0; j < size; ++j)
    dst[j] = Min(src[0][j], src[1][j]);
for(i = 2; i < count; ++i)
    for(j = 0; j < size; ++j)
        dst[j] = Min(dst[j], src[i][j]);
Note
This function is used in Synet Framework.
Parameters
[in] src - a pointer to an array of count pointers to the input FP32 arrays.
[in] weight - a pointer to count FP32 weighted-sum coefficients. It is used only for SimdSynetEltwiseOperationSum; otherwise it can be NULL.
[in] count - the number of input arrays. It must be at least 2.
[in] size - the number of elements in each input array and in the output array.
[in] type - the type of operation (see SimdSynetEltwiseOperationType).
[out] dst - a pointer to the output FP32 array.
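
The four operations above share one structure: the result starts from the first array and is folded with every following array. A scalar sketch of that idea is given below. It is not the library's code: RefEltwiseType and RefEltwiseForward are hypothetical names, and the loops are ordered per element rather than per array, which keeps the same order of arithmetic for each output element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SimdSynetEltwiseOperationType. */
typedef enum
{
    REF_ELTWISE_PRODUCT,
    REF_ELTWISE_SUM,
    REF_ELTWISE_MAX,
    REF_ELTWISE_MIN
} RefEltwiseType;

/* Scalar reference for the element-wise layer over count input arrays. */
static void RefEltwiseForward(float const* const* src, const float* weight,
    size_t count, size_t size, RefEltwiseType type, float* dst)
{
    assert(count >= 2);
    /* The weights are needed only for the weighted sum. */
    assert(type != REF_ELTWISE_SUM || weight != NULL);
    for (size_t j = 0; j < size; ++j)
    {
        float acc = (type == REF_ELTWISE_SUM) ? src[0][j] * weight[0] : src[0][j];
        for (size_t i = 1; i < count; ++i)
        {
            float v = src[i][j];
            switch (type)
            {
            case REF_ELTWISE_PRODUCT: acc *= v; break;
            case REF_ELTWISE_SUM: acc += v * weight[i]; break;
            case REF_ELTWISE_MAX: acc = acc > v ? acc : v; break;
            case REF_ELTWISE_MIN: acc = acc < v ? acc : v; break;
            }
        }
        dst[j] = acc;
    }
}
```

For example, with inputs {1, 2, 3}, {4, 5, 6}, {7, 8, 9} and weights {1, 0.5, 2}, the weighted sum of the first elements is 1*1 + 4*0.5 + 7*2 = 17.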

◆ SimdSynetLrnLayerCrossChannels()

void SimdSynetLrnLayerCrossChannels(const float *src, size_t half, size_t channels, size_t spatial, const float *k, float *dst, SimdTensorFormatType format)

Performs local response normalization across channels for a single FP32 tensor.

For every tensor element the function accumulates squares of values from a channel window [c - half, c + half] clipped by tensor boundaries, and multiplies the source value by Pow(k[0] + k[1]*sum, k[2]). It supports SimdTensorFormatNchw and SimdTensorFormatNhwc.

Algorithm's details (NCHW tensor format):

for(c = 0; c < channels; ++c)
    for(s = 0; s < spatial; ++s)
    {
        lo = c > half ? c - half : 0;
        hi = Min(channels, c + half + 1);
        sum = 0;
        for(i = lo; i < hi; ++i)
            sum += Square(src[i*spatial + s]);
        dst[c*spatial + s] = src[c*spatial + s]*Pow(k[0] + sum*k[1], k[2]);
    }
Note
This function is used in Synet Framework.
Parameters
[in] src - a pointer to the FP32 input tensor. The size of the array must be equal to channels*spatial.
[in] half - the half size of the normalization channel window.
[in] channels - the number of channels in the input and output tensors.
[in] spatial - the spatial size (height*width) of the input and output tensors.
[in] k - a pointer to three FP32 coefficients: offset, scale and exponent.
[out] dst - a pointer to the FP32 output tensor. The size of the array must be equal to channels*spatial.
[in] format - the format of the input and output tensors. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.

◆ SimdSynetShuffleLayerForward()

void SimdSynetShuffleLayerForward(const float *src0, const float *src1, size_t channels0, size_t channels1, size_t spatial, float *dst0, float *dst1, SimdTensorFormatType format, int type)

Performs forward propagation of FP32 ShuffleLayer.

For type 0 the function splits even and odd channels from two input tensors into two output tensors. For type 1 it performs the inverse operation and interleaves channels from two input tensors into two output tensors. The number of channels in each input (type 0) or output (type 1) tensor must be even.

Note
This function is used in Synet Framework.
Parameters
[in] src0 - a pointer to the 32-bit float array with the first input image tensor.
[in] src1 - a pointer to the 32-bit float array with the second input image tensor.
[in] channels0 - the number of channels in the first input (type == 0) or output (type == 1) image tensor. It must be an even number.
[in] channels1 - the number of channels in the second input (type == 0) or output (type == 1) image tensor. It must be an even number.
[in] spatial - the spatial size of the input and output image tensors.
[out] dst0 - a pointer to the 32-bit float array with the first output image tensor.
[out] dst1 - a pointer to the 32-bit float array with the second output image tensor.
[in] format - the format of the input and output image tensors. It can be SimdTensorFormatNchw or SimdTensorFormatNhwc.
[in] type - the shuffle type: 0 for the split operation, 1 for the interleave operation.
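
The description does not state the exact channel order, so the sketch below shows only one plausible reading for the NCHW format. It assumes that for type 0 the even channels of src0 followed by the even channels of src1 go to dst0 and the odd channels go to dst1, and that type 1 reverses this mapping. RefCopyPlane and RefShuffleNchw are hypothetical names; check the result against the library before relying on this order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies one NCHW channel plane of spatial floats. */
static void RefCopyPlane(const float* src, size_t spatial, float* dst)
{
    memcpy(dst, src, spatial * sizeof(float));
}

/* Scalar sketch of the shuffle layer for NCHW (channel order is an assumption). */
static void RefShuffleNchw(const float* src0, const float* src1, size_t channels0,
    size_t channels1, size_t spatial, float* dst0, float* dst1, int type)
{
    assert(channels0 % 2 == 0 && channels1 % 2 == 0);
    size_t half0 = channels0 / 2, half1 = channels1 / 2;
    if (type == 0)
    {
        /* Split: inputs have channels0 and channels1 channels,
           each output has half0 + half1 channels. */
        for (size_t k = 0; k < half0; ++k)
        {
            RefCopyPlane(src0 + (2 * k + 0) * spatial, spatial, dst0 + k * spatial);
            RefCopyPlane(src0 + (2 * k + 1) * spatial, spatial, dst1 + k * spatial);
        }
        for (size_t k = 0; k < half1; ++k)
        {
            RefCopyPlane(src1 + (2 * k + 0) * spatial, spatial, dst0 + (half0 + k) * spatial);
            RefCopyPlane(src1 + (2 * k + 1) * spatial, spatial, dst1 + (half0 + k) * spatial);
        }
    }
    else
    {
        /* Interleave: each input has half0 + half1 channels,
           outputs have channels0 and channels1 channels. */
        for (size_t k = 0; k < half0; ++k)
        {
            RefCopyPlane(src0 + k * spatial, spatial, dst0 + (2 * k + 0) * spatial);
            RefCopyPlane(src1 + k * spatial, spatial, dst0 + (2 * k + 1) * spatial);
        }
        for (size_t k = 0; k < half1; ++k)
        {
            RefCopyPlane(src0 + (half0 + k) * spatial, spatial, dst1 + (2 * k + 0) * spatial);
            RefCopyPlane(src1 + (half0 + k) * spatial, spatial, dst1 + (2 * k + 1) * spatial);
        }
    }
}
```

Under these assumptions, applying type 0 and then type 1 with the same channels0 and channels1 restores the original two tensors.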

◆ SimdSynetSoftmax32f()

void SimdSynetSoftmax32f(const float *src, size_t outer, size_t count, size_t inner, float *dst)

Calculates FP32 softmax along count dimension.

Algorithm's details:

for(o = 0; o < outer; ++o)
    for(i = 0; i < inner; ++i)
    {
        max = src[o*count*inner + i];
        for(c = 1; c < count; ++c)
            max = Max(max, src[(o*count + c)*inner + i]);
        sum = 0;
        for(c = 0; c < count; ++c)
            sum += exp(src[(o*count + c)*inner + i] - max);
        for(c = 0; c < count; ++c)
            dst[(o*count + c)*inner + i] = exp(src[(o*count + c)*inner + i] - max)/sum;
    }
Note
This function is used in Synet Framework.
Parameters
[in] src - a pointer to the input FP32 array. The size of the array must be equal to outer*count*inner.
[in] outer - the product of the dimensions before the softmax axis.
[in] count - the size of the softmax axis.
[in] inner - the product of the dimensions after the softmax axis.
[out] dst - a pointer to the output FP32 array. The size of the array must be equal to outer*count*inner.

◆ SimdSynetSoftmax16b()

void SimdSynetSoftmax16b(const uint16_t *src, size_t outer, size_t count, size_t inner, uint16_t *dst)

Calculates BF16 softmax along count dimension.

Input BF16 values are converted to FP32 for exponent and sum computations. The final probabilities are converted back to BF16.

Note
This function is used in Synet Framework.
Parameters
[in] src - a pointer to the input BF16 array. The size of the array must be equal to outer*count*inner.
[in] outer - the product of the dimensions before the softmax axis.
[in] count - the size of the softmax axis.
[in] inner - the product of the dimensions after the softmax axis.
[out] dst - a pointer to the output BF16 array. The size of the array must be equal to outer*count*inner.

◆ SimdSynetTiledScale2D32f()

void SimdSynetTiledScale2D32f(const float *src, size_t channels, size_t height, size_t width, SimdTensorFormatType format, const float *ver, const float *hor, float *dst)

Multiplies every element of a 3D FP32 tensor by two tiled 2D scale tensors.

Algorithm's details for NCHW tensor format:

for(c = 0; c < channels; ++c)
    for(y = 0; y < height; ++y)
        for(x = 0; x < width; ++x)
            dst[(c*height + y)*width + x] = src[(c*height + y)*width + x] * ver[c*width + x] * hor[c*height + y];

Algorithm's details for NHWC tensor format:

for(y = 0; y < height; ++y)
    for(x = 0; x < width; ++x)
        for(c = 0; c < channels; ++c)
            dst[(y*width + x)*channels + c] = src[(y*width + x)*channels + c] * ver[x*channels + c] * hor[y*channels + c];
Note
This function is used in Synet Framework.
Parameters
[in] src - a pointer to the 32-bit float array with the input image tensor. The size of the array is equal to channels * height * width.
[in] channels - the number of channels in the input and output image tensors.
[in] height - the height of the input and output image tensors.
[in] width - the width of the input and output image tensors.
[in] format - the format of the input and output image tensors. Only SimdTensorFormatNchw and SimdTensorFormatNhwc are supported.
[in] ver - a pointer to the 32-bit float array with scale coefficients indexed by channel and column. The size of the array is equal to channels * width.
[in] hor - a pointer to the 32-bit float array with scale coefficients indexed by channel and row. The size of the array is equal to channels * height.
[out] dst - a pointer to the 32-bit float array with the output image tensor. The size of the array is equal to channels * height * width. The input and output image tensors can be the same.
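
Both layouts differ only in how the three indices are computed, which the sketch below makes explicit. RefTensorFormat and RefTiledScale2D32f are hypothetical names, and the code is a scalar illustration of the formulas above, not the library's implementation. Each element is read once and written once at the same index, so src and dst may point to the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SimdTensorFormatType; only two layouts are modeled. */
typedef enum { REF_FORMAT_NCHW, REF_FORMAT_NHWC } RefTensorFormat;

/* Scalar reference: dst = src * ver(channel, column) * hor(channel, row). */
static void RefTiledScale2D32f(const float* src, size_t channels, size_t height,
    size_t width, RefTensorFormat format, const float* ver, const float* hor, float* dst)
{
    assert(format == REF_FORMAT_NCHW || format == REF_FORMAT_NHWC);
    for (size_t c = 0; c < channels; ++c)
        for (size_t y = 0; y < height; ++y)
            for (size_t x = 0; x < width; ++x)
            {
                size_t t, v, h; /* tensor, ver and hor offsets */
                if (format == REF_FORMAT_NCHW)
                {
                    t = (c * height + y) * width + x;
                    v = c * width + x;
                    h = c * height + y;
                }
                else
                {
                    t = (y * width + x) * channels + c;
                    v = x * channels + c;
                    h = y * channels + c;
                }
                dst[t] = src[t] * ver[v] * hor[h];
            }
}
```

The loop order here favors clarity over memory locality; for NHWC a real implementation would normally iterate over channels in the innermost loop, as in the algorithm above.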

◆ SimdSynetUnaryOperation32f()

void SimdSynetUnaryOperation32f(const float *src, size_t size, SimdSynetUnaryOperation32fType type, float *dst)

Applies selected unary operation to every element of a 32-bit floating point array.

For every input element this function performs one of the operations defined by SimdSynetUnaryOperation32fType: absolute value, ceil, cosine, error function, exponent, floor, logarithm, negation, bitwise NOT of the floating point representation, reciprocal, round, reciprocal square root, sign, sine, square root, hyperbolic tangent or zeroing.

Note
This function is used in Synet Framework.
Parameters
[in] src - a pointer to the input 32-bit float array.
[in] size - the number of elements in the input and output arrays.
[in] type - the unary operation type (see SimdSynetUnaryOperation32fType).
[out] dst - a pointer to the output 32-bit float array.