Simd Library Documentation.

BF16 merged convolution framework

A framework to accelerate BF16 merged convolution in Synet Framework.

Functions

SIMD_API void * SimdSynetMergedConvolution16bInit (size_t batch, const SimdConvolutionParameters *convs, size_t count, SimdBool add)
 Initializes a merged convolution context that uses BF16 for internal convolution data.
 
SIMD_API size_t SimdSynetMergedConvolution16bExternalBufferSize (const void *context)
 Gets the size in bytes of the optional external temporary buffer for BF16 merged convolution.
 
SIMD_API size_t SimdSynetMergedConvolution16bInternalBufferSize (const void *context)
 Gets the size in bytes of internal storage used by a BF16 merged convolution context.
 
SIMD_API const char * SimdSynetMergedConvolution16bInfo (const void *context)
 Gets a textual description of the selected BF16 merged convolution implementation.
 
SIMD_API void SimdSynetMergedConvolution16bSetParams (void *context, const float *const *weight, const float *const *bias, const float *const *params)
 Sets FP32 weights, biases and activation parameters for BF16 merged convolution.
 
SIMD_API void SimdSynetMergedConvolution16bForward (void *context, const uint8_t *src, uint8_t *buf, uint8_t *dst)
 Performs forward propagation through the fused BF16 merged convolution sequence.
 

Detailed Description

A framework to accelerate BF16 merged convolution in Synet Framework.

Function Documentation

◆ SimdSynetMergedConvolution16bInit()

void * SimdSynetMergedConvolution16bInit ( size_t  batch,
const SimdConvolutionParameters *  convs,
size_t  count,
SimdBool  add 
)

Initializes a merged convolution context that uses BF16 for internal convolution data.

The context fuses a sequence of two or three NHWC convolutions into one forward call: convolution + depthwise convolution, depthwise convolution + convolution, or convolution + depthwise convolution + convolution. The source tensor type is taken from convs[0].srcT and the destination tensor type from convs[count - 1].dstT; each can be FP32 or BF16. Ordinary convolutions use 1x1 or 3x3 kernels and depthwise convolutions use 3x3, 5x5 or 7x7 kernels; dilation must be 1 and stride must be 1, 2 or 3. If add is SimdTrue for a three-convolution sequence, the source tensor is added to the final output, so it must have the same shape as the final destination tensor.
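The constraints above can be checked on the caller side before SimdSynetMergedConvolution16bInit is called. The following C sketch is not part of Simd: ConvDesc, is_depthwise, out_size and check_merged_chain are hypothetical names. ConvDesc mirrors only a few SimdConvolutionParameters fields and assumes square kernels, equal strides and symmetric padding, and a convolution is treated as depthwise when group == srcC == dstC. Init itself remains the authority; on error it returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of SimdConvolutionParameters (not the Simd struct):
   square kernels, equal strides and symmetric padding are assumed. */
typedef struct {
    size_t srcC, srcH, srcW;
    size_t dstC, dstH, dstW;
    size_t kernel, stride, dilation, pad, group;
} ConvDesc;

/* Assumption: group == srcC == dstC marks a depthwise convolution;
   everything else is treated as an ordinary convolution. */
static int is_depthwise(const ConvDesc *c)
{
    return c->group == c->srcC && c->group == c->dstC;
}

/* Standard output size along one spatial axis. */
static size_t out_size(size_t in, size_t kernel, size_t stride, size_t dilation, size_t pad)
{
    size_t span = dilation * (kernel - 1) + 1;
    assert(stride > 0 && in + 2 * pad >= span);
    return (in + 2 * pad - span) / stride + 1;
}

/* Returns 1 if the chain satisfies the documented constraints, else 0. */
static int check_merged_chain(const ConvDesc *c, size_t count, int add)
{
    if (count != 2 && count != 3)
        return 0;
    for (size_t i = 0; i < count; ++i) {
        const ConvDesc *k = &c[i];
        int dw = is_depthwise(k);
        if (k->dilation != 1 || k->stride < 1 || k->stride > 3)
            return 0;
        if (dw ? (k->kernel != 3 && k->kernel != 5 && k->kernel != 7)
               : (k->kernel != 1 && k->kernel != 3))
            return 0;
        if (k->srcH + 2 * k->pad < k->kernel || k->srcW + 2 * k->pad < k->kernel)
            return 0;
        if (k->dstH != out_size(k->srcH, k->kernel, k->stride, k->dilation, k->pad) ||
            k->dstW != out_size(k->srcW, k->kernel, k->stride, k->dilation, k->pad))
            return 0;
        /* Each convolution must consume the previous one's output. */
        if (i > 0 && (k->srcC != c[i - 1].dstC || k->srcH != c[i - 1].dstH ||
                      k->srcW != c[i - 1].dstW))
            return 0;
    }
    /* Allowed orders: conv+dw, dw+conv, conv+dw+conv. */
    if (count == 2 && is_depthwise(&c[0]) == is_depthwise(&c[1]))
        return 0;
    if (count == 3 && (is_depthwise(&c[0]) || !is_depthwise(&c[1]) || is_depthwise(&c[2])))
        return 0;
    /* With add on a three-convolution chain, src and final dst shapes must match. */
    if (add && count == 3 &&
        (c[0].srcC != c[2].dstC || c[0].srcH != c[2].dstH || c[0].srcW != c[2].dstW))
        return 0;
    return 1;
}
```

For a block made of a 1x1 convolution from 16 to 96 channels, a 3x3 depthwise convolution with padding 1, and a 1x1 convolution back to 16 channels on a 32x32 input, the check passes with add enabled; setting any dilation to 2 makes it fail.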

Parameters
[in] batch - a batch size.
[in] convs - an array with convolution parameters in execution order.
[in] count - the number of merged convolutions. It must be 2 or 3.
[in] add - a flag that enables adding the source tensor to the final output tensor.
Returns
a pointer to the BF16 merged convolution context. On error it returns NULL. It must be released with the function SimdRelease. This pointer is passed to the functions SimdSynetMergedConvolution16bExternalBufferSize, SimdSynetMergedConvolution16bInternalBufferSize, SimdSynetMergedConvolution16bInfo, SimdSynetMergedConvolution16bSetParams and SimdSynetMergedConvolution16bForward.

◆ SimdSynetMergedConvolution16bExternalBufferSize()

size_t SimdSynetMergedConvolution16bExternalBufferSize ( const void *  context)

Gets the size in bytes of the optional external temporary buffer for BF16 merged convolution.

Parameters
[in] context - a pointer to the BF16 merged convolution context. It must be created by the function SimdSynetMergedConvolution16bInit and released by the function SimdRelease.
Returns
the size in bytes of the external temporary buffer that can be passed as buf to SimdSynetMergedConvolution16bForward.
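When several merged convolution contexts run one after another on the same thread, one growable scratch buffer can serve all of them. The sketch below assumes that buf is only needed for the duration of a single SimdSynetMergedConvolution16bForward call and that its contents do not have to survive between calls; Scratch, scratch_reserve and scratch_free are hypothetical names, not Simd functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reusable scratch buffer for the buf argument of Forward. */
typedef struct {
    uint8_t *data;
    size_t size;
} Scratch;

/* Grows the buffer to at least `need` bytes and returns it. Returns NULL when
   need is 0 or growth fails; NULL is a valid buf value (Forward then uses the
   context's internal buffer). On failure the old buffer stays owned by s. */
static uint8_t *scratch_reserve(Scratch *s, size_t need)
{
    assert(s != NULL);
    if (need == 0)
        return NULL;
    if (need > s->size) {
        uint8_t *p = realloc(s->data, need);
        if (p == NULL)
            return NULL;
        s->data = p;
        s->size = need;
    }
    return s->data;
}

/* Releases the buffer and resets the struct so it can be reused. */
static void scratch_free(Scratch *s)
{
    free(s->data);
    s->data = NULL;
    s->size = 0;
}
```

A caller would pass scratch_reserve(&s, SimdSynetMergedConvolution16bExternalBufferSize(context)) as buf. The buffer only grows, so after the largest context has run once no further allocations happen.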

◆ SimdSynetMergedConvolution16bInternalBufferSize()

size_t SimdSynetMergedConvolution16bInternalBufferSize ( const void *  context)

Gets the size in bytes of internal storage used by a BF16 merged convolution context.

Parameters
[in] context - a pointer to the BF16 merged convolution context. It must be created by the function SimdSynetMergedConvolution16bInit and released by the function SimdRelease.
Returns
size in bytes of internal temporary storage, reordered weights, biases and activation parameters.

◆ SimdSynetMergedConvolution16bInfo()

const char * SimdSynetMergedConvolution16bInfo ( const void *  context)

Gets a textual description of the selected BF16 merged convolution implementation.

Parameters
[in] context - a pointer to the BF16 merged convolution context. It must be created by the function SimdSynetMergedConvolution16bInit and released by the function SimdRelease.
Returns
a zero-terminated string with the selected implementation name.

◆ SimdSynetMergedConvolution16bSetParams()

void SimdSynetMergedConvolution16bSetParams ( void *  context,
const float *const *  weight,
const float *const *  bias,
const float *const *  params 
)

Sets FP32 weights, biases and activation parameters for BF16 merged convolution.

Parameters
[in,out] context - a pointer to the BF16 merged convolution context. It must be created by the function SimdSynetMergedConvolution16bInit and released by the function SimdRelease.
[in] weight - an array of pointers to FP32 convolution weights, one per convolution. The array size must be equal to the number of merged convolutions.
[in] bias - an array of pointers to FP32 bias arrays, one per convolution. Each pointer can be NULL.
[in] params - an array of pointers to activation parameters (see SimdConvolutionActivationType), one per convolution. Each pointer can be NULL for activations that do not use parameters.
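Each weight pointer must reference the complete FP32 weight array of its convolution. The sketch below computes the expected element count, assuming the usual kernelY * kernelX * (srcC / group) * dstC formula for grouped convolution; for a depthwise convolution group equals the channel count, which gives kernelY * kernelX * channels. weight_count is a hypothetical helper, and the element order Simd expects is not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of FP32 weights one convolution needs, using
   kernelY * kernelX * (srcC / group) * dstC. A depthwise convolution has
   group == srcC == dstC, so this reduces to kernelY * kernelX * channels.
   The element order Simd expects is not modelled here. */
static size_t weight_count(size_t kernelY, size_t kernelX,
                           size_t srcC, size_t dstC, size_t group)
{
    assert(group > 0 && srcC % group == 0 && dstC % group == 0);
    return kernelY * kernelX * (srcC / group) * dstC;
}
```

For a 1x1 convolution from 16 to 96 channels, a 3x3 depthwise convolution over 96 channels and a 1x1 convolution back to 16 channels, the three weight arrays hold 1536, 864 and 1536 floats. Bias arrays, when provided, would typically hold dstC values each.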

◆ SimdSynetMergedConvolution16bForward()

void SimdSynetMergedConvolution16bForward ( void *  context,
const uint8_t *  src,
uint8_t *  buf,
uint8_t *  dst 
)

Performs forward propagation through the fused BF16 merged convolution sequence.

Parameters
[in] context - a pointer to the BF16 merged convolution context. It must be created by the function SimdSynetMergedConvolution16bInit and released by the function SimdRelease.
[in] src - a pointer to the input tensor bytes. The tensor type is determined by convs[0].srcT (FP32 or BF16).
[out] buf - a pointer to an external temporary byte buffer. Its size in bytes is given by the function SimdSynetMergedConvolution16bExternalBufferSize. It can be NULL, in which case the internal buffer is used.
[out] dst - a pointer to the output tensor bytes. The tensor type is determined by convs[count - 1].dstT (FP32 or BF16).
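When convs[0].srcT or convs[count - 1].dstT is BF16, the caller supplies or reads raw BF16 values through the byte pointers. BF16 keeps the upper 16 bits of an IEEE 754 FP32 value, so conversion is a matter of rounding away the low 16 bits. The sketch below assumes BF16 elements are stored as native-endian uint16_t and rounds to nearest even; Simd's own conversion may round differently, and fp32_to_bf16, bf16_to_fp32 and pack_bf16 are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FP32 -> BF16 with round-to-nearest-even on the dropped 16 bits.
   NaNs are returned as quiet NaNs so rounding cannot turn them into Inf. */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu))
        return (uint16_t)((u >> 16) | 0x0040u);
    u += 0x7FFFu + ((u >> 16) & 1u);
    return (uint16_t)(u >> 16);
}

/* BF16 -> FP32 is exact: the BF16 bits become the upper half of the float. */
static float bf16_to_fp32(uint16_t b)
{
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper: packs n FP32 values into the byte buffer passed as src
   when convs[0].srcT is BF16 (2 bytes per element, native endianness). */
static void pack_bf16(const float *src, size_t n, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        uint16_t b = fp32_to_bf16(src[i]);
        memcpy(dst + 2 * i, &b, sizeof b);
    }
}
```

A BF16 output tensor can be read back element by element with bf16_to_fp32, which loses no information; all precision loss happens on the FP32 to BF16 side.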