A framework to accelerate BF16 merged convolution in Synet Framework.
Functions
| Return type | Function | Description |
|---|---|---|
| SIMD_API void * | SimdSynetMergedConvolution16bInit (size_t batch, const SimdConvolutionParameters *convs, size_t count, SimdBool add) | Initializes a merged convolution context that uses BF16 for internal convolution data. |
| SIMD_API size_t | SimdSynetMergedConvolution16bExternalBufferSize (const void *context) | Gets the size in bytes of the optional external temporary buffer for BF16 merged convolution. |
| SIMD_API size_t | SimdSynetMergedConvolution16bInternalBufferSize (const void *context) | Gets the size in bytes of internal storage used by a BF16 merged convolution context. |
| SIMD_API const char * | SimdSynetMergedConvolution16bInfo (const void *context) | Gets a textual description of the selected BF16 merged convolution implementation. |
| SIMD_API void | SimdSynetMergedConvolution16bSetParams (void *context, const float *const *weight, const float *const *bias, const float *const *params) | Sets FP32 weights, biases and activation parameters for BF16 merged convolution. |
| SIMD_API void | SimdSynetMergedConvolution16bForward (void *context, const uint8_t *src, uint8_t *buf, uint8_t *dst) | Performs forward propagation through the fused BF16 merged convolution sequence. |
Detailed Description
A framework to accelerate BF16 merged convolution in Synet Framework.
Function Documentation
◆ SimdSynetMergedConvolution16bInit()
void * SimdSynetMergedConvolution16bInit(size_t batch, const SimdConvolutionParameters *convs, size_t count, SimdBool add)
Initializes a merged convolution context that uses BF16 for internal convolution data.
The context fuses a sequence of two or three NHWC convolutions into a single forward call. Supported sequences are convolution + depthwise convolution, depthwise convolution + convolution, and convolution + depthwise convolution + convolution. The source tensor type is given by convs[0].srcT and the destination tensor type by convs[count - 1].dstT; each can be FP32 or BF16. Ordinary convolutions use 1x1 or 3x3 kernels and depthwise convolutions use 3x3, 5x5 or 7x7 kernels; dilation must be 1 and stride must be 1, 2 or 3. If add is SimdTrue for a three-convolution sequence, the source tensor is added to the final output, so the source must have the same shape as the final destination tensor.
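The constraints above can be checked before calling SimdSynetMergedConvolution16bInit. The C sketch below uses a hypothetical, trimmed-down struct `ConvGeom` instead of the real SimdConvolutionParameters and assumes that a convolution counts as depthwise when group == srcC == dstC. It only mirrors the rules stated in this description, not the library's own validation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the geometry fields relevant to the documented rules. */
typedef struct {
    size_t srcC, dstC;
    size_t kernelY, kernelX;
    size_t strideY, strideX;
    size_t dilationY, dilationX;
    size_t group;
} ConvGeom;

/* Assumption: depthwise means one group per channel and no channel change. */
static bool is_depthwise(const ConvGeom *c)
{
    return c->group == c->srcC && c->group == c->dstC;
}

/* Square kernels only: 1x1/3x3 for ordinary, 3x3/5x5/7x7 for depthwise. */
static bool kernel_ok(const ConvGeom *c)
{
    size_t k = c->kernelY;
    if (c->kernelX != k)
        return false;
    if (is_depthwise(c))
        return k == 3 || k == 5 || k == 7;
    return k == 1 || k == 3;
}

/* Dilation must be 1 and stride must be 1, 2 or 3. */
static bool conv_ok(const ConvGeom *c)
{
    return kernel_ok(c)
        && c->dilationY == 1 && c->dilationX == 1
        && c->strideY >= 1 && c->strideY <= 3
        && c->strideX >= 1 && c->strideX <= 3;
}

/* True if the chain is conv + dw, dw + conv, or conv + dw + conv. */
bool merged_chain_ok(const ConvGeom *convs, size_t count)
{
    if (count != 2 && count != 3)
        return false;
    for (size_t i = 0; i < count; ++i) {
        if (!conv_ok(&convs[i]))
            return false;
        /* The output channels of step i feed step i + 1. */
        if (i + 1 < count && convs[i].dstC != convs[i + 1].srcC)
            return false;
    }
    bool dw0 = is_depthwise(&convs[0]);
    bool dw1 = is_depthwise(&convs[1]);
    if (count == 2)
        return dw0 != dw1; /* exactly one of the two steps is depthwise */
    return !dw0 && dw1 && !is_depthwise(&convs[2]);
}
```

Since Init only reports failure by returning NULL, a pre-check like this mainly helps to tell which rule a configuration breaks.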
- Parameters
  - [in] batch - a batch size.
  - [in] convs - an array of convolution parameters in execution order.
  - [in] count - the number of merged convolutions. It must be 2 or 3.
  - [in] add - a flag that enables adding the source tensor to the final output tensor.
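Whether add can be set depends on the final output shape. The sketch below propagates one NHWC image (batch excluded) through the chain with the usual output-size formula for dilation 1, out = (in + padBefore + padAfter - kernel) / stride + 1. The `Step` and `Shape` types, their field names and both functions are hypothetical, and the sketch treats add as applicable only to three-convolution sequences, as described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-step geometry; dilation is always 1 for merged convolution. */
typedef struct {
    size_t dstC;
    size_t kernelY, kernelX;
    size_t strideY, strideX;
    size_t padTop, padBottom, padLeft, padRight;
} Step;

/* One NHWC image, batch excluded. */
typedef struct { size_t h, w, c; } Shape;

/* Output size for dilation 1; assumes in + p0 + p1 >= k. */
static size_t out_dim(size_t in, size_t k, size_t s, size_t p0, size_t p1)
{
    return (in + p0 + p1 - k) / s + 1;
}

/* Pushes the source shape through every step of the chain. */
Shape chain_output(Shape src, const Step *steps, size_t count)
{
    Shape cur = src;
    for (size_t i = 0; i < count; ++i) {
        const Step *s = &steps[i];
        cur.h = out_dim(cur.h, s->kernelY, s->strideY, s->padTop, s->padBottom);
        cur.w = out_dim(cur.w, s->kernelX, s->strideX, s->padLeft, s->padRight);
        cur.c = s->dstC;
    }
    return cur;
}

/* add needs three convolutions and a final output shaped like the source. */
bool add_allowed(Shape src, const Step *steps, size_t count)
{
    if (count != 3)
        return false;
    Shape dst = chain_output(src, steps, count);
    return dst.h == src.h && dst.w == src.w && dst.c == src.c;
}
```

For example, a 56x56x24 input through 1x1 (24->144), padded 3x3 depthwise and 1x1 (144->24) keeps its shape, so add is allowed; a stride of 2 in the depthwise step shrinks the output to 28x28x24 and rules it out.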
- Returns
- a pointer to the BF16 merged convolution context, or NULL on error. It must be released with SimdRelease. This pointer is passed to SimdSynetMergedConvolution16bExternalBufferSize, SimdSynetMergedConvolution16bInternalBufferSize, SimdSynetMergedConvolution16bInfo, SimdSynetMergedConvolution16bSetParams and SimdSynetMergedConvolution16bForward.
◆ SimdSynetMergedConvolution16bExternalBufferSize()
size_t SimdSynetMergedConvolution16bExternalBufferSize(const void *context)
Gets the size in bytes of the optional external temporary buffer for BF16 merged convolution.
- Parameters
  - [in] context - a pointer to a BF16 merged convolution context. It must be created by SimdSynetMergedConvolution16bInit and released by SimdRelease.
- Returns
- size in bytes of the external temporary buffer passed to SimdSynetMergedConvolution16bForward.
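When several contexts run one after another on the same thread, one scratch buffer sized to the largest SimdSynetMergedConvolution16bExternalBufferSize result can serve all of their Forward calls. The sketch below is a hypothetical helper (`Scratch`, `scratch_reserve`, `max_size` and `scratch_free` are not library functions). It ignores alignment, because no alignment requirement is stated here, and it must not be shared between threads.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scratch buffer shared by sequentially executed contexts. */
typedef struct {
    uint8_t *data;
    size_t size;
} Scratch;

/* Largest of the per-context external buffer sizes. */
size_t max_size(const size_t *sizes, size_t n)
{
    size_t m = 0;
    for (size_t i = 0; i < n; ++i)
        if (sizes[i] > m)
            m = sizes[i];
    return m;
}

/* Grows the buffer only when needed; returns 0 on success, -1 on failure. */
int scratch_reserve(Scratch *s, size_t size)
{
    if (size <= s->size)
        return 0;
    uint8_t *p = (uint8_t *)realloc(s->data, size);
    if (p == NULL)
        return -1;
    s->data = p;
    s->size = size;
    return 0;
}

void scratch_free(Scratch *s)
{
    free(s->data);
    s->data = NULL;
    s->size = 0;
}
```

If every context reports 0, the buffer stays NULL, and passing NULL as buf makes Forward use its internal buffer.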
◆ SimdSynetMergedConvolution16bInternalBufferSize()
size_t SimdSynetMergedConvolution16bInternalBufferSize(const void *context)
Gets the size in bytes of internal storage used by a BF16 merged convolution context.
- Parameters
  - [in] context - a pointer to a BF16 merged convolution context. It must be created by SimdSynetMergedConvolution16bInit and released by SimdRelease.
- Returns
- size in bytes of internal temporary storage, reordered weights, biases and activation parameters.
◆ SimdSynetMergedConvolution16bInfo()
const char * SimdSynetMergedConvolution16bInfo(const void *context)
Gets a textual description of the selected BF16 merged convolution implementation.
- Parameters
  - [in] context - a pointer to a BF16 merged convolution context. It must be created by SimdSynetMergedConvolution16bInit and released by SimdRelease.
- Returns
- a zero-terminated string with the selected implementation name.
◆ SimdSynetMergedConvolution16bSetParams()
void SimdSynetMergedConvolution16bSetParams(void *context, const float *const *weight, const float *const *bias, const float *const *params)
Sets FP32 weights, biases and activation parameters for BF16 merged convolution.
- Parameters
  - [in,out] context - a pointer to a BF16 merged convolution context. It must be created by SimdSynetMergedConvolution16bInit and released by SimdRelease.
  - [in] weight - an array of pointers to FP32 convolution weights, one per convolution. The array size must equal the number of merged convolutions.
  - [in] bias - an array of pointers to FP32 bias arrays, one per convolution. Each pointer can be NULL.
  - [in] params - an array of pointers to activation parameters (see SimdConvolutionActivationType), one per convolution. A pointer can be NULL for an activation that takes no parameters.
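SetParams takes three parallel arrays with one entry per merged convolution. The sketch below shows their shape for a hypothetical 1x1 / 3x3 depthwise / 1x1 chain and sizes each weight array with the standard grouped-convolution count kernelY * kernelX * (srcC / group) * dstC. It says nothing about the weight layout the library expects, and the two-value `relu6` array is only a hypothetical example of an activation that takes parameters.

```c
#include <stddef.h>

/* Hypothetical dimensions needed to size one convolution's parameters. */
typedef struct { size_t srcC, dstC, kernelY, kernelX, group; } ConvDims;

/* Weight element count of a grouped convolution (depthwise: group == srcC). */
size_t weight_count(const ConvDims *d)
{
    return d->kernelY * d->kernelX * (d->srcC / d->group) * d->dstC;
}

/* Example chain: 1x1 expand 4->8, 3x3 depthwise on 8 channels, 1x1 project 8->4. */
static const ConvDims kDims[3] = {
    {4, 8, 1, 1, 1},
    {8, 8, 3, 3, 8},
    {8, 4, 1, 1, 1},
};

/* FP32 storage sized by weight_count: 32, 72 and 32 floats. */
static float w0[4 * 8], w1[3 * 3 * 8], w2[8 * 4];
static float b0[8], b2[4];
static float relu6[2] = {0.0f, 6.0f}; /* hypothetical activation parameters */

/* One entry per convolution. The depthwise step has no bias, and only the
   first step has an activation that takes parameters. */
static const float *const kWeight[3] = {w0, w1, w2};
static const float *const kBias[3] = {b0, NULL, b2};
static const float *const kParams[3] = {relu6, NULL, NULL};

/* With a real context:
   SimdSynetMergedConvolution16bSetParams(context, kWeight, kBias, kParams); */
```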
◆ SimdSynetMergedConvolution16bForward()
void SimdSynetMergedConvolution16bForward(void *context, const uint8_t *src, uint8_t *buf, uint8_t *dst)
Performs forward propagation through the fused BF16 merged convolution sequence.
- Parameters
  - [in] context - a pointer to a BF16 merged convolution context. It must be created by SimdSynetMergedConvolution16bInit and released by SimdRelease.
  - [in] src - a pointer to the input tensor bytes. The tensor type is determined by convs[0].srcT (FP32 or BF16).
  - [out] buf - a pointer to an external temporary byte buffer whose size in bytes is given by SimdSynetMergedConvolution16bExternalBufferSize. It can be NULL, in which case the context uses its internal buffer.
  - [out] dst - a pointer to the output tensor bytes. The tensor type is determined by convs[count - 1].dstT (FP32 or BF16).
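Forward takes src and dst as raw bytes, so with a BF16 srcT the caller must supply 2-byte BF16 values, and a BF16 dstT yields 2-byte values. BF16 keeps the sign, exponent and top 7 mantissa bits of an IEEE-754 FP32 value. The sketch below converts with round-to-nearest-even and keeps NaNs quiet; the helper names are hypothetical, native byte order is assumed, and the library's own rounding mode is not stated here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FP32 -> BF16 with round-to-nearest-even on the dropped 16 bits. */
uint16_t fp32_to_bf16(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0)
        return (uint16_t)((u >> 16) | 0x0040u); /* NaN: force the quiet bit */
    u += 0x7FFFu + ((u >> 16) & 1u);
    return (uint16_t)(u >> 16);
}

/* BF16 -> FP32 is exact: the value becomes the upper half of the float. */
float bf16_to_fp32(uint16_t h)
{
    uint32_t u = (uint32_t)h << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Packs n FP32 values into a byte buffer of 2 * n BF16 bytes. */
void tensor_fp32_to_bf16(const float *src, size_t n, uint8_t *dst)
{
    for (size_t i = 0; i < n; ++i) {
        uint16_t h = fp32_to_bf16(src[i]);
        memcpy(dst + 2 * i, &h, sizeof h);
    }
}

/* Byte size of an NHWC tensor: 2 bytes per BF16 element, 4 per FP32. */
size_t tensor_bytes(size_t batch, size_t h, size_t w, size_t c, int isBf16)
{
    return batch * h * w * c * (isBf16 ? 2u : 4u);
}
```

The same byte count applies to dst, so a BF16 output can be unpacked element by element with bf16_to_fp32.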