A framework to accelerate BF16 convolution in Synet Framework.
Functions | |
| SIMD_API void * | SimdSynetConvolution16bInit (size_t batch, const SimdConvolutionParameters *conv, SimdSynetCompatibilityType compatibility) |
| Initializes a BF16/FP32 convolution context. | |
| SIMD_API size_t | SimdSynetConvolution16bExternalBufferSize (const void *context) |
| Gets the size in bytes of caller-provided temporary buffer for BF16 convolution. | |
| SIMD_API size_t | SimdSynetConvolution16bInternalBufferSize (const void *context) |
| Gets the size in bytes of internal storage used by a BF16 convolution context. | |
| SIMD_API const char * | SimdSynetConvolution16bInfo (const void *context) |
| Gets a short description of the selected BF16 convolution implementation. | |
| SIMD_API void | SimdSynetConvolution16bSetParams (void *context, const float *weight, const float *bias, const float *params) |
| Sets weights, bias and activation parameters for BF16 convolution. | |
| SIMD_API void | SimdSynetConvolution16bForward (void *context, const uint8_t *src, uint8_t *buf, uint8_t *dst) |
| Performs forward propagation of BF16/FP32 convolution. | |
Detailed Description
A framework to accelerate BF16 convolution in Synet Framework.
Function Documentation
◆ SimdSynetConvolution16bInit()
void * SimdSynetConvolution16bInit(size_t batch, const SimdConvolutionParameters * conv, SimdSynetCompatibilityType compatibility)
Initializes a BF16/FP32 convolution context.
The function validates convolution parameters and chooses a suitable BF16-oriented implementation (GEMM, NCHW/NHWC GEMM, NHWC depthwise, NHWC special convolution or AMX-BF16 variant when available). It supports FP32 or BF16 source and destination tensors with matching NCHW or NHWC format. The destination spatial size must match convolution parameters:
dstH = (srcH + padY + padH - (dilationY*(kernelY - 1) + 1)) / strideY + 1
dstW = (srcW + padX + padW - (dilationX*(kernelX - 1) + 1)) / strideX + 1
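The output-size relation above can be checked before filling in the convolution parameters. The following is a minimal sketch, not part of the Simd API: `conv_out_size` is a hypothetical helper name, and it assumes the division in the formula is integer (truncating) division and that the padded input is at least as large as the dilated kernel.

```c
#include <stddef.h>

/* Hypothetical helper (not a Simd function): output extent along one axis.
   pad_begin/pad_end stand for padY/padH (height) or padX/padW (width).
   Assumes src + pad_begin + pad_end >= dilation * (kernel - 1) + 1. */
static size_t conv_out_size(size_t src, size_t pad_begin, size_t pad_end,
                            size_t kernel, size_t dilation, size_t stride)
{
    /* Extent covered by the kernel once dilation is applied. */
    size_t effective_kernel = dilation * (kernel - 1) + 1;
    return (src + pad_begin + pad_end - effective_kernel) / stride + 1;
}
```

For example, a 3x3 kernel with stride 1 and one pixel of padding on each side keeps a 224-pixel axis at 224, while a 7x7 kernel with stride 2 and padding 3 reduces it to 112.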
A created context stores tensor shape, data types, format, convolution geometry, group count, activation type and compatibility flags. FP32 weights, bias and activation parameters are attached later by SimdSynetConvolution16bSetParams.
- Parameters
[in] batch - a batch size.
[in] conv - a pointer to convolution parameters. Source and destination tensor types must be FP32 or BF16.
[in] compatibility - calculation compatibility flags.
- Returns
- a pointer to the BF16 convolution context, or NULL on error. The context must be released with function SimdRelease. This pointer is passed to functions SimdSynetConvolution16bExternalBufferSize, SimdSynetConvolution16bInternalBufferSize, SimdSynetConvolution16bInfo, SimdSynetConvolution16bSetParams and SimdSynetConvolution16bForward.
◆ SimdSynetConvolution16bExternalBufferSize()
size_t SimdSynetConvolution16bExternalBufferSize(const void * context)
Gets the size in bytes of caller-provided temporary buffer for BF16 convolution.
The returned value is a number of bytes. It depends on the implementation selected during initialization and can be used to allocate the buf argument of SimdSynetConvolution16bForward. Some implementations return 1 or 0 when they do not need external temporary storage.
- Parameters
-
[in] context - a pointer to BF16 convolution context. It must be created by function SimdSynetConvolution16bInit and released by function SimdRelease.
- Returns
- a number of bytes required for external temporary buffer.
◆ SimdSynetConvolution16bInternalBufferSize()
size_t SimdSynetConvolution16bInternalBufferSize(const void * context)
Gets the size in bytes of internal storage used by a BF16 convolution context.
The returned value reports internal storage tracked by the selected implementation, including internal temporary buffers, transformed weights, copied bias and copied activation parameters.
- Parameters
-
[in] context - a pointer to BF16 convolution context. It must be created by function SimdSynetConvolution16bInit and released by function SimdRelease.
- Returns
- a number of bytes used by internal buffers.
◆ SimdSynetConvolution16bInfo()
const char * SimdSynetConvolution16bInfo(const void * context)
Gets a short description of the selected BF16 convolution implementation.
The returned string contains the implementation extension and algorithm name, for example a GEMM, NCHW/NHWC GEMM, NHWC depthwise, NHWC special or AMX-BF16 variant. The returned pointer is owned by the context and remains valid until the next call of this function for the same context or until the context is released.
- Parameters
-
[in] context - a pointer to BF16 convolution context. It must be created by function SimdSynetConvolution16bInit and released by function SimdRelease.
- Returns
- a string with description of internal implementation of BF16 convolution algorithm.
◆ SimdSynetConvolution16bSetParams()
void SimdSynetConvolution16bSetParams(void * context, const float * weight, const float * bias, const float * params)
Sets weights, bias and activation parameters for BF16 convolution.
This function must be called before SimdSynetConvolution16bForward. The weight array contains FP32 convolution weights with kernelY*kernelX*srcC*dstC/group elements. The selected implementation transforms weights to its internal representation (usually BF16 and reordered; some depthwise paths keep FP32 weights). Bias is copied to an internal FP32 array; when bias is NULL, zeros are used. Activation parameters are copied or expanded to the internal FP32 array according to SimdConvolutionActivationType.
- Parameters
[in,out] context - a pointer to BF16 convolution context. It must be created by function SimdSynetConvolution16bInit and released by function SimdRelease.
[in] weight - a pointer to FP32 convolution weights.
[in] bias - a pointer to FP32 bias array with dstC elements. Can be NULL.
[in] params - a pointer to FP32 parameters of activation function (see SimdConvolutionActivationType). Can be NULL when activation does not require parameters.
◆ SimdSynetConvolution16bForward()
void SimdSynetConvolution16bForward(void * context, const uint8_t * src, uint8_t * buf, uint8_t * dst)
Performs forward propagation of BF16/FP32 convolution.
The function converts FP32 input to BF16 when the context source type is FP32, uses BF16 input directly when the source type is BF16, accumulates convolution sums in FP32, adds bias, applies activation and writes FP32 or BF16 output according to the context destination type:
sum = bias[dc];
for(sc = 0; sc < srcC/group; ++sc)
    for(ky = 0; ky < kernelY; ++ky)
        for(kx = 0; kx < kernelX; ++kx)
            sum += inputValue * weightValue;
value = Activate(sum, activation, params);
dst[outputOffset] = dstT == SimdTensorData16b ? Float32ToBFloat16(value) : value;
The input value is read as BF16 or converted from FP32 to BF16 according to srcT. The weight value comes from the internal representation prepared by SimdSynetConvolution16bSetParams. The exact offsets depend on tensor format, padding, dilation, stride and group. The input and output tensors use the shape, data types and format from the context created by SimdSynetConvolution16bInit.
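The numeric behavior described above (operands rounded to BF16, sums accumulated in FP32 starting from the bias) can be reproduced in scalar C when building a reference for tests. The sketch below is illustrative only. The helper names are hypothetical, the conversion assumes round-to-nearest-even and skips NaN handling, and the library's own rounding mode and weight layout may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FP32 -> BF16, keeping the upper 16 bits of the FP32 encoding.
   Assumption: round-to-nearest-even; NaN inputs are not handled specially. */
static uint16_t f32_to_bf16(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* BF16 -> FP32: the 16 stored bits become the upper half, the rest is zero. */
static float bf16_to_f32(uint16_t value)
{
    uint32_t bits = (uint32_t)value << 16;
    float result;
    memcpy(&result, &bits, sizeof(result));
    return result;
}

/* One pre-activation output value: inputs and weights are rounded to BF16,
   products are accumulated in FP32, and the sum starts from the bias. */
static float bf16_dot(const float *input, const float *weight,
                      size_t count, float bias)
{
    float sum = bias;
    for (size_t i = 0; i < count; ++i)
        sum += bf16_to_f32(f32_to_bf16(input[i])) *
               bf16_to_f32(f32_to_bf16(weight[i]));
    return sum;
}
```

Here `input` and `weight` are assumed to be already gathered for a single output position, so the index `i` runs over the flattened (sc, ky, kx) loops of the pseudocode. Comparing such a reference against the library output generally needs a tolerance, since the optimized paths may accumulate in a different order.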
- Parameters
[in] context - a pointer to BF16 convolution context. It must be created by function SimdSynetConvolution16bInit and released by function SimdRelease.
[in] src - a pointer to input tensor. Actual element type is defined by srcT in convolution parameters.
[out] buf - a pointer to external temporary byte buffer. The required size is determined by function SimdSynetConvolution16bExternalBufferSize. Can be NULL (it causes usage of internal buffer).
[out] dst - a pointer to output tensor. Actual element type is defined by dstT in convolution parameters.