Simd Library Documentation.

BF16 deconvolution framework

A framework to accelerate BF16 deconvolution in Synet Framework.

Functions

SIMD_API void * SimdSynetDeconvolution16bInit (size_t batch, const SimdConvolutionParameters *conv, SimdSynetCompatibilityType compatibility)
 Initializes a BF16/FP32 deconvolution context.

SIMD_API size_t SimdSynetDeconvolution16bExternalBufferSize (const void *context)
 Gets the size in bytes of caller-provided temporary buffer for BF16 deconvolution.

SIMD_API size_t SimdSynetDeconvolution16bInternalBufferSize (const void *context)
 Gets the size in bytes of internal storage used by a BF16 deconvolution context.

SIMD_API const char * SimdSynetDeconvolution16bInfo (const void *context)
 Gets a short description of the selected BF16 deconvolution implementation.

SIMD_API void SimdSynetDeconvolution16bSetParams (void *context, const float *weight, const float *bias, const float *params)
 Sets weights, bias and activation parameters for BF16 deconvolution.

SIMD_API void SimdSynetDeconvolution16bForward (void *context, const uint8_t *src, uint8_t *buf, uint8_t *dst)
 Performs forward propagation of BF16/FP32 deconvolution.


Detailed Description

A framework to accelerate BF16 deconvolution in Synet Framework.

Function Documentation

◆ SimdSynetDeconvolution16bInit()

void * SimdSynetDeconvolution16bInit ( size_t  batch,
const SimdConvolutionParameters *  conv,
SimdSynetCompatibilityType  compatibility 
)

Initializes a BF16/FP32 deconvolution context.

The function validates deconvolution parameters and chooses a suitable BF16-oriented implementation (GEMM-based or NHWC GEMM-based variant, including AMX-BF16 when available). It supports FP32 or BF16 source and destination tensors with matching NCHW format, or matching NHWC format when group is 1. The destination spatial size must match deconvolution parameters:

dstH = strideY*(srcH - 1) + dilationY*(kernelY - 1) + 1 - padY - padH
dstW = strideX*(srcW - 1) + dilationX*(kernelX - 1) + 1 - padX - padW
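The size relation above can be checked before filling in the deconvolution parameters. The following is a minimal sketch of that arithmetic for one axis; DeconvOutputSize is a hypothetical helper introduced here for illustration and is not part of the Simd API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Simd function): output extent along one axis.
   padBegin corresponds to padY (or padX), padEnd to padH (or padW). */
static size_t DeconvOutputSize(size_t src, size_t kernel, size_t stride,
                               size_t dilation, size_t padBegin, size_t padEnd)
{
    assert(src > 0 && kernel > 0 && stride > 0 && dilation > 0);
    /* Extent before padding is removed. */
    size_t full = stride * (src - 1) + dilation * (kernel - 1) + 1;
    /* Assumption of this sketch: padding must leave a non-empty output. */
    assert(full > padBegin + padEnd);
    return full - padBegin - padEnd;
}
```

For example, srcH = 5, kernelY = 3, strideY = 2, dilationY = 1 and padY = padH = 1 give dstH = 2*4 + 1*2 + 1 - 1 - 1 = 9.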

The created context stores the tensor shape, data types, format, deconvolution geometry, group count, activation type and compatibility flags. FP32 weights, bias and activation parameters are attached later by SimdSynetDeconvolution16bSetParams.

Parameters
[in] batch - a batch size.
[in] conv - a pointer to deconvolution parameters. Source and destination tensor types must be FP32 or BF16.
[in] compatibility - calculation compatibility flags.
Returns
a pointer to BF16 deconvolution context. On error it returns NULL. The context must be released with function SimdRelease. This pointer is used in functions SimdSynetDeconvolution16bExternalBufferSize, SimdSynetDeconvolution16bInternalBufferSize, SimdSynetDeconvolution16bInfo, SimdSynetDeconvolution16bSetParams and SimdSynetDeconvolution16bForward.

◆ SimdSynetDeconvolution16bExternalBufferSize()

size_t SimdSynetDeconvolution16bExternalBufferSize ( const void *  context)

Gets the size in bytes of caller-provided temporary buffer for BF16 deconvolution.

The returned value depends on the implementation selected during initialization and can be used to allocate the buf argument of SimdSynetDeconvolution16bForward. Some implementations return 1 or 0 when they do not need external temporary storage.

Parameters
[in] context - a pointer to BF16 deconvolution context. It must be created by function SimdSynetDeconvolution16bInit and released by function SimdRelease.
Returns
a number of bytes required for external temporary buffer.

◆ SimdSynetDeconvolution16bInternalBufferSize()

size_t SimdSynetDeconvolution16bInternalBufferSize ( const void *  context)

Gets the size in bytes of internal storage used by a BF16 deconvolution context.

The returned value reports internal storage tracked by the selected implementation, including internal temporary buffers, transformed weights, copied bias and copied activation parameters.

Parameters
[in] context - a pointer to BF16 deconvolution context. It must be created by function SimdSynetDeconvolution16bInit and released by function SimdRelease.
Returns
a number of bytes used by internal buffers.

◆ SimdSynetDeconvolution16bInfo()

const char * SimdSynetDeconvolution16bInfo ( const void *  context)

Gets a short description of the selected BF16 deconvolution implementation.

The returned string contains the implementation extension and algorithm name, for example a GEMM or NHWC GEMM variant. The returned pointer is owned by the context and remains valid until the next call of this function for the same context or until the context is released.

Parameters
[in] context - a pointer to BF16 deconvolution context. It must be created by function SimdSynetDeconvolution16bInit and released by function SimdRelease.
Returns
a string with description of internal implementation of BF16 deconvolution algorithm.
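Because the returned pointer is owned by the context, a caller that needs the description after a later call or after the context is released has to copy it first. A minimal sketch of such a copy follows; CopyInfoString is a hypothetical helper, not part of the Simd API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Simd function): duplicates a description string
   into caller-owned storage. Returns NULL if allocation fails.
   The caller releases the result with free(). */
static char *CopyInfoString(const char *info)
{
    assert(info != NULL);
    size_t size = strlen(info) + 1; /* include the terminating zero */
    char *copy = (char *)malloc(size);
    if (copy != NULL)
        memcpy(copy, info, size);
    return copy;
}
```

It would typically be applied directly to the result of SimdSynetDeconvolution16bInfo(context), while the context is still alive.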

◆ SimdSynetDeconvolution16bSetParams()

void SimdSynetDeconvolution16bSetParams ( void *  context,
const float *  weight,
const float *  bias,
const float *  params 
)

Sets weights, bias and activation parameters for BF16 deconvolution.

This function must be called before SimdSynetDeconvolution16bForward. The weight array contains FP32 deconvolution weights with kernelY*kernelX*srcC*dstC/group elements. The selected implementation transforms weights to its internal BF16/reordered representation. Bias is copied to an internal FP32 array; when bias is NULL, zeros are used. Activation parameters are copied or expanded to the internal FP32 array according to SimdConvolutionActivationType.

Parameters
[in,out] context - a pointer to BF16 deconvolution context. It must be created by function SimdSynetDeconvolution16bInit and released by function SimdRelease.
[in] weight - a pointer to FP32 deconvolution weights.
[in] bias - a pointer to FP32 bias array with dstC elements. Can be NULL.
[in] params - a pointer to FP32 parameters of activation function (see SimdConvolutionActivationType). Can be NULL when activation does not require parameters.
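The weight element count and the NULL-bias rule described above can be sketched as follows. Both helpers are hypothetical and only illustrate the stated contract; they do not show how the library stores its data. The divisibility check on group is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Simd function): number of FP32 elements
   expected in the weight array, kernelY*kernelX*srcC*dstC/group. */
static size_t DeconvWeightCount(size_t kernelY, size_t kernelX,
                                size_t srcC, size_t dstC, size_t group)
{
    /* Assumption: channel counts are divisible by the group count. */
    assert(group > 0 && srcC % group == 0 && dstC % group == 0);
    return kernelY * kernelX * srcC * dstC / group;
}

/* Hypothetical helper: fills a dstC-element FP32 array from bias,
   or with zeros when bias is NULL. */
static void CopyBiasOrZero(const float *bias, size_t dstC, float *out)
{
    assert(out != NULL);
    for (size_t dc = 0; dc < dstC; ++dc)
        out[dc] = bias ? bias[dc] : 0.0f;
}
```

A caller can compare DeconvWeightCount against the length of its own weight array before passing the pointer to SimdSynetDeconvolution16bSetParams.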

◆ SimdSynetDeconvolution16bForward()

void SimdSynetDeconvolution16bForward ( void *  context,
const uint8_t *  src,
uint8_t *  buf,
uint8_t *  dst 
)

Performs forward propagation of BF16/FP32 deconvolution.

The function converts FP32 input to BF16 when the context source type is FP32, uses BF16 input directly when the source type is BF16, accumulates transposed convolution sums in FP32, adds bias, applies activation and writes FP32 or BF16 output according to the context destination type:

dst[:] = 0;
// accumulation, repeated for every group and every output channel dc of that group:
for(sc = 0; sc < srcC/group; ++sc)
    for(sy = 0; sy < srcH; ++sy)
        for(sx = 0; sx < srcW; ++sx)
            for(ky = 0; ky < kernelY; ++ky)
                for(kx = 0; kx < kernelX; ++kx)
                    dst[outputOffset] += inputValue * weightValue;
// epilogue, applied to every output element (dc is its channel index):
value = Activate(dst[outputOffset] + bias[dc], activation, params);
dst[outputOffset] = dstT == SimdTensorData16b ? Float32ToBFloat16(value) : value;

The input value is read as BF16 or converted from FP32 to BF16 according to srcT. The weight value comes from the internal representation prepared by SimdSynetDeconvolution16bSetParams. The exact offsets depend on tensor format, padding, dilation, stride and group. The input and output tensors use the shape, data types and format from the context created by SimdSynetDeconvolution16bInit.
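To make the offsets in the pseudocode concrete, here is a minimal scalar reference sketch for one special case: NHWC format, group = 1, batch = 1, FP32 source and destination, identity activation. It is not the library's implementation. All function names are hypothetical, the weight layout [srcC][kernelY][kernelX][dstC] is an assumption of this sketch, and BF16 rounding is assumed to be round-to-nearest-even without special handling of NaN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FP32 -> BF16, round-to-nearest-even (assumed rounding mode). */
static uint16_t F32ToBf16(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* BF16 -> FP32: the 16 bits become the upper half of the FP32 pattern. */
static float Bf16ToF32(uint16_t value)
{
    uint32_t bits = (uint32_t)value << 16;
    float result;
    memcpy(&result, &bits, sizeof(result));
    return result;
}

/* Hypothetical reference: NHWC, group = 1, batch = 1, identity activation.
   Assumed weight layout: weight[((sc*kernelY + ky)*kernelX + kx)*dstC + dc]. */
static void DeconvReferenceNhwc(const float *src, size_t srcH, size_t srcW, size_t srcC,
    const float *weight, const float *bias, size_t kernelY, size_t kernelX,
    size_t strideY, size_t strideX, size_t dilationY, size_t dilationX,
    size_t padY, size_t padX, float *dst, size_t dstH, size_t dstW, size_t dstC)
{
    assert(src != NULL && weight != NULL && dst != NULL);
    for (size_t i = 0; i < dstH * dstW * dstC; ++i)
        dst[i] = 0.0f;
    for (size_t sy = 0; sy < srcH; ++sy)
        for (size_t sx = 0; sx < srcW; ++sx)
            for (size_t ky = 0; ky < kernelY; ++ky)
            {
                size_t dy = sy * strideY + ky * dilationY; /* row before padding */
                if (dy < padY || dy - padY >= dstH)
                    continue; /* falls into the cropped border */
                for (size_t kx = 0; kx < kernelX; ++kx)
                {
                    size_t dx = sx * strideX + kx * dilationX;
                    if (dx < padX || dx - padX >= dstW)
                        continue;
                    float *d = dst + ((dy - padY) * dstW + (dx - padX)) * dstC;
                    for (size_t sc = 0; sc < srcC; ++sc)
                    {
                        /* Input and weight pass through BF16; sums stay in FP32. */
                        float in = Bf16ToF32(F32ToBf16(src[(sy * srcW + sx) * srcC + sc]));
                        const float *w = weight + ((sc * kernelY + ky) * kernelX + kx) * dstC;
                        for (size_t dc = 0; dc < dstC; ++dc)
                            d[dc] += in * Bf16ToF32(F32ToBf16(w[dc]));
                    }
                }
            }
    /* Epilogue: add bias (zeros when bias is NULL); identity activation. */
    for (size_t i = 0; i < dstH * dstW; ++i)
        for (size_t dc = 0; dc < dstC; ++dc)
            dst[i * dstC + dc] += bias ? bias[dc] : 0.0f;
}
```

A reference of this kind is mainly useful for comparing results on small tensors. Differences of BF16 magnitude relative to an FP32 deconvolution are expected, because inputs and weights are rounded before multiplication.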

Parameters
[in] context - a pointer to BF16 deconvolution context. It must be created by function SimdSynetDeconvolution16bInit and released by function SimdRelease.
[in] src - a pointer to input tensor. Actual element type is defined by srcT in deconvolution parameters.
[out] buf - a pointer to external temporary byte buffer. The required size is determined by function SimdSynetDeconvolution16bExternalBufferSize. Can be NULL, in which case an internal buffer is used.
[out] dst - a pointer to output tensor. Actual element type is defined by dstT in deconvolution parameters.
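Since src and dst are passed as byte pointers, the caller has to size them from the element type. A minimal sketch of that calculation is shown below. ElemType and TensorBytes are hypothetical stand-ins for illustration (the library selects the type through srcT and dstT), and the sketch assumes densely packed tensors of 4-byte FP32 or 2-byte BF16 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the element type chosen by srcT / dstT. */
typedef enum { ElemF32, ElemBf16 } ElemType;

/* Hypothetical helper (not a Simd function): byte size of a dense tensor. */
static size_t TensorBytes(size_t batch, size_t channels, size_t height,
                          size_t width, ElemType type)
{
    assert(type == ElemF32 || type == ElemBf16);
    /* FP32 elements occupy 4 bytes, BF16 elements occupy 2 bytes. */
    size_t elemSize = (type == ElemF32) ? sizeof(float) : sizeof(uint16_t);
    return batch * channels * height * width * elemSize;
}
```

The element count is the same for NCHW and NHWC, so the format does not affect the byte size.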