Functions to accelerate the Winograd convolution algorithm in the Synet Framework.
Functions
| SIMD_API void | SimdWinogradKernel1x3Block1x4SetFilter (const float *src, size_t size, float *dst, SimdBool trans) |
| Converts 1x3 convolution filters to Winograd F(1x4,1x3) domain (or 3x1 filters to F(4x1,3x1) domain). | |
| SIMD_API void | SimdWinogradKernel1x3Block1x4SetInput (const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans) |
| Converts input tensor rows to Winograd F(1x4,1x3) domain. | |
| SIMD_API void | SimdWinogradKernel1x3Block1x4SetOutput (const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans) |
| Converts output tensor rows from Winograd F(1x4,1x3) domain. | |
| SIMD_API void | SimdWinogradKernel1x5Block1x4SetFilter (const float *src, size_t size, float *dst, SimdBool trans) |
| Converts 1x5 convolution filters to Winograd F(1x4,1x5) domain (or 5x1 filters to F(4x1,5x1) domain). | |
| SIMD_API void | SimdWinogradKernel1x5Block1x4SetInput (const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans) |
| Converts input tensor rows to Winograd F(1x4,1x5) domain. | |
| SIMD_API void | SimdWinogradKernel1x5Block1x4SetOutput (const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans) |
| Converts output tensor rows from Winograd F(1x4,1x5) domain. | |
| SIMD_API void | SimdWinogradKernel2x2Block2x2SetFilter (const float *src, size_t size, float *dst, SimdBool trans) |
| Converts 2x2 convolution filters to Winograd F(2x2,2x2) domain. | |
| SIMD_API void | SimdWinogradKernel2x2Block2x2SetInput (const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans) |
| Converts input tensor tiles to Winograd F(2x2,2x2) domain. | |
| SIMD_API void | SimdWinogradKernel2x2Block2x2SetOutput (const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans) |
| Converts output tensor tiles from Winograd F(2x2,2x2) domain. | |
| SIMD_API void | SimdWinogradKernel2x2Block4x4SetFilter (const float *src, size_t size, float *dst, SimdBool trans) |
| Converts 2x2 convolution filters to Winograd F(4x4,2x2) domain. | |
| SIMD_API void | SimdWinogradKernel2x2Block4x4SetInput (const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans) |
| Converts input tensor tiles to Winograd F(4x4,2x2) domain. | |
| SIMD_API void | SimdWinogradKernel2x2Block4x4SetOutput (const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans) |
| Converts output tensor tiles from Winograd F(4x4,2x2) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block2x2SetFilter (const float *src, size_t size, float *dst, SimdBool trans) |
| Converts 3x3 convolution filters to Winograd F(2x2,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block2x2SetInput (const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans) |
| Converts input tensor tiles to Winograd F(2x2,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block2x2SetOutput (const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans) |
| Converts output tensor tiles from Winograd F(2x2,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block3x3SetFilter (const float *src, size_t size, float *dst, SimdBool trans) |
| Converts 3x3 convolution filters to Winograd F(3x3,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block3x3SetInput (const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans) |
| Converts input tensor tiles to Winograd F(3x3,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block3x3SetOutput (const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans) |
| Converts output tensor tiles from Winograd F(3x3,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block4x4SetFilter (const float *src, size_t size, float *dst, SimdBool trans) |
| Converts 3x3 convolution filters to Winograd F(4x4,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block4x4SetInput (const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans) |
| Converts input tensor tiles to Winograd F(4x4,3x3) domain. | |
| SIMD_API void | SimdWinogradKernel3x3Block4x4SetOutput (const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans) |
| Converts output tensor tiles from Winograd F(4x4,3x3) domain. | |
Detailed Description
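Functions to accelerate the Winograd convolution algorithm in the Synet Framework. Every supported F(block,kernel) variant provides three functions: SetFilter converts convolution filters to the Winograd domain, SetInput converts input tensor rows or tiles to the Winograd domain, and SetOutput converts output tensor rows or tiles back from the Winograd domain.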
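For orientation, the three function families can be read against the standard Winograd minimal-filtering formula. The symbols below (g for one filter, d for one input tile, Y for one output block, and the transform matrices G, B and A) are notation assumed for this note, not identifiers from the library, and the exact matrices used by the implementation are not stated on this page.

```latex
Y = A^{T}\left[\left(G\,g\,G^{T}\right)\odot\left(B^{T}\,d\,B\right)\right]A
```

In this notation, SetFilter corresponds to the term G g G^T, SetInput to B^T d B, and SetOutput to the outer A^T [...] A. For F(m x m, r x r), g is r x r, d is (m + r - 1) x (m + r - 1), and Y is m x m. The element-wise product, accumulated over input channels, is computed between the SetInput and SetOutput steps by code outside this group.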
Function Documentation
◆ SimdWinogradKernel1x3Block1x4SetFilter()
void SimdWinogradKernel1x3Block1x4SetFilter(const float *src, size_t size, float *dst, SimdBool trans)
Converts 1x3 convolution filters to Winograd F(1x4,1x3) domain (or 3x1 filters to F(4x1,3x1) domain).
For every input-output channel pair the function transforms 3 source filter values to 6 Winograd coefficients. If trans is SimdFalse, every source filter is stored contiguously and output coefficients are separated by size. If trans is SimdTrue, every coefficient plane is stored with stride equal to size.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input 32-bit float array with filter weights.
- [in] size - (number of input channels)*(number of output channels).
- [out] dst - a pointer to the output 32-bit float array with 6 transformed coefficients per channel pair.
- [in] trans - a flag selecting the transposed filter layout.
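The layout described above can be sketched as plain index arithmetic. This is a minimal sketch under one reading of the text: consecutive coefficients of a channel pair are size floats apart in dst, and with trans equal to SimdFalse each 3-tap source filter is contiguous. The helper names are hypothetical and are not part of the library.

```c
#include <stddef.h>

/* Coefficients per channel pair for F(1x4,1x3), as stated in the description. */
enum { WINOGRAD_1X3_1X4_COEFFS = 6 };

/* Hypothetical helper: number of floats the dst buffer has to hold. */
static size_t filter_dst_floats(size_t size)
{
    return (size_t)WINOGRAD_1X3_1X4_COEFFS * size;
}

/* Hypothetical helper: offset of coefficient `coeff` of channel pair `pair`,
   assuming adjacent coefficients of one pair are `size` floats apart. */
static size_t filter_dst_offset(size_t coeff, size_t pair, size_t size)
{
    return coeff * size + pair;
}

/* Hypothetical helper: offset of tap `tap` of channel pair `pair` in src
   when trans is SimdFalse (every 3-tap filter stored contiguously). */
static size_t filter_src_offset(size_t tap, size_t pair)
{
    return pair * 3 + tap;
}
```

For example, with 8 input and 8 output channels (size = 64) the destination needs 384 floats.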
◆ SimdWinogradKernel1x3Block1x4SetInput()
void SimdWinogradKernel1x3Block1x4SetInput(const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans)
Converts input tensor rows to Winograd F(1x4,1x3) domain.
The function transforms source rows by horizontal tiles. Every full tile consumes 6 input values and prepares data for 4 output values. Only horizontal zero padding is supported: padY and padH must be 0, padX must be equal to padW, and padX must be 0 or 1. The transformed output width is srcWidth - 2 without padding, or srcWidth with padding. If trans is SimdFalse, source tensor layout is CHW. If trans is SimdTrue, source tensor layout is HWC and transformed channel values are packed contiguously for every tile.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input image.
- [in] srcChannels - the number of input channels.
- [in] srcHeight - the height of the input image.
- [in] srcWidth - the width of the input image.
- [in] padY - additional zero padding of the input image at the beginning of the Y-axis.
- [in] padX - additional zero padding of the input image at the beginning of the X-axis.
- [in] padH - additional zero padding of the input image at the end of the Y-axis.
- [in] padW - additional zero padding of the input image at the end of the X-axis.
- [out] dst - a pointer to the output array with converted image tiles.
- [in] dstStride - the distance between adjacent Winograd coefficients in the output array.
- [in] trans - a flag selecting the transposed tensor layout.
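The width rule above (srcWidth - 2 without padding, srcWidth with padding) and the 4-outputs-per-tile blocking can be sketched as follows. The helper names are hypothetical, and rounding the tile count up to cover a partial tail tile is an assumption based on the tail clipping described for the matching SetOutput function.

```c
#include <stddef.h>

/* Hypothetical helper: output row width for the 1x3 kernel.
   padX and padW are the left and right zero padding (both 0 or both 1). */
static size_t winograd_1x3_dst_width(size_t srcWidth, size_t padX, size_t padW)
{
    return srcWidth + padX + padW - 2;
}

/* Hypothetical helper: horizontal tiles per row, 4 outputs per tile,
   rounded up so that a partial tail tile is counted. */
static size_t winograd_1x4_tile_count(size_t dstWidth)
{
    return (dstWidth + 3) / 4;
}
```

With srcWidth = 18 this gives 16 outputs in 4 tiles without padding, and 18 outputs in 5 tiles with padding.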
◆ SimdWinogradKernel1x3Block1x4SetOutput()
void SimdWinogradKernel1x3Block1x4SetOutput(const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans)
Converts output tensor rows from Winograd F(1x4,1x3) domain.
Every tile contains 6 Winograd coefficients and produces up to 4 output values. Tail tiles are clipped to dstWidth. If trans is SimdFalse, destination tensor layout is CHW. If trans is SimdTrue, destination tensor layout is HWC.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input array with converted image tiles.
- [in] srcStride - the distance between adjacent Winograd coefficients in the input array.
- [out] dst - a pointer to the output image.
- [in] dstChannels - the number of output channels.
- [in] dstHeight - the height of the output image.
- [in] dstWidth - the width of the output image.
- [in] trans - a flag selecting the transposed tensor layout.
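srcStride is described above as the distance between adjacent Winograd coefficients. Read that way, the 6 coefficients belonging to one tile element lie in 6 planes that are srcStride floats apart. The sketch below only gathers them; it does not apply the inverse transform, and winograd_gather6 is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Hypothetical helper: copies the 6 coefficients of one tile element,
   assuming coefficient k is stored at src[k * srcStride]. */
static void winograd_gather6(const float *src, size_t srcStride, float out[6])
{
    for (size_t k = 0; k < 6; ++k)
        out[k] = src[k * srcStride];
}
```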
◆ SimdWinogradKernel1x5Block1x4SetFilter()
void SimdWinogradKernel1x5Block1x4SetFilter(const float *src, size_t size, float *dst, SimdBool trans)
Converts 1x5 convolution filters to Winograd F(1x4,1x5) domain (or 5x1 filters to F(4x1,5x1) domain).
For every input-output channel pair the function transforms 5 source filter values to 8 Winograd coefficients. If trans is SimdFalse, every source filter is stored contiguously and output coefficients are separated by size. If trans is SimdTrue, every coefficient plane is stored with stride equal to size.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input 32-bit float array with filter weights.
- [in] size - (number of input channels)*(number of output channels).
- [out] dst - a pointer to the output 32-bit float array with 8 transformed coefficients per channel pair.
- [in] trans - a flag selecting the transposed filter layout.
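The coefficient counts quoted for the filter transforms in this group (6 and 8 for the one-dimensional variants, 9 to 36 for the two-dimensional ones) follow from a tile length of block + kernel - 1 along each axis. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <stddef.h>

/* Tile length along one axis for `block` outputs and a `kernel`-tap filter. */
static size_t winograd_tile_length(size_t block, size_t kernel)
{
    return block + kernel - 1;
}

/* Coefficients per channel pair for F(blockY x blockX, kernelY x kernelX). */
static size_t winograd_coeff_count(size_t blockY, size_t blockX,
                                   size_t kernelY, size_t kernelX)
{
    return winograd_tile_length(blockY, kernelY) *
           winograd_tile_length(blockX, kernelX);
}
```

For F(1x4,1x5) the vertical tile length is 1 and the horizontal one is 4 + 5 - 1 = 8, which matches the 8 coefficients stated above.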
◆ SimdWinogradKernel1x5Block1x4SetInput()
void SimdWinogradKernel1x5Block1x4SetInput(const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans)
Converts input tensor rows to Winograd F(1x4,1x5) domain.
The function transforms source rows by horizontal tiles. Every full tile consumes 8 input values and prepares data for 4 output values. Only horizontal zero padding is supported: padY and padH must be 0, padX must be equal to padW, and padX must be 0 or 2. The transformed output width is srcWidth - 4 without padding, or srcWidth with padding. If trans is SimdFalse, source tensor layout is CHW. If trans is SimdTrue, source tensor layout is HWC and transformed channel values are packed contiguously for every tile.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input image.
- [in] srcChannels - the number of input channels.
- [in] srcHeight - the height of the input image.
- [in] srcWidth - the width of the input image.
- [in] padY - additional zero padding of the input image at the beginning of the Y-axis.
- [in] padX - additional zero padding of the input image at the beginning of the X-axis.
- [in] padH - additional zero padding of the input image at the end of the Y-axis.
- [in] padW - additional zero padding of the input image at the end of the X-axis.
- [out] dst - a pointer to the output array with converted image tiles.
- [in] dstStride - the distance between adjacent Winograd coefficients in the output array.
- [in] trans - a flag selecting the transposed tensor layout.
◆ SimdWinogradKernel1x5Block1x4SetOutput()
void SimdWinogradKernel1x5Block1x4SetOutput(const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans)
Converts output tensor rows from Winograd F(1x4,1x5) domain.
Every tile contains 8 Winograd coefficients and produces up to 4 output values. Tail tiles are clipped to dstWidth. If trans is SimdFalse, destination tensor layout is CHW. If trans is SimdTrue, destination tensor layout is HWC.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input array with converted image tiles.
- [in] srcStride - the distance between adjacent Winograd coefficients in the input array.
- [out] dst - a pointer to the output image.
- [in] dstChannels - the number of output channels.
- [in] dstHeight - the height of the output image.
- [in] dstWidth - the width of the output image.
- [in] trans - a flag selecting the transposed tensor layout.
◆ SimdWinogradKernel2x2Block2x2SetFilter()
void SimdWinogradKernel2x2Block2x2SetFilter(const float *src, size_t size, float *dst, SimdBool trans)
Converts 2x2 convolution filters to Winograd F(2x2,2x2) domain.
For every input-output channel pair the function transforms 4 source filter values to 9 Winograd coefficients arranged as a 3x3 transformed filter. If trans is SimdFalse, every source filter is stored contiguously and output coefficients are separated by size. If trans is SimdTrue, every coefficient plane is stored with stride equal to size.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input 32-bit float array with filter weights.
- [in] size - (number of input channels)*(number of output channels).
- [out] dst - a pointer to the output 32-bit float array with 9 transformed coefficients per channel pair.
- [in] trans - a flag selecting the transposed filter layout.
◆ SimdWinogradKernel2x2Block2x2SetInput()
void SimdWinogradKernel2x2Block2x2SetInput(const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans)
Converts input tensor tiles to Winograd F(2x2,2x2) domain.
Every full tile consumes a 3x3 input patch and prepares data for a 2x2 output block. Padding must satisfy padY == padX, padW == padH, and padY + padH is 0 or 1. The transformed output size is (srcHeight - 1 + padY + padH) x (srcWidth - 1 + padX + padW). If trans is SimdFalse, source tensor layout is CHW. If trans is SimdTrue, source tensor layout is HWC and transformed channel values are packed contiguously for every tile.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input image.
- [in] srcChannels - the number of input channels.
- [in] srcHeight - the height of the input image.
- [in] srcWidth - the width of the input image.
- [in] padY - additional zero padding of the input image at the beginning of the Y-axis.
- [in] padX - additional zero padding of the input image at the beginning of the X-axis.
- [in] padH - additional zero padding of the input image at the end of the Y-axis.
- [in] padW - additional zero padding of the input image at the end of the X-axis.
- [out] dst - a pointer to the output array with converted image tiles.
- [in] dstStride - the distance between adjacent Winograd coefficients in the output array.
- [in] trans - a flag selecting the transposed tensor layout.
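The padding rule for the 2x2 kernel is easy to get wrong because the padding is one-sided: a 2x2 kernel needs only one extra row and column in total to keep the input size. A caller-side check might look like the sketch below; the helper names are hypothetical and the function itself is not claimed to perform this validation in this form.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the padding rule quoted above for 2x2 kernels:
   padY == padX, padW == padH, and padY + padH equal to 0 or 1. */
static bool winograd_2x2_padding_ok(size_t padY, size_t padX, size_t padH, size_t padW)
{
    return padY == padX && padW == padH && padY + padH <= 1;
}

/* Hypothetical helper: output extent along one axis for a 2x2 kernel. */
static size_t winograd_2x2_dst_extent(size_t srcExtent, size_t padBegin, size_t padEnd)
{
    return srcExtent + padBegin + padEnd - 1;
}
```

Padding of 1 at the beginning only, or at the end only, keeps an 8x8 input at 8x8; padding on both sides is rejected.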
◆ SimdWinogradKernel2x2Block2x2SetOutput()
void SimdWinogradKernel2x2Block2x2SetOutput(const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans)
Converts output tensor tiles from Winograd F(2x2,2x2) domain.
Every tile contains a 3x3 set of Winograd coefficients and produces up to a 2x2 output block. Tail blocks are clipped to dstHeight and dstWidth. If trans is SimdFalse, destination tensor layout is CHW. If trans is SimdTrue, destination tensor layout is HWC.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input array with converted image tiles.
- [in] srcStride - the distance between adjacent Winograd coefficients in the input array.
- [out] dst - a pointer to the output image.
- [in] dstChannels - the number of output channels.
- [in] dstHeight - the height of the output image.
- [in] dstWidth - the width of the output image.
- [in] trans - a flag selecting the transposed tensor layout.
◆ SimdWinogradKernel2x2Block4x4SetFilter()
void SimdWinogradKernel2x2Block4x4SetFilter(const float *src, size_t size, float *dst, SimdBool trans)
Converts 2x2 convolution filters to Winograd F(4x4,2x2) domain.
For every input-output channel pair the function transforms 4 source filter values to 25 Winograd coefficients arranged as a 5x5 transformed filter. If trans is SimdFalse, every source filter is stored contiguously and output coefficients are separated by size. If trans is SimdTrue, every coefficient plane is stored with stride equal to size.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input 32-bit float array with filter weights.
- [in] size - (number of input channels)*(number of output channels).
- [out] dst - a pointer to the output 32-bit float array with 25 transformed coefficients per channel pair.
- [in] trans - a flag selecting the transposed filter layout.
◆ SimdWinogradKernel2x2Block4x4SetInput()
void SimdWinogradKernel2x2Block4x4SetInput(const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans)
Converts input tensor tiles to Winograd F(4x4,2x2) domain.
Every full tile consumes a 5x5 input patch and prepares data for a 4x4 output block. Padding must satisfy padY == padX, padW == padH, and padY + padH is 0 or 1. The transformed output size is (srcHeight - 1 + padY + padH) x (srcWidth - 1 + padX + padW). If trans is SimdFalse, source tensor layout is CHW. If trans is SimdTrue, source tensor layout is HWC and transformed channel values are packed contiguously for every tile.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input image.
- [in] srcChannels - the number of input channels.
- [in] srcHeight - the height of the input image.
- [in] srcWidth - the width of the input image.
- [in] padY - additional zero padding of the input image at the beginning of the Y-axis.
- [in] padX - additional zero padding of the input image at the beginning of the X-axis.
- [in] padH - additional zero padding of the input image at the end of the Y-axis.
- [in] padW - additional zero padding of the input image at the end of the X-axis.
- [out] dst - a pointer to the output array with converted image tiles.
- [in] dstStride - the distance between adjacent Winograd coefficients in the output array.
- [in] trans - a flag selecting the transposed tensor layout.
◆ SimdWinogradKernel2x2Block4x4SetOutput()
void SimdWinogradKernel2x2Block4x4SetOutput(const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans)
Converts output tensor tiles from Winograd F(4x4,2x2) domain.
Every tile contains a 5x5 set of Winograd coefficients and produces up to a 4x4 output block. Tail blocks are clipped to dstHeight and dstWidth. If trans is SimdFalse, destination tensor layout is CHW. If trans is SimdTrue, destination tensor layout is HWC.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input array with converted image tiles.
- [in] srcStride - the distance between adjacent Winograd coefficients in the input array.
- [out] dst - a pointer to the output image.
- [in] dstChannels - the number of output channels.
- [in] dstHeight - the height of the output image.
- [in] dstWidth - the width of the output image.
- [in] trans - a flag selecting the transposed tensor layout.
◆ SimdWinogradKernel3x3Block2x2SetFilter()
void SimdWinogradKernel3x3Block2x2SetFilter(const float *src, size_t size, float *dst, SimdBool trans)
Converts 3x3 convolution filters to Winograd F(2x2,3x3) domain.
For every input-output channel pair the function transforms 9 source filter values to 16 Winograd coefficients arranged as a 4x4 transformed filter. If trans is SimdFalse, every source filter is stored contiguously and output coefficients are separated by size. If trans is SimdTrue, every coefficient plane is stored with stride equal to size.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input 32-bit float array with filter weights.
- [in] size - (number of input channels)*(number of output channels).
- [out] dst - a pointer to the output 32-bit float array with 16 transformed coefficients per channel pair.
- [in] trans - a flag selecting the transposed filter layout.
◆ SimdWinogradKernel3x3Block2x2SetInput()
void SimdWinogradKernel3x3Block2x2SetInput(const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans)
Converts input tensor tiles to Winograd F(2x2,3x3) domain.
Every full tile consumes a 4x4 input patch and prepares data for a 2x2 output block. Padding must be equal on all sides and must be 0 or 1. The transformed output size is srcHeight x srcWidth with padding, or (srcHeight - 2) x (srcWidth - 2) without padding. If trans is SimdFalse, source tensor layout is CHW. If trans is SimdTrue, source tensor layout is HWC and transformed channel values are packed contiguously for every tile.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input image.
- [in] srcChannels - the number of input channels.
- [in] srcHeight - the height of the input image.
- [in] srcWidth - the width of the input image.
- [in] padY - additional zero padding of the input image at the beginning of the Y-axis.
- [in] padX - additional zero padding of the input image at the beginning of the X-axis.
- [in] padH - additional zero padding of the input image at the end of the Y-axis.
- [in] padW - additional zero padding of the input image at the end of the X-axis.
- [out] dst - a pointer to the output array with converted image tiles.
- [in] dstStride - the distance between adjacent Winograd coefficients in the output array.
- [in] trans - a flag selecting the transposed tensor layout.
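The trans flag selects between the two tensor layouts named above. The sketch below shows the element offsets those names conventionally imply, assuming dense tensors with no padding between rows or channels. The helper names are hypothetical.

```c
#include <stddef.h>

/* Offset of element (c, y, x) in a dense CHW tensor (trans == SimdFalse). */
static size_t chw_offset(size_t c, size_t y, size_t x, size_t height, size_t width)
{
    return (c * height + y) * width + x;
}

/* Offset of element (c, y, x) in a dense HWC tensor (trans == SimdTrue). */
static size_t hwc_offset(size_t c, size_t y, size_t x, size_t width, size_t channels)
{
    return (y * width + x) * channels + c;
}
```

In HWC the channels of one pixel are adjacent in memory, which is what allows transformed channel values to be packed contiguously for every tile.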
◆ SimdWinogradKernel3x3Block2x2SetOutput()
void SimdWinogradKernel3x3Block2x2SetOutput(const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans)
Converts output tensor tiles from Winograd F(2x2,3x3) domain.
Every tile contains a 4x4 set of Winograd coefficients and produces up to a 2x2 output block. Tail blocks are clipped to dstHeight and dstWidth. If trans is SimdFalse, destination tensor layout is CHW. If trans is SimdTrue, destination tensor layout is HWC.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input array with converted image tiles.
- [in] srcStride - the distance between adjacent Winograd coefficients in the input array.
- [out] dst - a pointer to the output image.
- [in] dstChannels - the number of output channels.
- [in] dstHeight - the height of the output image.
- [in] dstWidth - the width of the output image.
- [in] trans - a flag selecting the transposed tensor layout.
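Tail clipping means that blocks on the bottom and right edges write fewer than 2x2 values when dstHeight or dstWidth is odd. The sketch below computes the clipped extent of a block along one axis and checks that the clipped blocks together cover the output exactly once. The helper names are hypothetical, and the round-up tile count is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: part of block `index` (edge length `block`) that lies
   inside an axis of length `dstSize`; 0 if the block starts past the end. */
static size_t winograd_block_extent(size_t index, size_t block, size_t dstSize)
{
    size_t begin = index * block;
    if (begin >= dstSize)
        return 0;
    return dstSize - begin < block ? dstSize - begin : block;
}

/* Counts the values written when every 2x2 block is clipped as above;
   the result should equal dstHeight * dstWidth. */
static size_t winograd_2x2_written_values(size_t dstHeight, size_t dstWidth)
{
    size_t tilesY = (dstHeight + 1) / 2, tilesX = (dstWidth + 1) / 2, total = 0;
    for (size_t ty = 0; ty < tilesY; ++ty)
        for (size_t tx = 0; tx < tilesX; ++tx)
            total += winograd_block_extent(ty, 2, dstHeight) *
                     winograd_block_extent(tx, 2, dstWidth);
    return total;
}
```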
◆ SimdWinogradKernel3x3Block3x3SetFilter()
void SimdWinogradKernel3x3Block3x3SetFilter(const float *src, size_t size, float *dst, SimdBool trans)
Converts 3x3 convolution filters to Winograd F(3x3,3x3) domain.
For every input-output channel pair the function transforms 9 source filter values to 25 Winograd coefficients arranged as a 5x5 transformed filter. If trans is SimdFalse, every source filter is stored contiguously and output coefficients are separated by size. If trans is SimdTrue, every coefficient plane is stored with stride equal to size.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input 32-bit float array with filter weights.
- [in] size - (number of input channels)*(number of output channels).
- [out] dst - a pointer to the output 32-bit float array with 25 transformed coefficients per channel pair.
- [in] trans - a flag selecting the transposed filter layout.
◆ SimdWinogradKernel3x3Block3x3SetInput()
void SimdWinogradKernel3x3Block3x3SetInput(const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans)
Converts input tensor tiles to Winograd F(3x3,3x3) domain.
Every full tile consumes a 5x5 input patch and prepares data for a 3x3 output block. Padding must be equal on all sides and must be 0 or 1. The transformed output size is srcHeight x srcWidth with padding, or (srcHeight - 2) x (srcWidth - 2) without padding. If trans is SimdFalse, source tensor layout is CHW. If trans is SimdTrue, source tensor layout is HWC and transformed channel values are packed contiguously for every tile.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input image.
- [in] srcChannels - the number of input channels.
- [in] srcHeight - the height of the input image.
- [in] srcWidth - the width of the input image.
- [in] padY - additional zero padding of the input image at the beginning of the Y-axis.
- [in] padX - additional zero padding of the input image at the beginning of the X-axis.
- [in] padH - additional zero padding of the input image at the end of the Y-axis.
- [in] padW - additional zero padding of the input image at the end of the X-axis.
- [out] dst - a pointer to the output array with converted image tiles.
- [in] dstStride - the distance between adjacent Winograd coefficients in the output array.
- [in] trans - a flag selecting the transposed tensor layout.
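Unlike the 2x2 kernels, this variant requires the same padding on all four sides. The caller-side sketch below checks that rule and derives the number of 3-wide blocks per axis. The helper names are hypothetical, and rounding the block count up is an assumption based on the tail clipping described for the matching SetOutput function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: padding equal on all sides and equal to 0 or 1. */
static bool winograd_3x3_equal_padding_ok(size_t padY, size_t padX, size_t padH, size_t padW)
{
    return padY == padX && padY == padH && padY == padW && padY <= 1;
}

/* Hypothetical helper: output extent along one axis for a 3x3 kernel
   with the same padding `pad` at the beginning and at the end. */
static size_t winograd_3x3_dst_extent(size_t srcExtent, size_t pad)
{
    return srcExtent + 2 * pad - 2;
}

/* Hypothetical helper: number of 3-wide blocks needed to cover dstExtent. */
static size_t winograd_block3_count(size_t dstExtent)
{
    return (dstExtent + 2) / 3;
}
```

A 16x16 input without padding yields a 14x14 output, covered by 5 blocks per axis, the last of which is partial.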
◆ SimdWinogradKernel3x3Block3x3SetOutput()
void SimdWinogradKernel3x3Block3x3SetOutput(const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans)
Converts output tensor tiles from Winograd F(3x3,3x3) domain.
Every tile contains a 5x5 set of Winograd coefficients and produces up to a 3x3 output block. Tail blocks are clipped to dstHeight and dstWidth. If trans is SimdFalse, destination tensor layout is CHW. If trans is SimdTrue, destination tensor layout is HWC.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input array with converted image tiles.
- [in] srcStride - the distance between adjacent Winograd coefficients in the input array.
- [out] dst - a pointer to the output image.
- [in] dstChannels - the number of output channels.
- [in] dstHeight - the height of the output image.
- [in] dstWidth - the width of the output image.
- [in] trans - a flag selecting the transposed tensor layout.
◆ SimdWinogradKernel3x3Block4x4SetFilter()
void SimdWinogradKernel3x3Block4x4SetFilter(const float *src, size_t size, float *dst, SimdBool trans)
Converts 3x3 convolution filters to Winograd F(4x4,3x3) domain.
For every input-output channel pair the function transforms 9 source filter values to 36 Winograd coefficients arranged as a 6x6 transformed filter. If trans is SimdFalse, every source filter is stored contiguously and output coefficients are separated by size. If trans is SimdTrue, every coefficient plane is stored with stride equal to size.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input 32-bit float array with filter weights.
- [in] size - (number of input channels)*(number of output channels).
- [out] dst - a pointer to the output 32-bit float array with 36 transformed coefficients per channel pair.
- [in] trans - a flag selecting the transposed filter layout.
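The payoff of the larger 6x6 transformed filter can be estimated by counting multiplications per output block and channel pair. This is a back-of-the-envelope sketch that ignores the cost of the filter, input and output transforms; the helper names are hypothetical.

```c
#include <stddef.h>

/* Element-wise multiplications per tile in the Winograd domain:
   one per coefficient of the (block + kernel - 1)^2 tile. */
static size_t winograd_mults(size_t block, size_t kernel)
{
    size_t tile = block + kernel - 1;
    return tile * tile;
}

/* Multiplications for the same block x block outputs with direct convolution:
   kernel^2 per output value. */
static size_t direct_mults(size_t block, size_t kernel)
{
    return block * block * kernel * kernel;
}
```

For F(4x4,3x3) this gives 36 multiplications instead of 144, a 4x reduction; for F(2x2,3x3) it gives 16 instead of 36, a 2.25x reduction.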
◆ SimdWinogradKernel3x3Block4x4SetInput()
void SimdWinogradKernel3x3Block4x4SetInput(const float *src, size_t srcChannels, size_t srcHeight, size_t srcWidth, size_t padY, size_t padX, size_t padH, size_t padW, float *dst, size_t dstStride, SimdBool trans)
Converts input tensor tiles to Winograd F(4x4,3x3) domain.
Every full tile consumes a 6x6 input patch and prepares data for a 4x4 output block. Padding must satisfy padY + padH <= 2 and padX + padW <= 2. The transformed output size is (srcHeight - 2 + padY + padH) x (srcWidth - 2 + padX + padW). If trans is SimdFalse, source tensor layout is CHW. If trans is SimdTrue, source tensor layout is HWC and transformed channel values are packed contiguously for every tile.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input image.
- [in] srcChannels - the number of input channels.
- [in] srcHeight - the height of the input image.
- [in] srcWidth - the width of the input image.
- [in] padY - additional zero padding of the input image at the beginning of the Y-axis.
- [in] padX - additional zero padding of the input image at the beginning of the X-axis.
- [in] padH - additional zero padding of the input image at the end of the Y-axis.
- [in] padW - additional zero padding of the input image at the end of the X-axis.
- [out] dst - a pointer to the output array with converted image tiles.
- [in] dstStride - the distance between adjacent Winograd coefficients in the output array.
- [in] trans - a flag selecting the transposed tensor layout.
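This variant has the loosest padding rule in the group: padding may differ between sides as long as each axis receives at most 2 in total. The sketch below checks the rule and counts the 4x4 tiles that cover the output. The helper names are hypothetical, the round-up tile count is an assumption, and the padded input is assumed to be at least 3 pixels along each axis.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: padY + padH <= 2 and padX + padW <= 2, as quoted above. */
static bool winograd_3x3_4x4_padding_ok(size_t padY, size_t padX, size_t padH, size_t padW)
{
    return padY + padH <= 2 && padX + padW <= 2;
}

/* Hypothetical helper: number of 4x4 tiles covering the output, using the
   output size (srcHeight - 2 + padY + padH) x (srcWidth - 2 + padX + padW). */
static size_t winograd_3x3_4x4_tile_count(size_t srcHeight, size_t srcWidth,
                                          size_t padY, size_t padX,
                                          size_t padH, size_t padW)
{
    size_t dstHeight = srcHeight + padY + padH - 2;
    size_t dstWidth = srcWidth + padX + padW - 2;
    return ((dstHeight + 3) / 4) * ((dstWidth + 3) / 4);
}
```

A 14x14 input with padding 1 on every side yields a 14x14 output and 4 x 4 = 16 tiles, with partial tiles on the bottom and right edges.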
◆ SimdWinogradKernel3x3Block4x4SetOutput()
void SimdWinogradKernel3x3Block4x4SetOutput(const float *src, size_t srcStride, float *dst, size_t dstChannels, size_t dstHeight, size_t dstWidth, SimdBool trans)
Converts output tensor tiles from Winograd F(4x4,3x3) domain.
Every tile contains a 6x6 set of Winograd coefficients and produces up to a 4x4 output block. Tail blocks are clipped to dstHeight and dstWidth. If trans is SimdFalse, destination tensor layout is CHW. If trans is SimdTrue, destination tensor layout is HWC.
- Note
- This function is used in the Synet Framework.
- Parameters
- [in] src - a pointer to the input array with converted image tiles.
- [in] srcStride - the distance between adjacent Winograd coefficients in the input array.
- [out] dst - a pointer to the output image.
- [in] dstChannels - the number of output channels.
- [in] dstHeight - the height of the output image.
- [in] dstWidth - the width of the output image.
- [in] trans - a flag selecting the transposed tensor layout.