Simd Library Release Notes (2020).


2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013

December 1, 2020 (version 4.6.96)

Algorithms

New features
  • Base implementation of function AveragingBinarizationV2.
  • SSE4.1, AVX2, AVX-512BW optimizations of function AlphaUnpremultiply.
Improving
  • SSE2, AVX2, AVX-512BW and NEON optimizations of function MedianFilterSquare5x5.
  • SSE2, AVX2, AVX-512F optimizations of function SynetSoftmaxLayerForward.
  • Reduced number of calls to function CpuSocketNumber at initialization of Simd.
  • Reduced number of calls to function CpuCoreNumber at initialization of Simd.
  • Reduced number of calls to function CheckBit at initialization of Simd.
Bug fixing
  • Compilation error in file SimdNeonSynetConvolution8i.cpp.
  • Infinite loop in SynetConvolution32fNhwcDirect::OldReorderWeight (on Celeron CPU).
  • Crash in SimdRuntime.h (on Celeron CPU).
  • Crash in SimdGemm.h (on Celeron CPU).
  • Function SimdSynetSpecifyTensorFormat returns incorrect value.
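
For readers unfamiliar with the operation behind SynetSoftmaxLayerForward, the following is a minimal scalar sketch of a numerically stable softmax over one vector. It is not the library's code: the name SoftmaxSketch and the std::vector interface are hypothetical, and the real function may process tensors with a different layout and signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar illustration only; name and interface are assumptions, not Simd API.
std::vector<float> SoftmaxSketch(const std::vector<float>& src)
{
    std::vector<float> dst(src.size());
    if (src.empty())
        return dst;
    // Subtracting the maximum keeps exp() from overflowing for large inputs.
    const float maxValue = *std::max_element(src.begin(), src.end());
    float sum = 0.0f;
    for (size_t i = 0; i < src.size(); ++i)
    {
        dst[i] = std::exp(src[i] - maxValue);
        sum += dst[i];
    }
    // The maximum element contributes exp(0) = 1, so the sum is positive for finite inputs.
    assert(sum > 0.0f);
    for (size_t i = 0; i < dst.size(); ++i)
        dst[i] /= sum;
    return dst;
}
```

SIMD versions of this kind of routine typically vectorize the maximum search, the exponent and the normalization separately; the sketch only shows the arithmetic being optimized.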

Test framework

New features
  • Tests for verifying functionality of function AveragingBinarizationV2.
  • Parameter '-lc' to litter CPU cache between test runs.

Infrastructure

New features
  • MSVS projects can be used from an external solution.
Removing
  • Support of MSA (MIPS).

November 4, 2020 (version 4.6.95)

Algorithms

New features
  • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCdc class.
  • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iCd class.
  • AVX2, AVX-512BW and AVX-512VNNI optimizations of SynetMergedConvolution8iDc class.
  • SSE4.1, AVX2, AVX-512BW optimizations of function SynetConvert8uTo32f.
  • Base implementation, SSE2, SSSE3, AVX2, AVX-512BW optimizations of function AlphaPremultiply.
  • Base implementation of function AlphaUnpremultiply.
Bug fixing
  • GCC v10 compilation error in file SimdGemm.h.
  • Error in IECompatible method of SynetMergedConvolution8i.
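
As background for AlphaPremultiply and AlphaUnpremultiply, here is a minimal scalar sketch of the two operations on 4-byte pixels. The function names, the assumption that alpha is stored in the last byte, the rounding, and the handling of zero alpha are all illustrative choices, not the library's documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar illustration only; names, rounding and zero-alpha handling are assumptions.

// Multiplies one 8-bit channel by alpha / 255, rounding to nearest.
inline uint8_t PremultiplyChannel(uint8_t value, uint8_t alpha)
{
    return static_cast<uint8_t>((value * alpha + 127) / 255);
}

// Divides one 8-bit channel by alpha / 255, saturating at 255.
inline uint8_t UnpremultiplyChannel(uint8_t value, uint8_t alpha)
{
    if (alpha == 0)
        return 0; // Color of a fully transparent pixel is undefined; 0 is chosen here.
    const int restored = (value * 255 + alpha / 2) / alpha;
    return static_cast<uint8_t>(std::min(restored, 255));
}

// Premultiplies pixelCount 4-byte pixels; alpha is assumed to be the 4th byte.
void PremultiplyPixels(const uint8_t* src, size_t pixelCount, uint8_t* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < pixelCount; ++i, src += 4, dst += 4)
    {
        const uint8_t alpha = src[3];
        dst[0] = PremultiplyChannel(src[0], alpha);
        dst[1] = PremultiplyChannel(src[1], alpha);
        dst[2] = PremultiplyChannel(src[2], alpha);
        dst[3] = alpha; // Alpha itself is copied unchanged.
    }
}
```

Unpremultiplication is lossy for small alpha values, because several original colors collapse to the same premultiplied value; a round trip restores the original only approximately.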

Test framework

New features
  • Tests for verifying functionality of function AlphaPremultiply.
  • Tests for verifying functionality of function AlphaUnpremultiply.

Documentation

Bug fixing
  • Missing references to C++ wrappers in descriptions of API functions.

October 1, 2020 (version 4.6.94)

Algorithms

New features
  • Base implementation of SynetMergedConvolution8i class.
  • Base implementation of function SynetConvert8uTo32f.
  • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCdc class.
  • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iCd class.
  • Base implementation and SSE4.1 optimizations of SynetMergedConvolution8iDc class.
Bug fixing
  • Performance degradation in class Convolution32fNhwcDirect (weights size >> L3 cache).
  • Performance degradation in class Convolution32fGemmNN (weights size >> L3 cache).

Test framework

New features
  • Tests for verifying functionality of SynetMergedConvolution8i class.
  • Tests for verifying functionality of function SynetConvert8uTo32f.

Documentation

Improving
  • Improve structuring of Synet documentation.

September 1, 2020 (version 4.6.93)

Algorithms

New features
  • Full support of SimdConvolutionActivationType in SynetConvolution8i class.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8iNhwcDepthwise class.
  • Extend class MergedConvolution32f (2 merged convolutions).
  • Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fCd class.
  • Base implementation, SSE2, AVX, AVX2, AVX-512F optimizations of MergedConvolution32fDc class.
Improving
  • Reduced compilation time and binary size of Simd Library.
Renaming
  • Class MergedConvolution32f to MergedConvolution32fCdc.
Bug fixing
  • Performance degradation in class Convolution32fNhwcDirect (dilation != 1).

Test framework

New features
  • Tests for verifying functionality of class MergedConvolution32f (2 merged convolutions).

August 3, 2020 (version 4.6.92)

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetAdd8i.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function SynetInnerProduct8i.
Improving
  • Reduced compilation time and binary size of Simd Library.
Bug fixing
  • Error in SSE4.1, AVX2, AVX-512BW optimizations of SynetScale8i class (wrong alignment check).
  • Error in performance annotation of SynetConvolution8i class.
  • Compiler error in file SimdBaseSynetConvolution8i.cpp (for old compilers).
  • Compiler errors in files SimdAvx2Synet.cpp, SimdAvx2SynetScale.cpp (WIN32, MSVS).

Test framework

New features
  • Tests for verifying functionality of function SynetAdd8i.
  • Tests for verifying functionality of function SynetInnerProduct8i.

July 1, 2020 (version 4.6.91)

Algorithms

New features
  • Extend SimdSynetCompatibilityType enumeration.
  • Add support of SimdSynetCompatibility8iNarrowed to Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function SynetConvert32fTo8u.
  • Add support of SimdSynetCompatibility8iNarrowed to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
  • Add support of SimdConvolutionActivationPrelu to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of SynetScale8i class.
Improving
  • Reduced size of applications and shared libraries that use Simd as a static library.
Bug fixing
  • Error in class SynetConvolution8i (batch > 1).

Test framework

New features
  • Tests for verifying functionality of SynetScale8i framework.

June 3, 2020 (version 4.6.90)

Algorithms

New features
  • Rgb24 format in Frame structure.
  • Rgb24 format in Convert function.
  • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function RgbToGray.
  • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function RgbToBgra.
  • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function BgraToRgb.
  • AVX2 optimization of function BgraToBgr.
  • Function LitterCpuCache.
  • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv444pToRgb.
  • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv422pToRgb.
  • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function Yuv420pToRgb.
Improving
  • NEON optimization of function BgrToGray.
Bug fixing
  • Error in class SynetConvolution8i (group != 1).
  • Wrong assert condition in SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class Convolution32fNhwcDirect.
  • Compiler error when SIMD_AVX2_DISABLE macro is uncommented.
  • Int32 overflow in function SynetConvolution8i::SetParams.
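
To illustrate what a conversion such as RgbToGray computes, the sketch below converts a 24-bit RGB image to 8-bit gray in scalar code. The weights are the BT.601 luma coefficients in 14-bit fixed point, which is an assumption made for this example; the function name and parameter list are hypothetical and the library's coefficients and signature may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar illustration only; weights and interface are assumptions, not Simd API.
// BT.601 luma weights scaled by 2^14; the three values sum to exactly 16384.
const int kRedWeight = 4899;   // round(0.299 * 16384)
const int kGreenWeight = 9617; // round(0.587 * 16384)
const int kBlueWeight = 1868;  // round(0.114 * 16384)
const int kShift = 14;
const int kRoundTerm = 1 << (kShift - 1);

inline uint8_t RgbPixelToGray(uint8_t red, uint8_t green, uint8_t blue)
{
    const int sum = red * kRedWeight + green * kGreenWeight + blue * kBlueWeight;
    return static_cast<uint8_t>((sum + kRoundTerm) >> kShift);
}

// Strides are given in bytes and may exceed the packed row size.
void RgbToGraySketch(const uint8_t* rgb, size_t width, size_t height,
    size_t rgbStride, uint8_t* gray, size_t grayStride)
{
    assert(rgb != nullptr && gray != nullptr);
    for (size_t row = 0; row < height; ++row)
    {
        const uint8_t* src = rgb + row * rgbStride;
        uint8_t* dst = gray + row * grayStride;
        for (size_t col = 0; col < width; ++col, src += 3)
            dst[col] = RgbPixelToGray(src[0], src[1], src[2]);
    }
}
```

Because the weights sum to 2^14, pure white maps to 255 and no clamping is needed. The 3-byte pixel stride is what makes RGB/BGR conversions awkward to vectorize and is one reason dedicated SIMD versions exist.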

Test framework

New features
  • Tests for verifying functionality of function RgbToGray.
  • Tests for verifying functionality of function RgbToBgra.
  • Tests for verifying functionality of function BgraToRgb.
  • Tests for verifying functionality of function Yuv444pToRgb.
  • Tests for verifying functionality of function Yuv422pToRgb.
  • Tests for verifying functionality of function Yuv420pToRgb.

May 4, 2020 (version 4.6.89)

Algorithms

Bug fixing
  • Microsoft Visual Studio 2013 compiler errors in files: SimdSynetConvolution8i.h, SimdSse2SynetConvolution32f.cpp, SimdAvx2Reduce.cpp.
  • Buffer overrun in SSE4.1, AVX2, NEON optimizations of SynetConvolution8iNhwcDirect class.
  • Visual Studio 2017 internal compiler error in function Avx512f::ConvolutionBiasAndActivation (Win32/Release).
  • Compiler error in NEON optimization of class SynetConvolution8iNhwcDirect (ARM, 32-bit).
  • Error in AVX2 optimization of function SynetScaleLayerForward.
  • Error in base implementation of SquaredDifferenceKahanSum32f (Visual Studio 2017).
  • Error in AVX-512BW optimization of class SynetConvolution8iNhwcDirect (Visual Studio 2017/2019, Release).
  • Error in class SynetConvolution32fNhwcDirect (large parameters srcC and dstC).

Test framework

Bug fixing
  • Microsoft Visual Studio 2013 compiler errors in files: TestTensor.h, TestSynetActivation.cpp.
  • Test report is not generated if output directory does not exist.
  • Error in test SynetConvert32fTo8uAutoTest.

Infrastructure

New features
  • Script to test Simd compiled with different versions of Microsoft Visual Studio.
  • New structure of Microsoft Visual Studio 2019 project files.
Removing
  • Remove project files of Microsoft Visual Studio 2012.

April 1, 2020 (version 4.6.88)

Algorithms

New features
  • AVX-512VNNI extension support.
  • AVX2, AVX-512BW, AVX-512VNNI and NEON optimizations of SynetConvolution8iNhwcDirect class.
  • Base implementation and SSE4.1, AVX2, AVX-512BW and NEON optimizations of function SynetPoolingForwardMax8u.
Renaming
  • SynetPoolingForwardMax to SynetPoolingForwardMax32f.
Improving
  • SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
  • SSE2, AVX, AVX2, AVX-512F and NEON optimizations of SynetConvolution32fNhwcDirect class.
Bug fixing
  • Microsoft Visual Studio 2015 compiler error in function SynetConvert32fTo8u.
  • Degradation of performance of AVX2 code.
  • Microsoft Visual Studio compiler error in function Extract64i (32-bit mode).

Test framework

New features
  • Tests for verifying functionality of function SynetPoolingForwardMax8u.

March 2, 2020 (version 4.5.87)

Algorithms

New features
  • Add a parameter to function SynetScaleLayerForward for bitwise compatibility with Inference Engine.
  • Add parameter 'type' to function SynetShuffleLayerForward.
  • Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function SynetConvert32fTo8u.
  • SimdSynetCompatibilityType enumeration.
  • Base implementation of SynetConvolution8iGemmNN class.
  • Base implementation and SSE4.1 optimization of SynetConvolution8iNhwcDirect class.
Renaming
  • SimdSynetConvertImage to SimdSynetReorderImage.
  • SimdSynetConvertFilter to SimdSynetReorderFilter.
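
A 32-bit float to 8-bit unsigned conversion such as SynetConvert32fTo8u is usually an affine quantization followed by saturation. The sketch below shows that idea in scalar form, assuming a single scale and shift for the whole array; the names and parameters are hypothetical, and the library function may take per-channel values and other arguments not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Scalar illustration only; names and parameters are assumptions, not Simd API.

// Maps value to round(value * scale + shift), saturated to the range [0, 255].
inline uint8_t QuantizeTo8uSketch(float value, float scale, float shift)
{
    // nearbyint rounds according to the current rounding mode (to nearest even by default).
    const float rounded = std::nearbyint(value * scale + shift);
    return static_cast<uint8_t>(std::min(std::max(rounded, 0.0f), 255.0f));
}

void QuantizeArrayTo8uSketch(const float* src, size_t size, float scale, float shift, uint8_t* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < size; ++i)
        dst[i] = QuantizeTo8uSketch(src[i], scale, shift);
}
```

Saturating before the cast matters: converting an out-of-range float directly to uint8_t is undefined behavior in C++.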

Test framework

New features
  • New command-line test parameter '-c': number of channels in the test image for performance testing.
  • New command-line test parameter '-mt': minimal test execution time (in milliseconds).
  • Tests for verifying functionality of SynetConvolution8i framework.
  • Tests for verifying functionality of function SynetConvert32fTo8u.

Documentation

Bug fixing
  • Error in description of method Detection::LoadStringXml.

February 3, 2020 (version 4.5.86)

Algorithms

New features
  • SimdResizeMethodInferenceEngineInterp method in Resizer framework.
Improving
  • Performance of Convolution32f framework (NHWC format, kernel=3x3, stride=1x1, large H and W).
  • Performance of AVX-512F and NEON optimizations of function GemmPackA.
  • Performance of Convolution32f framework (NHWC format, GemmNN method).
  • Performance of SSE2, AVX, AVX2, AVX-512F and NEON optimizations of Convolution32f framework (NHWC format, NhwcDirect method, kernel=1x1).
  • Performance of AVX-512F optimization of MergedConvolution32f framework (input convolution).
  • Performance of AVX2 and AVX-512F optimizations of MergedConvolution32f framework (output convolution).
  • Performance of Convolution32f framework (stride > 1).
  • Performance of AVX-512F optimization of Gemm32fNN function (add 6x64 and 6x48 micro kernel).
Bug fixing
  • Error in AVX-512F optimization of function WinogradKernel3x3Block2x2SetOutput (NCHW format).
  • Error in SSE, AVX, AVX-512F and NEON optimizations of function SynetPoolingForwardAverage (NHWC format).
  • Error in AVX-512F optimization of function SynetInnerProductLayerForward.
  • Error in AVX, AVX2 and AVX-512F optimizations of function Gemm32fNT.
  • Error in function WinogradKernel3x3Block4x4SetInput (padX != padY != padW != padH).
  • Error in debug FLOPS annotation of Deconvolution32f framework.
  • MergedConvolution32f framework doesn't work with stride == 3.

January 3, 2020 (version 4.5.85)

Algorithms

New features
  • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetUnaryOperation32fLayerForward.
  • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetSoftplus32f.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block2x2SetOutput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel2x2Block4x4SetOutput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x3Block1x4SetOutput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function WinogradKernel1x5Block1x4SetOutput.
Improving
  • Performance of Convolution32f framework (NHWC format, kernel=1x1x1).
  • Performance of Convolution32f framework (NHWC format, kernel=2x2).
  • Performance of Convolution32f framework (NHWC format, kernel=1x3).
  • Performance of Convolution32f framework (NHWC format, kernel=1x5).
Renaming
  • NeuralSigmoid to SynetSigmoid32f.
  • NeuralTanh to SynetTanh32f.
  • NeuralRelu to SynetRelu32f.
  • Winograd2x3SetFilter to WinogradKernel3x3Block2x2SetFilter.
  • Winograd2x3SetInput to WinogradKernel3x3Block2x2SetInput.
  • Winograd2x3SetOutput to WinogradKernel3x3Block2x2SetOutput.
  • Winograd3x3SetFilter to WinogradKernel3x3Block3x3SetFilter.
  • Winograd3x3SetInput to WinogradKernel3x3Block3x3SetInput.
  • Winograd3x3SetOutput to WinogradKernel3x3Block3x3SetOutput.
  • Winograd4x4SetFilter to WinogradKernel3x3Block4x4SetFilter.
  • Winograd4x4SetInput to WinogradKernel3x3Block4x4SetInput.
  • Winograd4x4SetOutput to WinogradKernel3x3Block4x4SetOutput.
Bug fixing
  • Error in Convolution32f framework (kernel greater than input size, NHWC format).
  • Potential crash in ContourDetector.
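
For reference, the softplus activation behind SynetSoftplus32f is commonly defined as log(1 + exp(beta * x)) / beta, with a linear fallback for large arguments. The sketch below is a scalar version under that common definition; the name SoftplusSketch and the beta and threshold parameters are assumptions made for illustration and are not taken from these notes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar illustration only; name and parameters are assumptions, not Simd API.
inline float SoftplusSketch(float x, float beta, float threshold)
{
    assert(beta != 0.0f);
    const float scaled = beta * x;
    // For large arguments softplus(x) is numerically equal to x,
    // and skipping exp() avoids overflow.
    if (scaled > threshold)
        return x;
    // log1p keeps precision when exp(scaled) is very small.
    return std::log1p(std::exp(scaled)) / beta;
}

void SoftplusArraySketch(const float* src, size_t size, float beta, float threshold, float* dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = SoftplusSketch(src[i], beta, threshold);
}
```

With beta = 1 and threshold = 20, SoftplusSketch(0.0f, 1.0f, 20.0f) evaluates to ln 2, approximately 0.6931.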

Test framework

New features
  • Tests for verifying functionality of function SynetUnaryOperation32fLayerForward.
  • Tests for verifying functionality of function SynetSoftplus32f.
  • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetFilter.
  • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetInput.
  • Tests for verifying functionality of function WinogradKernel2x2Block2x2SetOutput.
  • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetFilter.
  • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetInput.
  • Tests for verifying functionality of function WinogradKernel2x2Block4x4SetOutput.
  • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetFilter.
  • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetInput.
  • Tests for verifying functionality of function WinogradKernel1x3Block1x4SetOutput.
  • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetFilter.
  • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetInput.
  • Tests for verifying functionality of function WinogradKernel1x5Block1x4SetOutput.