Simd Library Release Notes (2021).

2026 | 2025 | 2024 | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013

December 1, 2021 (version 4.9.108)

Algorithms

New features

SSE4.1, AVX2, AVX-512F, AVX-512BW optimizations of class ResizerNearest.
Add SimdResizeMethodNearestPytorch to SimdResizeMethodType enumeration.
Add parameter BackgroundStatUpdateTime to Motion Detector.
MotionDetector performance optimization (case of falling star).
16-bit UYVY image format in View.
Base implementation of function UyvyToBgr.
Base implementation, SSE2, AVX2, AVX-512F optimizations of function SynetSwish32f.
SimdConvolutionActivationSwish item of SimdConvolutionActivationType enumeration.
Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetConvolution32f framework.
Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetDeconvolution32f framework.
Swish activation function to Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetMergedConvolution32f framework.
Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8i framework.
Swish activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
SimdYuvType enumeration.
Base implementation, SSE2, AVX2, AVX-512BW optimizations of function Yuv444pToBgraV2.
Function Simd::Resize supports images with 16-bit channel size.
Base implementation of function Yuv420pToBgraV2.
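
For orientation, Swish is commonly defined as x * sigmoid(slope * x). The scalar sketch below shows what a base (non-SIMD) version might compute. The name SwishReference and its signature are hypothetical and are not the library's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference, not the Simd Library API.
// Computes dst[i] = src[i] / (1 + exp(-slope * src[i])), i.e. x * sigmoid(slope * x).
inline void SwishReference(const float* src, std::size_t size, float slope, float* dst)
{
    assert(size == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < size; ++i)
        dst[i] = src[i] / (1.0f + std::exp(-slope * src[i]));
}
```

A vectorized version would apply the same formula to several floats per instruction; this sketch only fixes the expected numerical behaviour.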

Improving

Refactoring of SimdResizeMethodType enumeration.

Bug fixing

Stack corruption in function Simd::Avx2::JpegWriteBlockSubs.

Test framework

New features

Tests for verifying functionality of function UyvyToBgr.
Tests for verifying functionality of function SynetSwish32f.
Tests for verifying functionality of function Yuv444pToBgraV2.
Tests for verifying functionality of function Yuv420pToBgraV2.

Infrastructure

Bug fixing

Correction of wrong compiler options in CMake.

November 1, 2021 (version 4.9.107)

Algorithms

New features

Internal class Holder to replace std::unique_ptr for old compilers without support of C++11 standard.
SimdBayerLayoutType enumeration.
Base implementation of class ResizerNearest.
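
As an illustration of nearest-neighbour resizing in general, here is a minimal sketch for an 8-bit single-channel image. It assumes pixel-centre mapping, where the source index is floor((x + 0.5) * srcW / dstW). Whether ResizerNearest uses this convention is an assumption, and ResizeNearestGray8 is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the Simd Library API: nearest-neighbour resize
// of an 8-bit single-channel image with explicit row strides (in bytes).
inline void ResizeNearestGray8(const std::uint8_t* src, std::size_t srcW, std::size_t srcH, std::size_t srcStride,
    std::uint8_t* dst, std::size_t dstW, std::size_t dstH, std::size_t dstStride)
{
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    for (std::size_t y = 0; y < dstH; ++y)
    {
        // floor((y + 0.5) * srcH / dstH) in integer arithmetic, clamped to the last row.
        const std::size_t sy = std::min((2 * y + 1) * srcH / (2 * dstH), srcH - 1);
        for (std::size_t x = 0; x < dstW; ++x)
        {
            const std::size_t sx = std::min((2 * x + 1) * srcW / (2 * dstW), srcW - 1);
            dst[y * dstStride + x] = src[sy * srcStride + sx];
        }
    }
}
```

Other conventions exist, for example floor(x * srcW / dstW) without the half-pixel offset, and they can select different source pixels at non-integer scale factors.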

Bug fixing

Compiler error when macro SIMD_SSE2_DISABLE is defined.
Compiler error when macro SIMD_NEON_DISABLE is defined.

Infrastructure

New features

SIMD_ROOT CMake parameter.

October 1, 2021 (version 4.9.106)

Algorithms

New features

Base implementation, SSE2, AVX, AVX-512F, NEON optimizations of function SynetHardSigmoid32f.
SimdConvolutionActivationHardSigmoid item of SimdConvolutionActivationType enumeration.
HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetConvolution32f framework.
HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetDeconvolution32f framework.
HardSigmoid activation function to Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetMergedConvolution32f framework.
NEON optimizations of SynetMergedConvolution32fDc class.
NEON optimizations of SynetMergedConvolution32fCd class.
NEON optimizations of SynetInnerProduct32fGemm class.
NEON optimizations of SynetInnerProduct32fProd class.
HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, NEON optimizations of SynetConvolution8i framework.
HardSigmoid activation function to Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
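
For reference, HardSigmoid is usually a clamped linear function: max(0, min(x * scale + shift, 1)). The sketch below is a scalar illustration under that assumption. HardSigmoidReference is a hypothetical name, and the parameter names scale and shift are assumed for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference, not the Simd Library API.
// Computes dst[i] = max(0, min(src[i] * scale + shift, 1)).
inline void HardSigmoidReference(const float* src, std::size_t size, float scale, float shift, float* dst)
{
    assert(size == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < size; ++i)
        dst[i] = std::max(0.0f, std::min(src[i] * scale + shift, 1.0f));
}
```

With scale = 1/6 and shift = 0.5, a common choice, inputs at or below -3 map to 0 and inputs at or above 3 map to 1.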

Bug fixing

Compiler error in file SimdInit.h (Clang, Windows).

Removing

Remove inclusion of SimdConfig.h from SimdLib.h.

Test framework

New features

Tests for verifying functionality of function SynetHardSigmoid32f.
'-pi' test parameter (to print internal performance statistics of Simd Library to console).

September 13, 2021 (version 4.9.105)

Algorithms

New features

AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24 for Rotate180, TransposeRotate90).
Method Frame::Clone with region parameter.
Method View::Clone with region parameter.
AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
AVX-512BW optimizations of function TransformImage (case of Gray8, Uv16, Bgra32 for Rotate180, TransposeRotate90).
AVX-512BW optimizations of function TransformImage (case of Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
AVX-512BW optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
AVX-512BW optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function AlphaBlendingUniform.
AVX-512BW optimizations of function TransformImage (case of Bgr24 for Rotate180, TransposeRotate90, Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
Resize function (with size parameter).
Move constructor of View structure.
Move operator of View structure.
Clear method of Frame structure.
Swap method of Frame structure.
Move constructor of Frame structure.
Move operator of Frame structure.
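
To make the AlphaBlendingUniform entry above concrete, here is a scalar sketch of blending with a single alpha value applied to every pixel. It assumes the usual formula dst = (src * alpha + dst * (255 - alpha)) / 255 with rounding to nearest. The exact rounding used by the library is not stated in these notes, and AlphaBlendUniformReference is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference, not the Simd Library API.
// Blends src over dst in place using one alpha value for the whole image.
inline void AlphaBlendUniformReference(const std::uint8_t* src, std::size_t srcStride,
    std::size_t width, std::size_t height, std::size_t channels,
    std::uint8_t alpha, std::uint8_t* dst, std::size_t dstStride)
{
    assert(channels >= 1 && channels <= 4);
    const std::size_t rowSize = width * channels;
    for (std::size_t y = 0; y < height; ++y)
    {
        const std::uint8_t* s = src + y * srcStride;
        std::uint8_t* d = dst + y * dstStride;
        for (std::size_t i = 0; i < rowSize; ++i)
        {
            // Weighted sum fits in int: at most 255 * 255.
            const int v = int(s[i]) * alpha + int(d[i]) * (255 - alpha);
            d[i] = static_cast<std::uint8_t>((v + 127) / 255); // assumed rounding
        }
    }
}
```

With alpha = 255 the destination becomes a copy of the source, and with alpha = 0 it is left unchanged.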

Tests

New features

Tests for verifying functionality of function AlphaBlendingUniform.

August 3, 2021 (version 4.9.104)

Algorithms

New features

Rgba32 format in Frame structure.
Rgba32 format in Convert function (for frames).
SSE4.1 optimizations of function Float32ToFloat16.
SSE4.1 optimizations of function Float16ToFloat32.
AVX2 optimizations of function TransformImage (case of Bgra32 for Rotate180, TransposeRotate90).

Improving

SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetConvolution32fNhwcDirect (case of fixed kernels).
Reduction of compilation time and binary size of class SynetConvolution32f.
Reduction of compilation time and binary size of class SynetDeconvolution32f.
Reduction of compilation time and binary size of class SynetMergedConvolution32f.
Reduction of compilation time and binary size of class SynetConvolution8i.
Reduction of compilation time and binary size of class SynetMergedConvolution8i.
SSE4.1 optimizations of function TransformImage (case of Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate180).
SSE4.1 optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
SSE4.1 optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).

Bug fixing

Compiler error in file SimdAvx512bwResizer.cpp (GCC 5.4.0).
Compiler error in file SimdAvx512bwBgraToBgr.cpp (MSVS-2017).
Compiler error in file SimdInit.h (Clang, Windows).
Error in AVX2 and AVX-512BW optimizations of functions CosineDistancesMxNa16f and CosineDistancesMxNp16f (functions may return small negative values).
Error in function Base::DetectionLoadA (it throws an exception instead of returning NULL).
Error in SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetDeconvolution32fNhwcDirect2x2.
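
The cosine-distance fix in the list above concerns results that fall slightly below zero because of floating-point rounding. The sketch below shows one way to guard against that in scalar code, assuming the distance is defined as 1 - ab / sqrt(aa * bb). Whether the library fixes it by clamping in this way is an assumption, and CosineDistanceClamped is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar helper, not the Simd Library API.
// Returns 1 - cos(a, b), clamped so that rounding cannot produce a negative value.
inline float CosineDistanceClamped(const float* a, const float* b, std::size_t size)
{
    float aa = 0.0f, ab = 0.0f, bb = 0.0f;
    for (std::size_t i = 0; i < size; ++i)
    {
        aa += a[i] * a[i];
        ab += a[i] * b[i];
        bb += b[i] * b[i];
    }
    assert(aa > 0.0f && bb > 0.0f); // both vectors must be non-zero
    const float distance = 1.0f - ab / std::sqrt(aa * bb);
    return std::max(0.0f, distance);
}
```

For two identical vectors the unclamped expression can evaluate to a tiny negative number, which matters to callers that assume a non-negative distance, for example before taking a square root.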

Replacing

Replace SSE3 optimizations to SSE4.1 for function Gemm32fNT.
Replace SSE3 optimizations to SSE4.1 for function SynetConvolution32fInit.
Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution2x2Sum.
Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution3x3Sum.
Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution4x4Sum.
Replace SSE3 optimizations to SSE4.1 for function NeuralAddConvolution5x5Sum.
Replace SSE3 optimizations to SSE4.1 for function NeuralConvolutionForward.
Replace SSE4.2 optimizations to SSE4.1 for function Crc32c.
Replace SSSE3 optimizations to SSE4.1 for function AlphaBlending.
Replace SSSE3 optimizations to SSE4.1 for function AlphaFilling.
Replace SSSE3 optimizations to SSE4.1 for function AlphaPremultiply.
Replace SSSE3 optimizations to SSE4.1 for function BayerToBgr.
Replace SSSE3 optimizations to SSE4.1 for function BgraToBayer.
Replace SSSE3 optimizations to SSE4.1 for function BgraToBgr.
Replace SSSE3 optimizations to SSE4.1 for function BgraToRgb.
Replace SSSE3 optimizations to SSE4.1 for function BgraToRgba.
Replace SSSE3 optimizations to SSE4.1 for function BgraToYuv420p.
Replace SSSE3 optimizations to SSE4.1 for function BgraToYuv422p.
Replace SSSE3 optimizations to SSE4.1 for function BgraToYuva420p.
Replace SSSE3 optimizations to SSE4.1 for function BgrToBayer.
Replace SSSE3 optimizations to SSE4.1 for function BgrToBgra.
Replace SSSE3 optimizations to SSE4.1 for function RgbToBgra.
Replace SSSE3 optimizations to SSE4.1 for function BgrToGray.
Replace SSSE3 optimizations to SSE4.1 for function RgbToGray.
Replace SSSE3 optimizations to SSE4.1 for function BgrToRgb.
Replace SSSE3 optimizations to SSE4.1 for function TransformImage.
Replace SSSE3 optimizations to SSE4.1 for function BgrToYuv420p.
Replace SSSE3 optimizations to SSE4.1 for function BgrToYuv422p.
Replace SSSE3 optimizations to SSE4.1 for function BgrToYuv444p.
Replace SSSE3 optimizations to SSE4.1 for function DeinterleaveBgr.
Replace SSSE3 optimizations to SSE4.1 for function DeinterleaveBgra.
Replace SSSE3 optimizations to SSE4.1 for function GaussianBlur3x3.
Replace SSSE3 optimizations to SSE4.1 for function GrayToBgr.
Replace SSSE3 optimizations to SSE4.1 for function InterleaveBgr.
Replace SSSE3 optimizations to SSE4.1 for function InterleaveBgra.
Replace SSSE3 optimizations to SSE4.1 for function Yuv420pToBgr.
Replace SSSE3 optimizations to SSE4.1 for function Yuv422pToBgr.
Replace SSSE3 optimizations to SSE4.1 for function Yuv444pToBgr.
Replace SSSE3 optimizations to SSE4.1 for function Yuv420pToRgb.
Replace SSSE3 optimizations to SSE4.1 for function Yuv422pToRgb.
Replace SSSE3 optimizations to SSE4.1 for function Yuv444pToRgb.
Replace SSSE3 optimizations to SSE4.1 for function Laplace.
Replace SSSE3 optimizations to SSE4.1 for function LaplaceAbs.
Replace SSSE3 optimizations to SSE4.1 for function LaplaceAbsSum.
Replace SSSE3 optimizations to SSE4.1 for function MeanFilter3x3.
Replace SSSE3 optimizations to SSE4.1 for function ReduceColor2x2.
Replace SSSE3 optimizations to SSE4.1 for function ReduceGray2x2.
Replace SSSE3 optimizations to SSE4.1 for function ReduceGray4x4.
Replace SSSE3 optimizations to SSE4.1 for function Reorder16bit.
Replace SSSE3 optimizations to SSE4.1 for function Reorder32bit.
Replace SSSE3 optimizations to SSE4.1 for function Reorder64bit.
Replace SSSE3 optimizations to SSE4.1 for function ResizeBilinear.
Replace SSSE3 optimizations to SSE4.1 for function SobelDx.
Replace SSSE3 optimizations to SSE4.1 for function SobelDxAbs.
Replace SSSE3 optimizations to SSE4.1 for function SobelDxAbsSum.
Replace SSSE3 optimizations to SSE4.1 for function SobelDy.
Replace SSSE3 optimizations to SSE4.1 for function SobelDyAbs.
Replace SSSE3 optimizations to SSE4.1 for function SobelDyAbsSum.
Replace SSSE3 optimizations to SSE4.1 for function ContourMetrics.
Replace SSSE3 optimizations to SSE4.1 for function ContourMetricsMasked.
Replace SSSE3 optimizations to SSE4.1 for function SquaredDifferenceSum.
Replace SSSE3 optimizations to SSE4.1 for function SquaredDifferenceSumMasked.
Replace SSSE3 optimizations to SSE4.1 for function TextureBoostedSaturatedGradient.
Replace SSSE3 optimizations to SSE4.1 for class ResizerByteBilinear.

Tests

New features

Colorized annotation in console logging.

Improving

Performance report generation to text file.
Thread ID annotation in console logging.

Infrastructure

New features

SIMD_INT8_DEBUG CMake option.

Removing

Separate support of SSE3 extension (it has been moved into SSE4.1).
Separate support of SSE4.2 extension (it has been moved into SSE4.1).
Separate support of SSSE3 extension (it has been moved into SSE4.1).

July 1, 2021 (version 4.8.103)

Algorithms

New features

Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of class ResizerShortBilinear.
Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNa16f.
Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNp16f.
Parameter of ROI mask in Motion::Model.
SSE2, AVX-512BW and NEON optimizations of function AbsDifference.
NEON optimizations of function AlphaUnpremultiply.
NEON optimizations of function AlphaPremultiply.
NEON optimizations of function ValueSquareSums.

Improving

Performance of SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.

Bug fixing

Linker warning in file SimdImageLoad.h (MSVS).

Replacing

Replace SSE optimizations to SSE2 for function SvmSumLinear.
Replace SSE optimizations to SSE2 for function Fill32f.
Replace SSE optimizations to SSE2 for function CosineDistance32f.
Replace SSE optimizations to SSE2 for function DifferenceSum32f.
Replace SSE optimizations to SSE2 for function SquaredDifferenceKahanSum32f.
Replace SSE optimizations to SSE2 for function HogDeinterleave.
Replace SSE optimizations to SSE2 for function HogFilterSeparable.
Replace SSE optimizations to SSE2 for class ResizerFloatBilinear.
Replace SSE optimizations to SSE2 for function NeuralAddVectorMultipliedByValue.
Replace SSE optimizations to SSE2 for function NeuralAddVector.
Replace SSE optimizations to SSE2 for function NeuralAdaptiveGradientUpdate.
Replace SSE optimizations to SSE2 for function NeuralDerivativeRelu.
Replace SSE optimizations to SSE2 for function NeuralDerivativeSigmoid.
Replace SSE optimizations to SSE2 for function NeuralDerivativeTanh.
Replace SSE optimizations to SSE2 for function NeuralRoughSigmoid.
Replace SSE optimizations to SSE2 for function NeuralRoughSigmoid2.
Replace SSE optimizations to SSE2 for function NeuralRoughTanh.
Replace SSE optimizations to SSE2 for function NeuralUpdateWeights.
Replace SSE optimizations to SSE2 for function NeuralPooling1x1Max3x3.
Replace SSE optimizations to SSE2 for function NeuralPooling2x2Max2x2.
Replace SSE optimizations to SSE2 for function NeuralPooling2x2Max3x3.
Replace SSE optimizations to SSE2 for function SynetPoolingForwardAverage.
Replace SSE optimizations to SSE2 for function SynetPoolingForwardMax32f.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution2x2Forward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution3x3Forward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution4x4Forward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution5x5Forward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution2x2Backward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution3x3Backward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution4x4Backward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution5x5Backward.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution2x2Sum.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution3x3Sum.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution4x4Sum.
Replace SSE optimizations to SSE2 for function NeuralAddConvolution5x5Sum.
Replace SSE optimizations to SSE2 for function Gemm32fNN.
Replace SSE optimizations to SSE2 for function SynetFusedLayerForward0.
Replace SSE optimizations to SSE2 for function SynetFusedLayerForward1.
Replace SSE optimizations to SSE2 for function SynetFusedLayerForward2.
Replace SSE optimizations to SSE2 for function SynetFusedLayerForward3.
Replace SSE optimizations to SSE2 for function SynetFusedLayerForward4.
Replace SSE optimizations to SSE2 for function SynetFusedLayerForward8.
Replace SSE optimizations to SSE2 for function SynetFusedLayerForward9.
Replace SSE optimizations to SSE2 for function SynetReorderImage.
Replace SSE optimizations to SSE2 for function SynetReorderFilter.
Replace SSE optimizations to SSE2 for function SynetAddBias.
Replace SSE optimizations to SSE2 for function SynetEltwiseLayerForward.
Replace SSE optimizations to SSE2 for function SynetInnerProductLayerForward.
Replace SSE optimizations to SSE2 for function SynetShuffleLayerForward.
Replace SSE optimizations to SSE2 for function SynetHswish32f.
Replace SSE optimizations to SSE2 for function SynetPreluLayerForward.
Replace SSE optimizations to SSE2 for function SynetRelu32f.
Replace SSE optimizations to SSE2 for function SynetRestrictRange32f.
Replace SSE optimizations to SSE2 for function SynetScaleLayerForward.
Replace SSE optimizations to SSE2 for function WinogradKernel1x3Block1x4SetFilter.
Replace SSE optimizations to SSE2 for function WinogradKernel1x3Block1x4SetInput.
Replace SSE optimizations to SSE2 for function WinogradKernel1x3Block1x4SetOutput.
Replace SSE optimizations to SSE2 for function WinogradKernel1x5Block1x4SetFilter.
Replace SSE optimizations to SSE2 for function WinogradKernel1x5Block1x4SetInput.
Replace SSE optimizations to SSE2 for function WinogradKernel1x5Block1x4SetOutput.
Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block2x2SetFilter.
Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block2x2SetInput.
Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block2x2SetOutput.
Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block4x4SetFilter.
Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block4x4SetInput.
Replace SSE optimizations to SSE2 for function WinogradKernel2x2Block4x4SetOutput.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block2x2SetFilter.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block2x2SetInput.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block2x2SetOutput.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block3x3SetFilter.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block3x3SetInput.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block3x3SetOutput.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block4x4SetFilter.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block4x4SetInput.
Replace SSE optimizations to SSE2 for function WinogradKernel3x3Block4x4SetOutput.

Tests

New features

Tests for verifying functionality of function VectorNormNa16f.
Tests for verifying functionality of function VectorNormNp16f.

Infrastructure

Removing

Separate support of SSE extension (it has been moved into SSE2).

June 2, 2021 (version 4.7.102)

Algorithms

New features

Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function ValueSquareSums.

Improving

Performance of AVX2, AVX-512F and NEON optimizations of SynetConvolution32fGemmNN class.
Performance of Neural::FullyConnectedLayer::Forward method.

Bug fixing

Error in class SynetMergedConvolution32fDc (large weights case).
Compiler error in file SimdAvx2SynetConversion.cpp (MSVS-2015, Win32).
Error in SSSE3 optimization of function TransformImage.
Compiler error in file SimdImageSaveJpeg.h (Clang, Mac mini).
Compiler warnings (Clang).
Error in function ImagePngLoader::ReadTransparency (test tbbn0g04.png).
Error in Base implementation, SSE4.1 optimization of class ImagePngLoader (test basn0g16.png).
Error in SSE4.1 optimization of class ImagePngLoader (test s02i3p01.png).

Tests

New features

Tests for verifying functionality of function ValueSquareSums.

Improving

Header of performance report table.

Bug fixing

Compiler error in file TestFile.h (Clang, Mac mini).

May 3, 2021 (version 4.7.101)

Algorithms

New features

Parameter a in function DeinterleaveBgra can be NULL.
Simd::DeinterleaveBgra C++ wrapper.
Simd::DeinterleaveRgb C++ wrapper.
Simd::DeinterleaveRgba C++ wrappers.
Method View::Load (from memory).
Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImageJpegSaver class.
Base implementation of ImageJpegLoader class.
Base implementation of ImagePngLoader class.
NEON optimizations of ImagePngSaver class.
SIMD_SYNET_DISABLE macro.
Base implementation, AVX2, AVX-512BW, NEON optimizations of function CosineDistancesMxNp16f.

Bug fixing

Error in NEON optimizations of function CosineDistancesMxNa16f.

Tests

New features

Parameter '-ri' to set real image name at runtime.
Tests for verifying functionality of function CosineDistancesMxNp16f.
Special tests for verifying functionality of function ImageLoadFromMemory.

Bug fixing

Error in saving of output log.

Infrastructure

New features

Real images to test encoding/decoding algorithms.
SIMD_SYNET CMake option.
SIMD_HIDE CMake option.

Removing

Project files of Microsoft Visual Studio 2017 (for Android).

Documentation

New features

Description of CMake parameters.

April 1, 2021 (version 4.6.100)

Algorithms

New features

Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImagePngSaver class.
SynetInnerProduct32f framework.
Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fProd class.
Rgba32 format in View structure.
Pixel::Rgba32 structure.
Simd::RgbToBgr C++ wrapper.
Simd::GrayToRgb C++ wrapper.
Simd::GrayToRgba C++ wrapper.
Simd::BgrToRgba C++ wrapper.
Simd::RgbaToRgb C++ wrapper.
Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function RgbaToGray.
Base implementation, SSSE3, AVX2, AVX-512BW, NEON optimizations of function BgraToRgba.
Simd::RgbToRgba C++ wrapper.
Simd::RgbaToBgra C++ wrapper.
Rgba32 format in Convert function.
Rgba32 format in function ImageSave.

Improving

Reduce memory allocations in Simd::ContourDetector.

Bug fixing

Assert in function Avx::SynetMergedConvolution32fCdc::SynetMergedConvolution32fCdc.
Assert in function Avx::SynetMergedConvolution32fCd::SynetMergedConvolution32fCd.
Assert in function Avx::SynetMergedConvolution32fDc::SynetMergedConvolution32fDc.
Freezes in function SynetConvolution32fNhwcDirect::OldReorderWeight (ARMv7 architecture).
Freezes in file SimdGemm.h (ARMv7 architecture).

Tests

New features

Tests for verifying functionality of SynetInnerProduct32f framework.
Performance report can use milliseconds or microseconds (chosen at runtime).
Special test for verifying functionality of function Simd::Convert.
Tests for verifying functionality of function RgbaToGray.
Tests for verifying functionality of function BgraToRgba.

Bug fixing

Crash in test BgrToRgbAutoTest.
Error in test of SynetMergedConvolution8i.

Infrastructure

Removing

Remove project files of Microsoft Visual Studio 2013.

March 1, 2021 (version 4.6.99)

Algorithms

New features

SimdImageFileType enumeration.
ImageSaveToFile function.
ImageSaveToMemory function.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtSaver class.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinSaver class.
Change order of parameters in function BgrToRgb.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinSaver class.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtSaver class.
Additional parameters in function View::Save.
Method View::Release.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtLoader class.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinLoader class.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtLoader class.
Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinLoader class.
Additional parameter in function View::Load.
Base implementation of Crc32 function.
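
As background for the Crc32 entry above, here is a minimal bitwise sketch of the widely used CRC-32 (reflected polynomial 0xEDB88320, as in PNG and zlib). That the library's Crc32 uses this exact variant is an assumption, and Crc32Reference is a hypothetical name. Note that this is a different checksum from CRC-32C (Castagnoli).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bitwise reference, not the Simd Library API.
// CRC-32 with reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF.
inline std::uint32_t Crc32Reference(const void* data, std::size_t size)
{
    assert(size == 0 || data != nullptr);
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i)
    {
        crc ^= bytes[i];
        for (int bit = 0; bit < 8; ++bit)
        {
            // If the lowest bit is set, shift and XOR with the polynomial; otherwise just shift.
            const std::uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

The standard check value for this variant is 0xCBF43926 for the nine ASCII bytes "123456789". Table-driven versions trade 1 KB of lookup table for roughly eight times fewer loop iterations.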

Bug fixing

Crash in Simd::Detection on Python (using of std::unique_ptr).

Tests

New features

Possibility to write output video in UseFaceDetection.cpp example.
Test parameter '-o=' to write annotated output video.
Tests for verifying functionality of function ImageSaveToFile.
Tests for verifying functionality of function ImageSaveToMemory.
Tests for verifying functionality of function ImageLoadFromMemory.
Tests for verifying functionality of function Crc32.

Documentation

New features

Usage example in the description of Font.

Bug fixing

Errors in Simd Library description.

February 1, 2021 (version 4.6.98)

Algorithms

New features

Add parameter epsilon to GaussianBlur engine.
Add function SynetConvolution32fInfo.
Add function SynetConvolution8iInfo.
Add function SynetDeconvolution32fInfo.
Add function SynetMergedConvolution32fInfo.
Add function SynetMergedConvolution8iInfo.

Improving

Performance of SynetConvolution8iNhwcDirect class (case of horizontal padding of small image).

Renaming

GaussianBlur engine parameter from radius to sigma.
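
To illustrate how sigma and epsilon can together determine a blur kernel, the sketch below builds a normalized 1-D Gaussian kernel and drops taps whose unnormalized weight falls below epsilon. This is one plausible reading of the two parameters, not a description of the engine's actual implementation. MakeGaussianKernel is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the Simd Library API: builds a normalized 1-D Gaussian
// kernel from sigma, keeping only taps with unnormalized weight >= epsilon.
inline std::vector<float> MakeGaussianKernel(float sigma, float epsilon)
{
    assert(sigma > 0.0f && epsilon > 0.0f && epsilon < 1.0f);
    // exp(-x^2 / (2 * sigma^2)) >= epsilon  <=>  |x| <= sigma * sqrt(-2 * ln(epsilon)).
    const int half = static_cast<int>(std::floor(sigma * std::sqrt(-2.0f * std::log(epsilon))));
    std::vector<float> kernel(static_cast<std::size_t>(2 * half + 1));
    float sum = 0.0f;
    for (int x = -half; x <= half; ++x)
    {
        const float weight = std::exp(-0.5f * float(x * x) / (sigma * sigma));
        kernel[static_cast<std::size_t>(x + half)] = weight;
        sum += weight;
    }
    for (float& weight : kernel)
        weight /= sum; // normalize so that the taps sum to 1
    return kernel;
}
```

Under these assumptions, a smaller epsilon yields a wider kernel for the same sigma, so epsilon trades accuracy against the number of taps.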

Bug fixing

Error in GaussianBlur engine (case of small images).
Performance degradation of AVX-512VNNI optimization of SynetConvolution8i framework.
Performance degradation of AVX-512VNNI optimization of SynetMergedConvolution8i framework.
Error in GaussianBlur engine (wrong processing of last rows).
Error in trajectory averaging algorithm in Motion::Detector.

Tests

New features

Possibility to write output video in UseMotionDetector.cpp example.

Bug fixing

Error in files: TestVideo.cpp, UseMotionDetector.cpp, UseFaceDetector.cpp (MSVS-2019, OpenCV enabled).

Documentation

Improving

Description of GaussianBlur engine.
Description of Motion::Detector.

Infrastructure

New features

Ocv.prop.default for Visual Studio 2019.

Renaming

CMake parameter from LIBRARY to SIMD_SHARED.
CMake parameter from CHECK_VERSION to SIMD_GET_VERSION.
CMake parameter from TOOLCHAIN to SIMD_TOOLCHAIN.
CMake parameter from TARGET to SIMD_TARGET.

January 4, 2021 (version 4.6.97)

Algorithms

New features

Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetMish32f.
Support of Mish activation function in SynetConvolution32f framework.
Support of Mish activation function in SynetMergedConvolution32f framework.
Support of Mish activation function in SynetConvolution8i framework.
Support of Mish activation function in SynetMergedConvolution8i framework.
Support of Mish activation function in SynetDeconvolution32f framework.
Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of GaussianBlur engine.
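
For reference, Mish is defined as x * tanh(softplus(x)), where softplus(x) = ln(1 + exp(x)). The scalar sketch below adds a threshold above which the input is passed through unchanged, because exp(x) overflows for large x and Mish(x) is then practically equal to x. MishReference and the threshold parameter are assumptions made for illustration, not the library's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference, not the Simd Library API.
// Computes dst[i] = x * tanh(ln(1 + exp(x))), or x itself when x > threshold.
inline void MishReference(const float* src, std::size_t size, float threshold, float* dst)
{
    assert(size == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < size; ++i)
    {
        const float x = src[i];
        dst[i] = x > threshold ? x : x * std::tanh(std::log1p(std::exp(x)));
    }
}
```

A threshold around 20 is a typical choice for 32-bit floats, since tanh(softplus(x)) is already indistinguishable from 1 at that point.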

Improving

AVX-512F optimization of SynetConvolution32fNhwcDirect class.
AVX-512F optimization of SynetConvolution32fGemmNN class.
AVX-512F optimization of SynetConvolution32fWinograd class.
AVX-512F optimization of function Gemm32fNN.

Bug fixing

Error in Base implementation of SynetMergedConvolution32f (type=CDC, add=1).
Error in function SimdAlignment.
Visual Studio 2017 compiler error in files SimdAvx512bwSynet.cpp, SimdAvx512bwSynetScale.cpp, SimdAvx512bwAlphaBlending.cpp.

Test framework

New features

Tests for verifying functionality of function SynetMish32f.
Tests for verifying functionality of GaussianBlur engine.
