Simd Library Release Notes (2021).

December 1, 2021 (version 4.9.108)

Algorithms

New features
  • SSE4.1, AVX2, AVX-512F, AVX-512BW optimizations of class ResizerNearest.
  • Add SimdResizeMethodNearestPytorch to SimdResizeMethodType enumeration.
  • Add parameter BackgroundStatUpdateTime to Motion Detector.
  • MotionDetector performance optimization (case of falling star).
  • 16-bit UYVY image format in View.
  • Base implementation of function UyvyToBgr.
  • Base implementation, SSE2, AVX2, AVX-512F optimizations of function SynetSwish32f.
  • SimdConvolutionActivationSwish item of SimdConvolutionActivationType enumeration.
  • Swish activation function in Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetConvolution32f framework.
  • Swish activation function in Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetDeconvolution32f framework.
  • Swish activation function in Base implementation, SSE2, AVX2, AVX-512F optimizations of SynetMergedConvolution32f framework.
  • Swish activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetConvolution8i framework.
  • Swish activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
  • SimdYuvType enumeration.
  • Base implementation, SSE2, AVX2, AVX-512BW optimizations of function Yuv444pToBgraV2.
  • Function Simd::Resize supports images with 16-bit channel size.
  • Base implementation of function Yuv420pToBgraV2.
Improving
  • Refactoring of SimdResizeMethodType enumeration.
Bug fixing
  • Stack corruption in function Simd::Avx2::JpegWriteBlockSubs.
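
For readers unfamiliar with the Swish activation added in this release, the following is a minimal scalar sketch of the commonly used definition, x * sigmoid(slope * x). The function name SwishRef and its signature are hypothetical and are not taken from the library; the exact parameterization of SynetSwish32f should be checked in the Simd documentation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference (not the library's code).
// Computes dst[i] = src[i] / (1 + exp(-slope * src[i])),
// which equals src[i] * sigmoid(slope * src[i]).
inline void SwishRef(const float* src, std::size_t size, float slope, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < size; ++i)
        dst[i] = src[i] / (1.0f + std::exp(-slope * src[i]));
}
```

A scalar version like this is mainly useful as a baseline when checking the output of vectorized code paths within a small tolerance.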

Test framework

New features
  • Tests for verifying functionality of function UyvyToBgr.
  • Tests for verifying functionality of function SynetSwish32f.
  • Tests for verifying functionality of function Yuv444pToBgraV2.
  • Tests for verifying functionality of function Yuv420pToBgraV2.

Infrastructure

Bug fixing
  • Wrong compiler options in CMake.

November 1, 2021 (version 4.9.107)

Algorithms

New features
  • Internal class Holder to replace std::unique_ptr for old compilers without support of C++11 standard.
  • SimdBayerLayoutType enumeration.
  • Base implementation of class ResizerNearest.
Bug fixing
  • Compiler error when macro SIMD_SSE2_DISABLE is defined.
  • Compiler error when macro SIMD_NEON_DISABLE is defined.

Infrastructure

New features
  • SIMD_ROOT CMake parameter.

October 1, 2021 (version 4.9.106)

Algorithms

New features
  • Base implementation, SSE2, AVX, AVX-512F, NEON optimizations of function SynetHardSigmoid32f.
  • SimdConvolutionActivationHardSigmoid item of SimdConvolutionActivationType enumeration.
  • HardSigmoid activation function in Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetConvolution32f framework.
  • HardSigmoid activation function in Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetDeconvolution32f framework.
  • HardSigmoid activation function in Base implementation, SSE2, AVX, AVX2, AVX-512F, NEON optimizations of SynetMergedConvolution32f framework.
  • NEON optimizations of SynetMergedConvolution32fDc class.
  • NEON optimizations of SynetMergedConvolution32fCd class.
  • NEON optimizations of SynetInnerProduct32fGemm class.
  • NEON optimizations of SynetInnerProduct32fProd class.
  • HardSigmoid activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI, NEON optimizations of SynetConvolution8i framework.
  • HardSigmoid activation function in Base implementation, SSE4.1, AVX2, AVX-512BW, AVX-512VNNI optimizations of SynetMergedConvolution8i framework.
Bug fixing
  • Compiler error in file SimdInit.h (Clang, Windows).
Removing
  • Remove inclusion of SimdConfig.h from SimdLib.h.
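
As an illustration of the HardSigmoid activation introduced above, here is a minimal scalar sketch of the usual piecewise-linear form, clamp(scale * x + shift, 0, 1). The name HardSigmoidRef and the explicit scale and shift parameters are assumptions made for this example; the source does not state the signature of SynetHardSigmoid32f.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference (not the library's code).
// Computes dst[i] = max(0, min(src[i] * scale + shift, 1)).
// scale = 1/6 and shift = 0.5 are a commonly used choice, assumed here.
inline void HardSigmoidRef(const float* src, std::size_t size,
                           float scale, float shift, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < size; ++i)
        dst[i] = std::max(0.0f, std::min(src[i] * scale + shift, 1.0f));
}
```

Because the function is only a multiply, an add and two clamps, it maps directly onto SIMD min/max instructions, which is what makes it cheaper than an exact sigmoid.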

Test framework

New features
  • Tests for verifying functionality of function SynetHardSigmoid32f.
  • '-pi' test parameter (to print internal performance statistics of Simd Library to console).

September 13, 2021 (version 4.9.105)

Algorithms

New features
  • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24 for Rotate180, TransposeRotate90).
  • Method Frame::Clone with region parameter.
  • Method View::Clone with region parameter.
  • AVX2 optimizations of function TransformImage (case of Gray8, Uv16, Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • AVX-512BW optimizations of function TransformImage (case of Gray8, Uv16, Bgra32 for Rotate180, TransposeRotate90).
  • AVX-512BW optimizations of function TransformImage (case of Bgra32 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • AVX-512BW optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • AVX-512BW optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function AlphaBlendingUniform.
  • AVX-512BW optimizations of function TransformImage (case of Bgr24 for Rotate180, TransposeRotate90, Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • Resize function (with size parameter).
  • Move constructor of View structure.
  • Move assignment operator of View structure.
  • Clear method of Frame structure.
  • Swap method of Frame structure.
  • Move constructor of Frame structure.
  • Move assignment operator of Frame structure.
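
To make the idea behind AlphaBlendingUniform concrete, the sketch below blends a source buffer over a destination buffer using one alpha value for every pixel. The function names, the signature and the rounding rule are assumptions chosen for this example; the library's actual rounding and argument layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (not the library's code).
// Blends one 8-bit channel: (src * alpha + dst * (255 - alpha)) / 255,
// rounded to nearest. The rounding rule is an assumption of this sketch.
inline std::uint8_t BlendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha)
{
    const unsigned a = alpha;
    const unsigned v = static_cast<unsigned>(src) * a
                     + static_cast<unsigned>(dst) * (255u - a);
    return static_cast<std::uint8_t>((v + 127u) / 255u);
}

// Applies the same alpha to every byte of the buffer (all channels alike).
inline void AlphaBlendUniformRef(const std::uint8_t* src, std::size_t size,
                                 std::uint8_t alpha, std::uint8_t* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < size; ++i)
        dst[i] = BlendChannel(src[i], dst[i], alpha);
}
```

With a uniform alpha there is no per-pixel alpha plane to load, so the inner loop reads only the two image buffers.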

Tests

New features
  • Tests for verifying functionality of function AlphaBlendingUniform.

August 3, 2021 (version 4.9.104)

Algorithms

New features
  • Rgba32 format in Frame structure.
  • Rgba32 format in Convert function (for frames).
  • SSE4.1 optimizations of function Float32ToFloat16.
  • SSE4.1 optimizations of function Float16ToFloat32.
  • AVX2 optimizations of function TransformImage (case of Bgra32 for Rotate180, TransposeRotate90).
Improving
  • SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetConvolution32fNhwcDirect (case of fixed kernels).
  • Reduced compilation time and binary size of class SynetConvolution32f.
  • Reduced compilation time and binary size of class SynetDeconvolution32f.
  • Reduced compilation time and binary size of class SynetMergedConvolution32f.
  • Reduced compilation time and binary size of class SynetConvolution8i.
  • Reduced compilation time and binary size of class SynetMergedConvolution8i.
  • SSE4.1 optimizations of function TransformImage (case of Bgr24, Bgra32 for Rotate90, Rotate270, TransposeRotate180).
  • SSE4.1 optimizations of function TransformImage (case of Uv16 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
  • SSE4.1 optimizations of function TransformImage (case of Gray8 for Rotate90, Rotate270, TransposeRotate0, TransposeRotate180).
Bug fixing
  • Compiler error in file SimdAvx512bwResizer.cpp (GCC 5.4.0).
  • Compiler error in file SimdAvx512bwBgraToBgr.cpp (MSVS-2017).
  • Compiler error in file SimdInit.h (Clang, Windows).
  • Error in AVX2 and AVX-512BW optimizations of functions CosineDistancesMxNa16f and CosineDistancesMxNp16f (functions may return small negative values).
  • Error in function Base::DetectionLoadA (it throws an exception instead of returning NULL).
  • Error in SSE2, AVX, AVX2, AVX-512F and NEON optimizations of class SynetDeconvolution32fNhwcDirect2x2.
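
The small negative values mentioned for the cosine distance functions are a typical floating-point rounding effect: for nearly identical vectors, 1 - cos can round to slightly below zero. One common guard is to clamp the result at zero, as in the sketch below. It uses 32-bit floats and a hypothetical function name for simplicity; the source does not say how the library fixed the issue, and the affected functions operate on 16-bit floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference (not the library's code).
// Cosine distance d = 1 - (a.b) / sqrt((a.a) * (b.b)), clamped at 0 so that
// rounding errors cannot produce a small negative result.
// Assumes both vectors are non-zero.
inline float CosineDistanceRef(const float* a, const float* b, std::size_t size)
{
    assert(a != nullptr && b != nullptr && size > 0);
    float aa = 0.0f, ab = 0.0f, bb = 0.0f;
    for (std::size_t i = 0; i < size; ++i)
    {
        aa += a[i] * a[i];
        ab += a[i] * b[i];
        bb += b[i] * b[i];
    }
    const float distance = 1.0f - ab / std::sqrt(aa * bb);
    return std::max(0.0f, distance);
}
```

Clamping matters for callers that take a square root of the distance or compare it against zero.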
Replacing
  • Replace SSE3 optimizations with SSE4.1 for function Gemm32fNT.
  • Replace SSE3 optimizations with SSE4.1 for function SynetConvolution32fInit.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution2x2Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution3x3Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution4x4Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralAddConvolution5x5Sum.
  • Replace SSE3 optimizations with SSE4.1 for function NeuralConvolutionForward.
  • Replace SSE4.2 optimizations with SSE4.1 for function Crc32c.
  • Replace SSSE3 optimizations with SSE4.1 for function AlphaBlending.
  • Replace SSSE3 optimizations with SSE4.1 for function AlphaFilling.
  • Replace SSSE3 optimizations with SSE4.1 for function AlphaPremultiply.
  • Replace SSSE3 optimizations with SSE4.1 for function BayerToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToBayer.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToRgba.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuv420p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuv422p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgraToYuva420p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToBayer.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function RgbToBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToGray.
  • Replace SSSE3 optimizations with SSE4.1 for function RgbToGray.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function TransformImage.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv420p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv422p.
  • Replace SSSE3 optimizations with SSE4.1 for function BgrToYuv444p.
  • Replace SSSE3 optimizations with SSE4.1 for function DeinterleaveBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function DeinterleaveBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function GaussianBlur3x3.
  • Replace SSSE3 optimizations with SSE4.1 for function GrayToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function InterleaveBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function InterleaveBgra.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv420pToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv422pToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv444pToBgr.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv420pToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv422pToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function Yuv444pToRgb.
  • Replace SSSE3 optimizations with SSE4.1 for function Laplace.
  • Replace SSSE3 optimizations with SSE4.1 for function LaplaceAbs.
  • Replace SSSE3 optimizations with SSE4.1 for function LaplaceAbsSum.
  • Replace SSSE3 optimizations with SSE4.1 for function MeanFilter3x3.
  • Replace SSSE3 optimizations with SSE4.1 for function ReduceColor2x2.
  • Replace SSSE3 optimizations with SSE4.1 for function ReduceGray2x2.
  • Replace SSSE3 optimizations with SSE4.1 for function ReduceGray4x4.
  • Replace SSSE3 optimizations with SSE4.1 for function Reorder16bit.
  • Replace SSSE3 optimizations with SSE4.1 for function Reorder32bit.
  • Replace SSSE3 optimizations with SSE4.1 for function Reorder64bit.
  • Replace SSSE3 optimizations with SSE4.1 for function ResizeBilinear.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDx.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDxAbs.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDxAbsSum.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDy.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDyAbs.
  • Replace SSSE3 optimizations with SSE4.1 for function SobelDyAbsSum.
  • Replace SSSE3 optimizations with SSE4.1 for function ContourMetrics.
  • Replace SSSE3 optimizations with SSE4.1 for function ContourMetricsMasked.
  • Replace SSSE3 optimizations with SSE4.1 for function SquaredDifferenceSum.
  • Replace SSSE3 optimizations with SSE4.1 for function SquaredDifferenceSumMasked.
  • Replace SSSE3 optimizations with SSE4.1 for function TextureBoostedSaturatedGradient.
  • Replace SSSE3 optimizations with SSE4.1 for class ResizerByteBilinear.

Tests

New features
  • Colorized annotation in console logging.
Improving
  • Performance report generation to text file.
  • Thread ID annotation in console logging.

Infrastructure

New features
  • SIMD_INT8_DEBUG CMake option.
Removing
  • Separate support of SSE3 extension (it has been moved into SSE4.1).
  • Separate support of SSE4.2 extension (it has been moved into SSE4.1).
  • Separate support of SSSE3 extension (it has been moved into SSE4.1).

July 1, 2021 (version 4.8.103)

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of class ResizerShortBilinear.
  • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNa16f.
  • Base implementation, AVX2, AVX-512BW and NEON optimizations of function VectorNormNp16f.
  • Parameter of ROI mask in Motion::Model.
  • SSE2, AVX-512BW and NEON optimizations of function AbsDifference.
  • NEON optimizations of function AlphaUnpremultiply.
  • NEON optimizations of function AlphaPremultiply.
  • NEON optimizations of function ValueSquareSums.
Improving
  • Performance of SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
Bug fixing
  • Linker warning in file SimdImageLoad.h (MSVS).
Replacing
  • Replace SSE optimizations with SSE2 for function SvmSumLinear.
  • Replace SSE optimizations with SSE2 for function Fill32f.
  • Replace SSE optimizations with SSE2 for function CosineDistance32f.
  • Replace SSE optimizations with SSE2 for function DifferenceSum32f.
  • Replace SSE optimizations with SSE2 for function SquaredDifferenceKahanSum32f.
  • Replace SSE optimizations with SSE2 for function HogDeinterleave.
  • Replace SSE optimizations with SSE2 for function HogFilterSeparable.
  • Replace SSE optimizations with SSE2 for class ResizerFloatBilinear.
  • Replace SSE optimizations with SSE2 for function NeuralAddVectorMultipliedByValue.
  • Replace SSE optimizations with SSE2 for function NeuralAddVector.
  • Replace SSE optimizations with SSE2 for function NeuralAdaptiveGradientUpdate.
  • Replace SSE optimizations with SSE2 for function NeuralDerivativeRelu.
  • Replace SSE optimizations with SSE2 for function NeuralDerivativeSigmoid.
  • Replace SSE optimizations with SSE2 for function NeuralDerivativeTanh.
  • Replace SSE optimizations with SSE2 for function NeuralRoughSigmoid.
  • Replace SSE optimizations with SSE2 for function NeuralRoughSigmoid2.
  • Replace SSE optimizations with SSE2 for function NeuralRoughTanh.
  • Replace SSE optimizations with SSE2 for function NeuralUpdateWeights.
  • Replace SSE optimizations with SSE2 for function NeuralPooling1x1Max3x3.
  • Replace SSE optimizations with SSE2 for function NeuralPooling2x2Max2x2.
  • Replace SSE optimizations with SSE2 for function NeuralPooling2x2Max3x3.
  • Replace SSE optimizations with SSE2 for function SynetPoolingForwardAverage.
  • Replace SSE optimizations with SSE2 for function SynetPoolingForwardMax32f.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Forward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Backward.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution2x2Sum.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution3x3Sum.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution4x4Sum.
  • Replace SSE optimizations with SSE2 for function NeuralAddConvolution5x5Sum.
  • Replace SSE optimizations with SSE2 for function Gemm32fNN.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward0.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward1.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward2.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward3.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward4.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward8.
  • Replace SSE optimizations with SSE2 for function SynetFusedLayerForward9.
  • Replace SSE optimizations with SSE2 for function SynetReorderImage.
  • Replace SSE optimizations with SSE2 for function SynetReorderFilter.
  • Replace SSE optimizations with SSE2 for function SynetAddBias.
  • Replace SSE optimizations with SSE2 for function SynetEltwiseLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetInnerProductLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetShuffleLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetHswish32f.
  • Replace SSE optimizations with SSE2 for function SynetPreluLayerForward.
  • Replace SSE optimizations with SSE2 for function SynetRelu32f.
  • Replace SSE optimizations with SSE2 for function SynetRestrictRange32f.
  • Replace SSE optimizations with SSE2 for function SynetScaleLayerForward.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x3Block1x4SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel1x5Block1x4SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block2x2SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel2x2Block4x4SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block2x2SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block3x3SetOutput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetFilter.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetInput.
  • Replace SSE optimizations with SSE2 for function WinogradKernel3x3Block4x4SetOutput.

Tests

New features
  • Tests for verifying functionality of function VectorNormNa16f.
  • Tests for verifying functionality of function VectorNormNp16f.

Infrastructure

Removing
  • Separate support of SSE extension (it has been moved into SSE2).

June 2, 2021 (version 4.7.102)

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of function ValueSquareSums.
Improving
  • Performance of AVX2, AVX-512F and NEON optimizations of SynetConvolution32fGemmNN class.
  • Performance of Neural::FullyConnectedLayer::Forward method.
Bug fixing
  • Error in class SynetMergedConvolution32fDc (large weights case).
  • Compiler error in file SimdAvx2SynetConversion.cpp (MSVS-2015, Win32).
  • Error in SSSE3 optimization of function TransformImage.
  • Compiler error in file SimdImageSaveJpeg.h (Clang, Mac mini).
  • Compiler warnings (Clang).
  • Error in function ImagePngLoader::ReadTransparency (test tbbn0g04.png).
  • Error in Base implementation, SSE4.1 optimization of class ImagePngLoader (test basn0g16.png).
  • Error in SSE4.1 optimization of class ImagePngLoader (test s02i3p01.png).

Tests

New features
  • Tests for verifying functionality of function ValueSquareSums.
Improving
  • Header of performance report table.
Bug fixing
  • Compiler error in file TestFile.h (Clang, Mac mini).

May 3, 2021 (version 4.7.101)

Algorithms

New features
  • Parameter a in function DeinterleaveBgra can be NULL.
  • Simd::DeinterleaveBgra C++ wrapper.
  • Simd::DeinterleaveRgb C++ wrapper.
  • Simd::DeinterleaveRgba C++ wrappers.
  • Method View::Load (from memory).
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImageJpegSaver class.
  • Base implementation of ImageJpegLoader class.
  • Base implementation of ImagePngLoader class.
  • NEON optimizations of ImagePngSaver class.
  • SIMD_SYNET_DISABLE macro.
  • Base implementation, AVX2, AVX-512BW, NEON optimizations of function CosineDistancesMxNp16f.
Bug fixing
  • Error in NEON optimizations of function CosineDistancesMxNa16f.

Tests

New features
  • Parameter '-ri' to set real image name at runtime.
  • Tests for verifying functionality of function CosineDistancesMxNp16f.
  • Special tests for verifying functionality of function ImageLoadFromMemory.
Bug fixing
  • Error in saving of output log.

Infrastructure

New features
  • Real images to test encoding/decoding algorithms.
  • SIMD_SYNET CMake option.
  • SIMD_HIDE CMake option.
Removing
  • Project files of Microsoft Visual Studio 2017 (for Android).

Documentation

New features
  • Description of CMake parameters.

April 1, 2021 (version 4.6.100)

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW optimizations of ImagePngSaver class.
  • SynetInnerProduct32f framework.
  • Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fGemm class.
  • Base implementation, SSE4.1, AVX, AVX2, AVX-512F optimizations of SynetInnerProduct32fProd class.
  • Rgba32 format in View structure.
  • Pixel::Rgba32 structure.
  • Simd::RgbToBgr C++ wrapper.
  • Simd::GrayToRgb C++ wrapper.
  • Simd::GrayToRgba C++ wrapper.
  • Simd::BgrToRgba C++ wrapper.
  • Simd::RgbaToRgb C++ wrapper.
  • Base implementation, SSE2, AVX2, AVX-512BW, NEON optimizations of function RgbaToGray.
  • Base implementation, SSSE3, AVX2, AVX-512BW, NEON optimizations of function BgraToRgba.
  • Simd::RgbToRgba C++ wrapper.
  • Simd::RgbaToBgra C++ wrapper.
  • Rgba32 format in Convert function.
  • Rgba32 format in function ImageSave.
Improving
  • Reduce memory allocations in Simd::ContourDetector.
Bug fixing
  • Assert in function Avx::SynetMergedConvolution32fCdc::SynetMergedConvolution32fCdc.
  • Assert in function Avx::SynetMergedConvolution32fCd::SynetMergedConvolution32fCd.
  • Assert in function Avx::SynetMergedConvolution32fDc::SynetMergedConvolution32fDc.
  • Freezes in function SynetConvolution32fNhwcDirect::OldReorderWeight (ARMv7 architecture).
  • Freezes in file SimdGemm.h (ARMv7 architecture).

Tests

New features
  • Tests for verifying functionality of SynetInnerProduct32f framework.
  • Performance report can use milliseconds or microseconds (chosen at runtime).
  • Special test for verifying functionality of function Simd::Convert.
  • Tests for verifying functionality of function RgbaToGray.
  • Tests for verifying functionality of function BgraToRgba.
Bug fixing
  • Crash in test BgrToRgbAutoTest.
  • Error in test of SynetMergedConvolution8i.

Infrastructure

Removing
  • Remove project files of Microsoft Visual Studio 2013.

March 1, 2021 (version 4.6.99)

Algorithms

New features
  • SimdImageFileType enumeration.
  • ImageSaveToFile function.
  • ImageSaveToMemory function.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtSaver class.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinSaver class.
  • Change order of parameters in function BgrToRgb.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinSaver class.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtSaver class.
  • Additional parameters in function View::Save.
  • Method View::Release.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmTxtLoader class.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePgmBinLoader class.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmTxtLoader class.
  • Base implementation, SSE4.1, AVX2, AVX-512BW, NEON optimizations of ImagePpmBinLoader class.
  • Additional parameter in function View::Load.
  • Base implementation of Crc32 function.
Bug fixing
  • Crash in Simd::Detection on Python (use of std::unique_ptr).
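
For context on the Crc32 function added in this release, the following is a minimal bitwise sketch of the widely used CRC-32 (reflected polynomial 0xEDB88320, as in zlib and PNG). Whether the library's Crc32 uses exactly this variant is an assumption, and the name Crc32Ref is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bitwise reference (not the library's code).
// Standard CRC-32: initial value 0xFFFFFFFF, reflected polynomial 0xEDB88320,
// final bitwise inversion.
inline std::uint32_t Crc32Ref(const void* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < size; ++i)
    {
        crc ^= bytes[i];
        for (int bit = 0; bit < 8; ++bit)
        {
            // If the lowest bit is set, shift and XOR with the polynomial;
            // otherwise just shift. The mask is all ones or all zeros.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

The standard check value for this variant is 0xCBF43926 for the ASCII string "123456789". Production code normally replaces the inner bit loop with a 256-entry lookup table.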

Tests

New features
  • Possibility to write output video in UseFaceDetection.cpp example.
  • Test parameter '-o=' to write annotated output video.
  • Tests for verifying functionality of function ImageSaveToFile.
  • Tests for verifying functionality of function ImageSaveToMemory.
  • Tests for verifying functionality of function ImageLoadFromMemory.
  • Tests for verifying functionality of function Crc32.

Documentation

New features
  • Usage example in the description of Font.
Bug fixing
  • Errors in Simd Library description.

February 1, 2021 (version 4.6.98)

Algorithms

New features
  • Add parameter epsilon to GaussianBlur engine.
  • Add function SynetConvolution32fInfo.
  • Add function SynetConvolution8iInfo.
  • Add function SynetDeconvolution32fInfo.
  • Add function SynetMergedConvolution32fInfo.
  • Add function SynetMergedConvolution8iInfo.
Improving
  • Performance of SynetConvolution8iNhwcDirect class (case of horizontal padding of small image).
Renaming
  • GaussianBlur engine parameter from radius to sigma.
Bug fixing
  • Error in GaussianBlur engine (case of small images).
  • Performance degradation of AVX-512VNNI optimization of SynetConvolution8i framework.
  • Performance degradation of AVX-512VNNI optimization of SynetMergedConvolution8i framework.
  • Error in GaussianBlur engine (wrong processing of last rows).
  • Error in trajectory averaging algorithm in Motion::Detector.

Tests

New features
  • Possibility to write output video in UseMotionDetector.cpp example.
Bug fixing
  • Error in files: TestVideo.cpp, UseMotionDetector.cpp, UseFaceDetector.cpp (MSVS-2019, OpenCV enabled).

Documentation

Improving
  • Description of GaussianBlur engine.
  • Description of Motion::Detector.

Infrastructure

New features
  • Ocv.prop.default for Visual Studio 2019.
Renaming
  • CMake parameter from LIBRARY to SIMD_SHARED.
  • CMake parameter from CHECK_VERSION to SIMD_GET_VERSION.
  • CMake parameter from TOOLCHAIN to SIMD_TOOLCHAIN.
  • CMake parameter from TARGET to SIMD_TARGET.

January 4, 2021 (version 4.6.97)

Algorithms

New features
  • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetMish32f.
  • Support of Mish activation function in SynetConvolution32f framework.
  • Support of Mish activation function in SynetMergedConvolution32f framework.
  • Support of Mish activation function in SynetConvolution8i framework.
  • Support of Mish activation function in SynetMergedConvolution8i framework.
  • Support of Mish activation function in SynetDeconvolution32f framework.
  • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of GaussianBlur engine.
Improving
  • AVX-512F optimization of SynetConvolution32fNhwcDirect class.
  • AVX-512F optimization of SynetConvolution32fGemmNN class.
  • AVX-512F optimization of SynetConvolution32fWinograd class.
  • AVX-512F optimization of function Gemm32fNN.
Bug fixing
  • Error in Base implementation of SynetMergedConvolution32f (type=CDC, add=1).
  • Error in function SimdAlignment.
  • Visual Studio 2017 compiler error in files SimdAvx512bwSynet.cpp, SimdAvx512bwSynetScale.cpp, SimdAvx512bwAlphaBlending.cpp.
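
As a reference for the Mish activation supported since this release, here is a minimal scalar sketch of the standard definition, x * tanh(softplus(x)). The threshold shortcut, the function name and the signature are assumptions made for this example; the parameters of SynetMish32f should be checked in the Simd documentation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar reference (not the library's code).
// Mish(x) = x * tanh(ln(1 + exp(x))).
// For x above the threshold, softplus(x) is close to x and tanh is close
// to 1, so x is returned directly; this also avoids overflow in exp().
inline float MishRef(float x, float threshold)
{
    assert(threshold > 0.0f);
    if (x > threshold)
        return x;
    return x * std::tanh(std::log1p(std::exp(x)));
}
```

A threshold around 20 is a typical choice, because tanh(20) already rounds to 1 in single precision.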

Test framework

New features
  • Tests for verifying functionality of function SynetMish32f.
  • Tests for verifying functionality of GaussianBlur engine.