Simd Library Release Notes (2019).

2024 | 2023 | 2022 | 2021 | 2020 | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | 2013

December 2, 2019 (version 4.4.84)

Algorithms

New features
  • Method View::Clear.
  • Parameter makeCopy in method ShiftDetector::SetBackground.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function SynetPoolingForwardAverage.
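SynetPoolingForwardAverage adds average pooling. The sketch below is a minimal scalar reference under stated assumptions: one channel plane in row-major order, a square kernel, a uniform stride and no padding. These notes do not say how padding is handled or whether padded cells count toward the average. `AveragePool2d` is a hypothetical name, not the library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for 2D average pooling over one channel plane
// (row-major, square kernel, uniform stride, no padding). Not the Simd API.
std::vector<float> AveragePool2d(const std::vector<float>& src, size_t srcH, size_t srcW,
                                 size_t kernel, size_t stride)
{
    assert(src.size() == srcH * srcW && kernel > 0 && stride > 0);
    assert(srcH >= kernel && srcW >= kernel);
    size_t dstH = (srcH - kernel) / stride + 1; // number of valid window positions
    size_t dstW = (srcW - kernel) / stride + 1;
    std::vector<float> dst(dstH * dstW);
    for (size_t dy = 0; dy < dstH; ++dy)
        for (size_t dx = 0; dx < dstW; ++dx)
        {
            float sum = 0.0f;
            for (size_t ky = 0; ky < kernel; ++ky)
                for (size_t kx = 0; kx < kernel; ++kx)
                    sum += src[(dy * stride + ky) * srcW + dx * stride + kx];
            dst[dy * dstW + dx] = sum / float(kernel * kernel); // mean of the window
        }
    return dst;
}
```

With kernel = stride = 2 each output is the mean of a disjoint 2x2 block, so both dimensions are halved.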
Improving
  • SSE, AVX, AVX2, AVX-512F and NEON optimizations of Convolution32f framework.
Bug fixing
  • Crash when SIMD_PERFORMANCE_STATISTIC is defined.
  • Compiler warning in SSSE3 and AVX2 optimizations of Resizer.
  • Error in base implementation of function SquaredDifferenceKahanSum32f (Visual Studio 2019).
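SquaredDifferenceKahanSum32f, as its name suggests, accumulates squared differences with compensated (Kahan) summation. The following is a hedged sketch of that technique; the function name is hypothetical and the exact algorithm inside the library is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: sum of (a[i] - b[i])^2 using Kahan compensated summation.
float SquaredDifferenceKahanSumSketch(const float* a, const float* b, size_t size)
{
    assert(a != nullptr && b != nullptr);
    float sum = 0.0f;        // running sum
    float correction = 0.0f; // low-order bits lost by previous additions
    for (size_t i = 0; i < size; ++i)
    {
        float d = a[i] - b[i];
        float y = d * d - correction; // re-inject the error from the last step
        float t = sum + y;            // low bits of y may be lost here
        correction = (t - sum) - y;   // recover what was lost (algebraically zero)
        sum = t;
    }
    return sum;
}
```

Because the correction term is algebraically zero, compiler options that permit reassociation (for example `-ffast-math` in GCC/Clang or `/fp:fast` in MSVC) may optimize it away and reduce the result to a naive sum.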

Test framework

New features
  • Tests for verifying functionality of function SynetPoolingForwardAverage.

November 1, 2019 (version 4.4.83)

Algorithms

New features
  • Base implementation, SSE4.1, AVX2, AVX-512BW and NEON optimizations of function SynetSetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function SynetHswish32f.
  • Support for Hswish activation function in Convolution32f framework.
  • Support for Hswish activation function in MergedConvolution32f framework.
  • Support for Hswish activation function in Deconvolution32f framework.
  • Support for 5x5 and 7x7 depthwise convolution in the middle layer of MergedConvolution32f framework.
  • Base implementation, SSE, AVX, AVX-512BW and NEON optimizations of function SynetShuffleLayerForward.
  • Base implementation, SSE2, AVX2, AVX-512BW and NEON optimizations of function GetObjectMoments.
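SynetHswish32f and the Hswish support in the convolution frameworks implement the hard-swish activation. This scalar sketch assumes the common definition hswish(x) = x * ReLU6(x + 3) / 6, expressed with `shift = 3` and `scale = 1/6`. The parameter names and the function `HswishSketch` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical scalar sketch of hard-swish:
// y = x * min(max(x + shift, 0), 2 * shift) * scale,
// which equals x * ReLU6(x + 3) / 6 for shift = 3, scale = 1/6.
void HswishSketch(const float* src, size_t size, float shift, float scale, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < size; ++i)
    {
        float x = src[i];
        float gate = std::min(std::max(x + shift, 0.0f), 2.0f * shift); // clipped to [0, 2*shift]
        dst[i] = x * gate * scale;
    }
}
```

Inputs at or below -3 map to zero, inputs at or above 3 pass through unchanged, and the region in between is a smooth quadratic blend.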
Improving
  • SSE2, AVX2, AVX-512BW and NEON optimizations of function GetObjectMoments.
  • NEON optimization of function Gemm32fNN.
  • NEON optimization of function Gemm32fNT.
  • NEON optimization of Convolution32f framework.
  • NEON optimization of MergedConvolution32f framework.
  • NEON optimization of Deconvolution32f framework.
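GetObjectMoments computes spatial moments of an object in a mask. The sketch below is a hedged reading of that idea, assuming an 8-bit mask where object pixels equal a given `index` and raw sums of x, y, x², xy and y² are wanted. The struct and function names are hypothetical and are not the library's signature.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result: pixel count and raw first/second-order coordinate sums.
struct Moments { uint64_t n, sx, sy, sxx, sxy, syy; };

// Accumulates raw moments of all mask pixels equal to `index`.
Moments ObjectMomentsSketch(const uint8_t* mask, size_t stride, size_t width, size_t height, uint8_t index)
{
    assert(mask != nullptr && stride >= width);
    Moments m = {0, 0, 0, 0, 0, 0};
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            if (mask[y * stride + x] == index)
            {
                m.n += 1;
                m.sx += x;      m.sy += y;
                m.sxx += x * x; m.sxy += x * y; m.syy += y * y;
            }
    return m;
}
```

From these sums the centroid is (sx / n, sy / n), and central second moments follow as, for example, sxx / n - cx².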
Renaming
  • Function from SynetRestrictRange to SynetRestrictRange32f.
Bug fixing
  • GCC-4.9 compiler error in function Base::CpuCacheSize.
  • Error in SSE2 optimization of Resizer framework.

Test framework

New features
  • Tests for verifying functionality of function SynetSetInput.
  • Tests for verifying functionality of function SynetHswish32f.
  • Tests for verifying functionality of function SynetShuffleLayerForward.
  • Tests for verifying functionality of function GetObjectMoments.

Infrastructure

Bug fixing
  • Missing file Prop.props for Microsoft Visual Studio 2019.

October 1, 2019 (version 4.4.82)

Algorithms

New features
  • View::Clone method (creates a clone that uses an external buffer).
  • Function Simd::PrintInfo.
  • SynetDeconvolution32f Framework.
  • Base implementation, SSE2, AVX, AVX2, AVX-512F and NEON optimizations of SynetDeconvolution32fGemmNN class.
  • Base implementation, SSE2, AVX, AVX2, AVX-512F and NEON optimizations of SynetDeconvolution32fNhwcDirect2x2 class.
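The SynetDeconvolution32f framework computes deconvolution (transposed convolution). As a minimal sketch under assumptions (single channel, square kernel, no padding, no bias), each input pixel scatters `value * kernel` into the output, whose size is (H - 1) * stride + kernel. `Deconvolution2dSketch` is a hypothetical name and says nothing about how the GemmNN or NhwcDirect2x2 classes are organized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical naive single-channel 2D transposed convolution without padding.
std::vector<float> Deconvolution2dSketch(const std::vector<float>& src, size_t srcH, size_t srcW,
                                         const std::vector<float>& weight, size_t kernel, size_t stride)
{
    assert(srcH > 0 && srcW > 0 && src.size() == srcH * srcW);
    assert(weight.size() == kernel * kernel);
    size_t dstH = (srcH - 1) * stride + kernel;
    size_t dstW = (srcW - 1) * stride + kernel;
    std::vector<float> dst(dstH * dstW, 0.0f);
    for (size_t sy = 0; sy < srcH; ++sy)
        for (size_t sx = 0; sx < srcW; ++sx)
            for (size_t ky = 0; ky < kernel; ++ky)
                for (size_t kx = 0; kx < kernel; ++kx)
                    // Scatter-add: overlapping windows (stride < kernel) accumulate.
                    dst[(sy * stride + ky) * dstW + sx * stride + kx] +=
                        src[sy * srcW + sx] * weight[ky * kernel + kx];
    return dst;
}
```

With a 2x2 kernel and stride 2 the output windows do not overlap, so every input pixel maps to its own 2x2 output block.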
Improving
  • CpuInfo now reports L1D, L2 and L3 cache sizes and the numbers of sockets, CPUs and threads.
Renaming
  • Function from ConvolutionInit to SynetConvolution32fInit.
  • Function from ConvolutionExternalBufferSize to SynetConvolution32fExternalBufferSize.
  • Function from ConvolutionInternalBufferSize to SynetConvolution32fInternalBufferSize.
  • Function from ConvolutionSetParams to SynetConvolution32fSetParams.
  • Function from ConvolutionForward to SynetConvolution32fForward.
  • Function from MergedConvolutionInit to SynetMergedConvolution32fInit.
  • Function from MergedConvolutionExternalBufferSize to SynetMergedConvolution32fExternalBufferSize.
  • Function from MergedConvolutionInternalBufferSize to SynetMergedConvolution32fInternalBufferSize.
  • Function from MergedConvolutionSetParams to SynetMergedConvolution32fSetParams.
  • Function from MergedConvolutionForward to SynetMergedConvolution32fForward.
Bug fixing
  • Error in Resizer framework (in file SimdBaseResizer.cpp).

Test framework

New features
  • Tests for verifying functionality of SynetDeconvolution32f Framework.

Infrastructure

New features
  • Project files for Microsoft Visual Studio 2019.
Bug fixing
  • Some Microsoft Visual Studio project properties could cause a program crash on old CPUs.
  • Use of the AVX512 property instead of SIMD_AVX512 in CMakeLists.txt.

September 2, 2019 (version 4.3.81)

Algorithms

New features
  • SimdTensorFormatNchwXc and SimdTensorFormatOyxiXo types in SimdTensorFormatType enumeration.
  • Function SynetSpecifyTensorFormat.
  • Function SynetTensorAlignment.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetAddBias.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetScaleLayerForward.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetFusedLayerForward0.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetFusedLayerForward1.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetFusedLayerForward2.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetFusedLayerForward3.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetFusedLayerForward4.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetFusedLayerForward8.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetFusedLayerForward9.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetLrnLayerCrossChannels.
  • Support for NCHW4c, NCHW8c, NCHW16c formats in function SynetPreluLayerForward.
  • Support for P2 (PGM) and P3 (PPM) image formats in View::Load.
  • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetElu32f.
  • Support for Elu activation function in Convolution framework.
  • Support for Elu activation function in MergedConvolution framework.
  • New meaning of the add parameter in MergedConvolution framework.
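The NCHW4c, NCHW8c and NCHW16c formats (SimdTensorFormatNchwXc) are blocked channel layouts. A hedged sketch of the conventional reordering for one image follows, assuming the usual definition: channels are split into blocks of X, stored as [C/X][H][W][X], and the last block is zero-padded. `NchwToNchwXc` is a hypothetical helper, not the library's conversion function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reorder of one image from NCHW to blocked NCHW{X}c:
// dst[(c / X)][h][w][c % X] = src[c][h][w], padding the last block with zeros.
std::vector<float> NchwToNchwXc(const std::vector<float>& src, size_t channels,
                                size_t height, size_t width, size_t X)
{
    assert(X > 0 && src.size() == channels * height * width);
    size_t blocks = (channels + X - 1) / X; // round channel count up to a multiple of X
    size_t spatial = height * width;
    std::vector<float> dst(blocks * spatial * X, 0.0f);
    for (size_t c = 0; c < channels; ++c)
        for (size_t s = 0; s < spatial; ++s)
            dst[(c / X) * spatial * X + s * X + c % X] = src[c * spatial + s];
    return dst;
}
```

Keeping X consecutive channels of one pixel adjacent lets a single SIMD register (4 floats for SSE/NEON, 8 for AVX, 16 for AVX-512) hold them, which is why X matches common vector widths.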
Improving
  • Performance measurement in Convolution and MergedConvolution frameworks.
Bug fixing
  • Error in function Convert (in file SimdFrame.hpp).
  • Error in function MergedConvolutionForward.
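SynetElu32f, added in this release, implements the ELU activation. A scalar reference sketch, assuming the standard definition y = x for x >= 0 and y = alpha * (exp(x) - 1) otherwise; `EluSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar ELU: identity for x >= 0, alpha * (exp(x) - 1) for x < 0.
void EluSketch(const float* src, size_t size, float alpha, float* dst)
{
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < size; ++i)
        // std::expm1 computes exp(x) - 1 without cancellation for small |x|.
        dst[i] = src[i] >= 0.0f ? src[i] : alpha * std::expm1(src[i]);
}
```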

Test framework

New features
  • Tests for verifying functionality of function SynetAddBias for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetScaleLayerForward for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetFusedLayerForward0 for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetFusedLayerForward1 for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetFusedLayerForward2 for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetFusedLayerForward3 for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetFusedLayerForward4 for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetFusedLayerForward8 for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetFusedLayerForward9 for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetLrnLayerCrossChannels for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetPreluLayerForward for NCHW4c, NCHW8c, NCHW16c tensor formats.
  • Tests for verifying functionality of function SynetElu32f.

Infrastructure

Renaming
  • Parameter from AVX512 to SIMD_AVX512 in CMakeLists.txt.
  • Parameter from PRINT_INFO to SIMD_INFO in CMakeLists.txt.

August 1, 2019 (version 4.3.80)

Algorithms

New features
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function SynetFusedLayerForward8.
  • Partial batch merging in Convolution algorithm (Winograd and GemmNN methods).
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function Winograd3x3SetFilter.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function Winograd3x3SetInput.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function Winograd3x3SetOutput.
  • Winograd3x3 method in Convolution algorithm.
  • Runtime choice of the best micro-kernel in Convolution framework (GemmNN and Winograd methods).
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function SynetFusedLayerForward9.
  • SimdTensorFormatType enumeration.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function SynetConvertImage.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function SynetConvertFilter.
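The Winograd*SetFilter, *SetInput and *SetOutput functions split Winograd convolution into a filter transform, an input transform and an output transform. As an illustration of the idea only, here is the classic 1D minimal filtering algorithm F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The library's 2D Winograd2x3, Winograd3x3 and Winograd4x3 variants use larger tile transforms, and all names below are hypothetical.

```cpp
#include <array>
#include <cassert>

using Filter = std::array<float, 3>;
using Tile = std::array<float, 4>;

// Filter transform (can be precomputed once per filter).
Tile WinogradFilter(const Filter& g)
{
    return {g[0], (g[0] + g[1] + g[2]) * 0.5f, (g[0] - g[1] + g[2]) * 0.5f, g[2]};
}

// Input transform of one 4-element tile.
Tile WinogradInput(const Tile& d)
{
    return {d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]};
}

// Element-wise products (4 multiplications) followed by the output transform.
std::array<float, 2> WinogradOutput(const Tile& u, const Tile& v)
{
    float m0 = u[0] * v[0], m1 = u[1] * v[1], m2 = u[2] * v[2], m3 = u[3] * v[3];
    return {m0 + m1 + m2, m1 - m2 - m3};
}

// Direct computation of the same two outputs (6 multiplications), for comparison.
std::array<float, 2> DirectConv(const Tile& d, const Filter& g)
{
    return {d[0] * g[0] + d[1] * g[1] + d[2] * g[2], d[1] * g[0] + d[2] * g[1] + d[3] * g[2]};
}
```

In a full convolution the filter transform is done once, the input transform once per tile, and the products over channels become small matrix multiplications, which is where a GEMM-style micro-kernel fits in.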
Improving
  • Performance profiling.
  • SSE, AVX, AVX2, AVX-512F and NEON optimizations of MergedConvolution framework.
  • SSE, AVX, AVX2, AVX-512F and NEON optimizations of Convolution framework (GemmNN and Winograd methods).
Bug fixing
  • Error in Convolution framework (GemmNN method).
  • Low performance of NEON optimization in Convolution framework (GemmNN and Winograd methods).
  • Crash in base implementation of functions FillPixel, FillBgra and FillUv (GCC, -O3).

Test framework

New features
  • Tests for verifying functionality of function SynetFusedLayerForward8.
  • Tests for verifying functionality of function Winograd3x3SetFilter.
  • Tests for verifying functionality of function Winograd3x3SetInput.
  • Tests for verifying functionality of function Winograd3x3SetOutput.
  • Special complex tests for verifying functionality of functions Winograd2x3SetFilter, Winograd2x3SetInput and Winograd2x3SetOutput.
  • Special complex tests for verifying functionality of functions Winograd3x3SetFilter, Winograd3x3SetInput and Winograd3x3SetOutput.
  • Special complex tests for verifying functionality of functions Winograd4x3SetFilter, Winograd4x3SetInput and Winograd4x3SetOutput.
  • Tests for verifying functionality of function SynetFusedLayerForward9.
  • Tests for verifying functionality of function SynetConvertImage.
  • Tests for verifying functionality of function SynetConvertFilter.

Infrastructure

New features
  • SIMD_PERF parameter in CMakeLists.txt.
Bug fixing
  • Visual Studio project build error (in file GetVersion.cmd).

July 2, 2019 (version 4.3.79)

Algorithms

New features
  • Additional macros for performance profiling.
  • Function SimdPerformanceStatistic.
  • Base implementation, SSE, AVX, AVX2, AVX-512F and NEON optimizations of Convolution framework (NhwcDirect mode).
Improving
  • SSE, AVX, AVX2, AVX-512F and NEON optimizations of MergedConvolution framework.
Bug fixing
  • Error in function MergedConvolution::SetSize (Merged Convolution Framework).

June 3, 2019 (version 4.3.78)

Algorithms

New features
  • SimdConvolutionParameters structure.
  • Base implementation, SSE, AVX, AVX2, AVX-512F and NEON optimizations of MergedConvolution framework (version 2).
  • Base implementation and AVX2 optimization of function AbsDifference.
  • SSSE3 and NEON optimizations of function TransformImage (TransformTransposeRotate0 transformation).
Bug fixing
  • Error in Convolution framework (group != 1, NHWC mode).

Test framework

New features
  • Tests for verifying functionality of function AbsDifference.
Bug fixing
  • Compiler error in file TestResize.cpp (aarch64 toolchain).

May 2, 2019 (version 4.3.77)

Algorithms

New features
  • Base implementation, SSE2, AVX2, AVX-512F and NEON optimizations of function SynetLrnLayerCrossChannels (NHWC mode).
  • Base implementation, SSSE3, AVX2, AVX-512BW and NEON optimizations of function BgrToRgb.
  • Pixel::Rgb24 structure.
  • Base implementation of Resizer framework (area method, byte type).
  • SSE2, SSSE3, AVX2, AVX-512BW and NEON optimizations of Resizer framework (bilinear method, byte type).
  • SSE2, SSE4.1, AVX2, AVX-512BW and NEON optimizations of Resizer framework (area method, byte type).
  • Simd::Resize function.
  • Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of MergedConvolution framework.
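BgrToRgb converts 24-bit BGR pixels to RGB, which amounts to swapping the first and third byte of every pixel. A scalar sketch, assuming separate source and destination buffers with their own row strides; `BgrToRgbSketch` is a hypothetical name and the parameter order is not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar BGR -> RGB conversion (the same swap also converts RGB -> BGR).
// Source and destination must not overlap: d[0] is written before s[0] is read.
void BgrToRgbSketch(const uint8_t* bgr, size_t width, size_t height, size_t bgrStride,
                    uint8_t* rgb, size_t rgbStride)
{
    assert(bgrStride >= 3 * width && rgbStride >= 3 * width);
    for (size_t y = 0; y < height; ++y)
    {
        const uint8_t* s = bgr + y * bgrStride;
        uint8_t* d = rgb + y * rgbStride;
        for (size_t x = 0; x < width; ++x, s += 3, d += 3)
        {
            d[0] = s[2]; // R
            d[1] = s[1]; // G
            d[2] = s[0]; // B
        }
    }
}
```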
Improving
  • AVX-512F optimization of Convolution framework.
Bug fixing
  • Error in SSE, AVX, AVX-512F and NEON optimizations of function Fill32f.
  • Out-of-range access in SSE4.1, AVX2, AVX-512BW and NEON optimizations of functions DetectionHaarDetect32fp and DetectionHaarDetect32fi.
  • Out-of-range access in SSE4.1, AVX2, AVX-512BW and NEON optimizations of functions DetectionLbpDetect32fp, DetectionLbpDetect32fi, DetectionLbpDetect16ip and DetectionLbpDetect16ii.
  • Error in AVX2, AVX-512BW and NEON optimizations of function CosineDistancesMxNa16f.
  • Error in AVX-512F optimization of Convolution framework.
  • Error in SSE, AVX, AVX2, AVX-512F and NEON optimizations of Convolution framework (NHWC mode, depthwise convolution).
  • Error in AVX-512F optimization of Convolution framework (NHWC mode, Winograd2x3 method).
  • Error in AVX-512F optimization of Convolution framework (function KernelHwcDefaultBody8).

Test framework

New features
  • Tests for verifying functionality of function SynetLrnLayerCrossChannels (NHWC mode).
  • Tests for verifying functionality of function BgrToRgb.
  • Tests for verifying functionality of MergedConvolution framework.

Infrastructure

Bug fixing
  • Compiler warning for GCC >= 7.0 (ARM target).

April 1, 2019 (version 4.3.76)

Algorithms

New features
  • Base implementation, AVX2, AVX-512BW and NEON optimizations of function CosineDistancesMxNa16f.
  • Macro SIMD_FUTURE_DISABLE.
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function Winograd4x3SetInput (NHWC mode).
  • Base implementation, SSE, AVX, AVX-512F and NEON optimizations of function Winograd4x3SetOutput (NHWC mode).
  • Support for Winograd4x3 in Convolution framework (NHWC mode).
  • Parameter 'batch' in Convolution framework.
  • Function ConvolutionInternalBufferSize.
  • Parameter trans replaces parameters srcT and dstT in function ConvolutionInit.
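CosineDistancesMxNa16f computes cosine distances between M vectors and N vectors. A hedged reference sketch, assuming distance = 1 - a·b / (|a| |b|) and row-major float32 storage; the 16f variant stores half-precision vectors, which this sketch does not model. The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: D[i][j] = 1 - A_i . B_j / (|A_i| |B_j|)
// for M vectors in A and N vectors in B, each of length K (row-major).
std::vector<float> CosineDistancesMxN(const std::vector<float>& A, size_t M,
                                      const std::vector<float>& B, size_t N, size_t K)
{
    assert(A.size() == M * K && B.size() == N * K);
    std::vector<float> D(M * N);
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j)
        {
            double ab = 0, aa = 0, bb = 0; // accumulate in double for accuracy
            for (size_t k = 0; k < K; ++k)
            {
                double a = A[i * K + k], b = B[j * K + k];
                ab += a * b; aa += a * a; bb += b * b;
            }
            // Undefined for zero-length vectors (division by zero).
            D[i * N + j] = float(1.0 - ab / std::sqrt(aa * bb));
        }
    return D;
}
```

Distances range from 0 (same direction) through 1 (orthogonal) to 2 (opposite directions).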
Improving
  • ConvolutionGemmNN method of Convolution framework (NHWC mode).
  • ConvolutionWinograd method of Convolution framework (NHWC mode).
Renaming
  • Function from GetFlushToZero to GetFastMode.
  • Function from SetFlushToZero to SetFastMode.
  • Function from ConvolutionBufferSize to ConvolutionExternalBufferSize.
Bug fixing
  • Compiler error (use of the name 'small', which can be a system macro) in file SimdSse2Statistic.cpp.
  • Compiler warning (unused variable) in function Neon::SetFlushToZero.
  • Compiler warning (unused variable) in function Base::ConvolutionBiasAndActivation.
  • Compiler error (Visual Studio for Android) in file SimdSsse3Transform.cpp.
  • Compiler error (Visual Studio for Android) in function SimdCosineDistance16f.
  • Low performance of function SimdSquaredDifferenceSum16f.
  • Compiler warning (unused variable) in function Neon::AlphaFilling.
  • Compiler warning (unused variable) in function Neon::Fill32f.
  • Compiler warning (wrong initialization order) in file SimdNeonGemm32f.cpp.
  • Compiler warning (unused variable) in function Neon::SynetInnerProductLayerForward.
  • Compiler warning (unused variable) in file TestConvolution.cpp.
  • Compiler internal error (G++ 6.3.0) in function Neon::BgrToBgra.
  • Compiler error (aarch64) in functions Neon::GetFlushToZero and Neon::SetFlushToZero.
  • Error in NEON optimization of function HogLiteFindMax7x7.
  • Performance bug with denormal numbers.
  • Error in NEON optimization of function ReduceGray2x2.

Test framework

New features
  • Tests for verifying functionality of function CosineDistancesMxNa16f.
  • Tests for verifying functionality of function Winograd4x3SetInput (NHWC mode).
  • Tests for verifying functionality of function Winograd4x3SetOutput (NHWC mode).
Bug fixing
  • Compiler error (Visual Studio for Android) in file TestFloat16.cpp.
  • Compiler warning (wrong initialization order) in file SimdNeonGemm32f.cpp.
  • Compiler internal error (G++ 4.9).

March 7, 2019 (version 4.3.75)

Algorithms

New features
  • Base implementation, SSE2, SSSE3, AVX2 and AVX-512BW optimizations of function BgraToYuva420p.
  • NEON optimization of function NeuralSigmoid.
  • NEON optimization of function NeuralTanh.
  • NEON optimization of function NeuralPow.
  • NEON version of functions GetFlushToZero and SetFlushToZero.
  • NEON optimization of function Fill32f.
  • NEON optimization of function AlphaFilling.
  • NEON optimization of function CosineDistance16f.
  • NEON optimization of function CosineDistance32f.
  • NEON optimization of function Gemm32fNN.
  • NEON optimization of function Gemm32fNT.
  • NEON optimization of function FillPixel.
  • NEON optimization of function ReduceColor2x2.
  • NEON optimization of function BayerToBgra.
  • NEON optimization of function BayerToBgr.
  • NEON optimization of function TransformImage.
  • NEON optimization of function BgraToYuva420p.
  • NEON optimization of function Yuva420pToBgra.
  • NEON optimization of function Resizer.
  • NEON optimization of function HogLiteFindMax7x7.
  • NEON optimization of function HogLiteCreateMask.
  • NEON optimization of function HogLiteFilterSeparable.
  • NEON optimization of function HogLiteCompressFeatures.
  • NEON optimization of function HogLiteResizeFeatures.
  • NEON optimization of function HogLiteFilterFeatures.
  • NEON optimization of function HogLiteExtractFeatures.
  • NEON optimization of function Winograd2x3SetFilter.
  • NEON optimization of function Winograd4x3SetFilter.
  • NEON optimization of function Winograd2x3SetInput.
  • NEON optimization of function Winograd2x3SetOutput.
  • NEON optimization of function SynetAddBias.
  • NEON optimization of function SynetEltwiseLayerForward.
  • NEON optimization of function SynetPoolingForwardMax.
  • NEON optimization of function SynetFusedLayerForward0.
  • NEON optimization of function SynetFusedLayerForward1.
  • NEON optimization of function SynetFusedLayerForward2.
  • NEON optimization of function SynetFusedLayerForward3.
  • NEON optimization of function SynetFusedLayerForward4.
  • NEON optimization of function SynetInnerProductLayerForward.
  • NEON optimization of function SynetLrnLayerCrossChannels.
  • NEON optimization of function SynetPreluLayerForward.
  • NEON optimization of function SynetRestrictRange.
  • NEON optimization of function SynetScaleLayerForward.
  • NEON optimization of function SynetSoftmaxLayerForward.
  • NEON optimization of function ConvolutionForward.
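Gemm32fNN and Gemm32fNT, which gained NEON optimizations in this release, are single-precision matrix multiplications. Reading the suffixes in the usual BLAS sense (an assumption here), NN multiplies two non-transposed matrices and NT uses B transposed. The sketch below is a naive BLAS-style reference for the NN case, C = alpha * A * B + beta * C with row-major storage and leading dimensions; the name and parameter list are illustrative, not the library's declaration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical naive reference: C = alpha * A * B + beta * C, row-major,
// A is M x K (leading dimension lda), B is K x N (ldb), C is M x N (ldc).
void Gemm32fNNSketch(size_t M, size_t N, size_t K, float alpha,
                     const float* A, size_t lda, const float* B, size_t ldb,
                     float beta, float* C, size_t ldc)
{
    assert(lda >= K && ldb >= N && ldc >= N);
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j)
        {
            float sum = 0.0f;
            for (size_t k = 0; k < K; ++k)
                sum += A[i * lda + k] * B[k * ldb + j];
            C[i * ldc + j] = alpha * sum + beta * C[i * ldc + j];
        }
}
```

Optimized GEMM kernels typically reorder this loop nest into register- and cache-sized blocks; the arithmetic result is the same up to floating-point rounding.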
Improving
  • AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
  • SSE, AVX, AVX2 and AVX-512F optimizations of function Resizer.
Bug fixing
  • Error in AVX-512BW optimization of function ChangeColors.
  • Error in AVX-512BW optimization of function NormalizeHistogram.
  • Error in AVX-512F optimization of function NeuralConvolutionForward.
  • Error in NEON optimization of function Uint8ToFloat32.
  • Error in NEON optimization of function SquaredDifferenceSum16f.
  • Error in SSE version of function GetFlushToZero.
  • Error in Base implementation of function SynetFusedLayerForward0.

Test framework

New features
  • Tests for verifying functionality of function BgraToYuva420p.
  • Tests for verifying NEON optimization of function NeuralSigmoid.
  • Tests for verifying NEON optimization of function NeuralTanh.
  • Tests for verifying NEON optimization of function NeuralPow.
  • Tests for verifying NEON optimization of function Fill32f.
  • Tests for verifying NEON optimization of function AlphaFilling.
  • Tests for verifying NEON optimization of function CosineDistance16f.
  • Tests for verifying NEON optimization of function CosineDistance32f.
  • Tests for verifying NEON optimization of function Gemm32fNN.
  • Tests for verifying NEON optimization of function Gemm32fNT.
  • Tests for verifying NEON optimization of function FillPixel.
  • Tests for verifying NEON optimization of function ReduceColor2x2.
  • Tests for verifying NEON optimization of function BayerToBgra.
  • Tests for verifying NEON optimization of function BayerToBgr.
  • Tests for verifying NEON optimization of function TransformImage.
  • Tests for verifying NEON optimization of function BgraToYuva420p.
  • Tests for verifying NEON optimization of function Yuva420pToBgra.
  • Tests for verifying NEON optimization of function Resizer.
  • Tests for verifying NEON optimization of function HogLiteFindMax7x7.
  • Tests for verifying NEON optimization of function HogLiteCreateMask.
  • Tests for verifying NEON optimization of function HogLiteFilterSeparable.
  • Tests for verifying NEON optimization of function HogLiteCompressFeatures.
  • Tests for verifying NEON optimization of function HogLiteResizeFeatures.
  • Tests for verifying NEON optimization of function HogLiteFilterFeatures.
  • Tests for verifying NEON optimization of function HogLiteExtractFeatures.
  • Tests for verifying NEON optimization of function Winograd2x3SetFilter.
  • Tests for verifying NEON optimization of function Winograd4x3SetFilter.
  • Tests for verifying NEON optimization of function Winograd2x3SetInput.
  • Tests for verifying NEON optimization of function Winograd2x3SetOutput.
  • Tests for verifying NEON optimization of function SynetAddBias.
  • Tests for verifying NEON optimization of function SynetEltwiseLayerForward.
  • Tests for verifying NEON optimization of function SynetPoolingForwardMax.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward0.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward1.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward2.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward3.
  • Tests for verifying NEON optimization of function SynetFusedLayerForward4.
  • Tests for verifying NEON optimization of function SynetInnerProductLayerForward.
  • Tests for verifying NEON optimization of function SynetLrnLayerCrossChannels.
  • Tests for verifying NEON optimization of function SynetPreluLayerForward.
  • Tests for verifying NEON optimization of function SynetRestrictRange.
  • Tests for verifying NEON optimization of function SynetScaleLayerForward.
  • Tests for verifying NEON optimization of function SynetSoftmaxLayerForward.
  • Tests for verifying NEON optimization of function ConvolutionForward.
Bug fixing
  • Error (on 32-bit OS) in test of function HogLiteFindMax7x7.

February 1, 2019 (version 4.2.74)

Algorithms

New features
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetFilter (NHWC mode).
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd4x3SetFilter (NHWC mode).
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetInput (NHWC mode).
  • Base implementation, SSE, AVX and AVX-512F optimizations of function Winograd2x3SetOutput (NHWC mode).
  • Parameter gemm (a pointer to external function of matrix multiplication) in function ConvolutionInit.
  • Choice of the best gemm function at runtime.
  • SIMD_RUNTIME_GEMM_STATISTIC macro (annotation of the runtime choice of gemm).
  • Base implementation, SSE, AVX, AVX2 and AVX-512F optimizations of function SynetPoolingForwardMax.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward4.
  • Base implementation, SSE2, AVX2 and AVX-512F optimizations of function SynetSoftmaxForward.
  • Base implementation, SSE2, AVX2 and AVX-512BW optimizations of function Yuva420pToBgra.
  • Base implementation and SSSE3 optimization of function TransformImage.
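SynetSoftmaxForward, added in this release, computes softmax. A minimal sketch over a single vector follows, using the standard numerically stable formulation that subtracts the maximum before exponentiating. The library's handling of outer/inner dimensions is not described here, and `SoftmaxSketch` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical numerically stable softmax over `size` contiguous values.
void SoftmaxSketch(const float* src, size_t size, float* dst)
{
    assert(src != nullptr && dst != nullptr && size > 0);
    // Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow.
    float maxVal = *std::max_element(src, src + size);
    float sum = 0.0f;
    for (size_t i = 0; i < size; ++i)
    {
        dst[i] = std::exp(src[i] - maxVal);
        sum += dst[i];
    }
    for (size_t i = 0; i < size; ++i)
        dst[i] /= sum;
}
```

Without the max subtraction, inputs such as 1000.0f would overflow `exp` to infinity and produce NaN results.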
Improving
  • SSE, AVX, AVX2 and AVX-512F optimizations of function ConvolutionForward.
Removing
  • Function Winograd2x3iSetInput.
  • Function Winograd2x3iSetOutput.
Bug fixing
  • Error in AVX-512F optimization of function ConvolutionDirectHwcConvolutionBiasActivationDefault.

Test framework

New features
  • Tests for verifying functionality of function Winograd2x3SetFilter (NHWC mode).
  • Tests for verifying functionality of function Winograd4x3SetFilter (NHWC mode).
  • Tests for verifying functionality of function Winograd2x3SetInput (NHWC mode).
  • Tests for verifying functionality of function Winograd2x3SetOutput (NHWC mode).
  • Printing of internal performance statistics.
  • Tests for verifying functionality of function SynetPoolingForwardMax.
  • Tests for verifying functionality of function FusedLayerForward4.
  • Tests for verifying functionality of function SynetSoftmaxForward.
  • Tests for verifying functionality of function Yuva420pToBgra.
  • Tests for verifying functionality of function TransformImage.

Infrastructure

Bug fixing
  • The input variable CMAKE_CXX_FLAGS can contain invalid options (-mtune=native, -march=haswell, -mavx, etc.).

January 2, 2019 (version 4.2.73)

Algorithms

New features
  • Base implementation, SSE, AVX and AVX-512F optimizations of function FusedLayerForward3.
  • Base implementation, SSE, AVX and AVX-512F optimizations of function ConvolutionBiasAndActivation (NHWC mode).
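ConvolutionBiasAndActivation in NHWC mode applies a per-channel bias and an activation to a convolution output. A hedged sketch follows in which ReLU stands in for the activation (these notes do not list the supported activation types). `BiasAndReluNhwc` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical in-place bias + ReLU on an NHWC tensor with `spatial` = N * H * W
// pixels. In NHWC the channel index varies fastest, so bias[c] lines up with
// consecutive elements of each pixel.
void BiasAndReluNhwc(float* data, size_t spatial, size_t channels, const float* bias)
{
    assert(data != nullptr && bias != nullptr);
    for (size_t s = 0; s < spatial; ++s)
        for (size_t c = 0; c < channels; ++c)
        {
            float& v = data[s * channels + c];
            v = std::max(v + bias[c], 0.0f);
        }
}
```

Because the bias vector is contiguous and matches the inner loop, the NHWC case maps naturally onto SIMD loads of both data and bias.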
Improving
  • SSE, AVX, AVX2 and AVX-512F optimizations of function Gemm32fNN.
  • Output parameter 'internal' in function ConvolutionSetWeight.
Bug fixing
  • Wrong assert condition in AVX-512F optimization of function NeuralRelu.
  • Visual Studio 2017 compiler error (intrinsic _mm512_maskz_loadu_epi8 in Release mode).
  • Crash: unaligned memory read in AVX-512BW optimization of function HogLiteFilterFeatures.
  • Performance bug in functions SynetAddBias, SynetFusedLayerForwardX, SynetPreluLayerForward and SynetScaleLayerForward when (count = 1, trans = 1).

Test framework

New features
  • Tests for verifying functionality of function FusedLayerForward3.