Simd Library Documentation.

Brain Floating Point (16-bit) Numbers

Functions for converting between BFloat16 (16-bit Brain Floating Point) and 32-bit floating-point numbers.

Functions

SIMD_API void SimdFloat32ToBFloat16 (const float *src, size_t size, uint16_t *dst)
 Converts an array of 32-bit floats to 16-bit bfloat16 values.
 
SIMD_API void SimdBFloat16ToFloat32 (const uint16_t *src, size_t size, float *dst)
 Converts an array of 16-bit bfloat16 values to 32-bit floats.
 

Detailed Description

Functions for converting between BFloat16 (16-bit Brain Floating Point) and 32-bit floating-point numbers.

Function Documentation

◆ SimdFloat32ToBFloat16()

void SimdFloat32ToBFloat16(const float *src, size_t size, uint16_t *dst)

Converts an array of 32-bit floats to 16-bit bfloat16 values.

For each element, the function stores the bfloat16 representation of src[i] in dst[i]. The bfloat16 value keeps the high 16 bits of the IEEE 754 binary32 value (the sign, the 8-bit exponent and the top 7 mantissa bits), rounded to nearest with ties to even according to the discarded low 16 bits.

Parameters
[in] src - a pointer to the input array of 32-bit floating-point numbers.
[in] size - the number of elements in the input and output arrays.
[out] dst - a pointer to the output array of 16-bit bfloat16 values.
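
The rounding rule described above can be illustrated with a scalar reference. This is a minimal sketch, not the library's implementation: the helper names float32_to_bfloat16_rne and float32_to_bfloat16_ref are hypothetical, and the NaN handling is an assumption, since the description above does not say how NaN inputs are treated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar helper (not part of the Simd API). */
static uint16_t float32_to_bfloat16_rne(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits); /* read the binary32 bit pattern */

    /* Assumption: a NaN input stays a (quiet) NaN instead of being rounded. */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Round to nearest, ties to even: add 0x7FFF plus the lowest kept bit,
       so an exact tie rounds up only when the kept value is odd. */
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* Hypothetical array version with the same parameter order as
   SimdFloat32ToBFloat16 (src, size, dst). */
static void float32_to_bfloat16_ref(const float *src, size_t size, uint16_t *dst)
{
    assert(size == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < size; ++i)
        dst[i] = float32_to_bfloat16_rne(src[i]);
}
```

With this sketch, 1.0f (bits 0x3F800000) maps to 0x3F80. The tie 0x3F808000 rounds down to the even value 0x3F80, while the tie 0x3F818000 rounds up to the even value 0x3F82.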

◆ SimdBFloat16ToFloat32()

void SimdBFloat16ToFloat32(const uint16_t *src, size_t size, float *dst)

Converts an array of 16-bit bfloat16 values to 32-bit floats.

For each element, the function expands src[i] to IEEE 754 binary32 by placing the bfloat16 bits into the high 16 bits of the result and setting the low 16 bits to zero. Every bfloat16 value is exactly representable in binary32, so this conversion is lossless.

Parameters
[in] src - a pointer to the input array of 16-bit bfloat16 values.
[in] size - the number of elements in the input and output arrays.
[out] dst - a pointer to the output array of 32-bit floating-point numbers.
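
The expansion described above amounts to a 16-bit left shift of the stored bits. The following is a minimal scalar sketch of that step, not the library's implementation. The helper names bfloat16_to_float32_scalar and bfloat16_to_float32_ref are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar helper (not part of the Simd API). */
static float bfloat16_to_float32_scalar(uint16_t value)
{
    /* bfloat16 bits become the high half; the low 16 bits are zero. */
    uint32_t bits = (uint32_t)value << 16;
    float result;
    memcpy(&result, &bits, sizeof result); /* reinterpret as binary32 */
    return result;
}

/* Hypothetical array version with the same parameter order as
   SimdBFloat16ToFloat32 (src, size, dst). */
static void bfloat16_to_float32_ref(const uint16_t *src, size_t size, float *dst)
{
    assert(size == 0 || (src != NULL && dst != NULL));
    for (size_t i = 0; i < size; ++i)
        dst[i] = bfloat16_to_float32_scalar(src[i]);
}
```

For example, 0x3F80 expands to 1.0f and 0xC000 expands to -2.0f.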