Functions for conversion between BFloat16 (16-bit Brain Floating Point) and 32-bit floating-point numbers, and related operations.
Functions | |
| SIMD_API void | SimdFloat32ToBFloat16 (const float *src, size_t size, uint16_t *dst) |
| Converts an array of 32-bit floats to 16-bit bfloat16 values. | |
| SIMD_API void | SimdBFloat16ToFloat32 (const uint16_t *src, size_t size, float *dst) |
| Converts an array of 16-bit bfloat16 values to 32-bit floats. | |
Detailed Description
Functions for conversion between BFloat16 (16-bit Brain Floating Point) and 32-bit floating-point numbers, and related operations.
Function Documentation
◆ SimdFloat32ToBFloat16()
| void SimdFloat32ToBFloat16 | ( | const float * | src, |
| size_t | size, | ||
| uint16_t * | dst | ||
| ) |
Converts an array of 32-bit floats to 16-bit bfloat16 values.
For each element, the function stores the bfloat16 representation of src[i] in dst[i]. The bfloat16 value is the high 16 bits of the IEEE 754 binary32 representation, rounded to nearest-even according to the discarded low 16 bits.
- Parameters
-
[in] src - a pointer to the input array of 32-bit floating-point numbers.
[in] size - the number of elements in the input and output arrays.
[out] dst - a pointer to the output array of 16-bit bfloat16 values.
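The rounding rule described above can be illustrated with a scalar sketch in C. This is not the library's implementation: the names RefFloat32ToBFloat16 and RefFloat32ToBFloat16Array are hypothetical, and the NaN handling (forcing a quiet NaN so that rounding cannot turn a NaN into infinity) is an assumption, since this page does not say how NaN inputs are treated.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference; not part of the Simd Library API. */
static uint16_t RefFloat32ToBFloat16(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof(bits)); /* read the binary32 bit pattern */

    /* Assumption: keep NaN as a quiet NaN instead of rounding it. */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Add 0x7FFF plus the lowest kept bit, so exact ties round to even. */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

/* Array form mirroring the (src, size, dst) parameter order above. */
static void RefFloat32ToBFloat16Array(const float *src, size_t size, uint16_t *dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = RefFloat32ToBFloat16(src[i]);
}
```

With this sketch, 1.0f (bits 0x3F800000) maps to 0x3F80, and the exact tie 1.00390625f (bits 0x3F808000) also maps to 0x3F80, because the tie resolves toward the even result.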
◆ SimdBFloat16ToFloat32()
| void SimdBFloat16ToFloat32 | ( | const uint16_t * | src, |
| size_t | size, | ||
| float * | dst | ||
| ) |
Converts an array of 16-bit bfloat16 values to 32-bit floats.
For each element the function expands src[i] to IEEE 754 binary32 by placing the bfloat16 bits into the high 16 bits of the result and setting the low 16 bits to zero.
- Parameters
-
[in] src - a pointer to the input array of 16-bit bfloat16 values.
[in] size - the number of elements in the input and output arrays.
[out] dst - a pointer to the output array of 32-bit floating-point numbers.
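The expansion described above is exact, since every bfloat16 value is representable in binary32. A minimal scalar sketch in C follows; the names RefBFloat16ToFloat32 and RefBFloat16ToFloat32Array are hypothetical and are not part of the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference; not part of the Simd Library API. */
static float RefBFloat16ToFloat32(uint16_t value)
{
    /* bfloat16 bits become the high half; the low 16 bits are zero. */
    uint32_t bits = (uint32_t)value << 16;
    float result;
    memcpy(&result, &bits, sizeof(result)); /* reinterpret as binary32 */
    return result;
}

/* Array form mirroring the (src, size, dst) parameter order above. */
static void RefBFloat16ToFloat32Array(const uint16_t *src, size_t size, float *dst)
{
    for (size_t i = 0; i < size; ++i)
        dst[i] = RefBFloat16ToFloat32(src[i]);
}
```

For example, 0x3F80 expands to 1.0f and 0x4049 expands to 3.140625f, the closest value to pi that carries only 8 bits of significand precision when truncated.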