[clang][X86] Wrong result for __builtin_elementwise_fma on _Float16 #128450

SEt-t · 2025-02-24T01:38:30Z

Godbolt: https://godbolt.org/z/Ydj17K17b

Clang uses single-precision FMA to emulate half-precision FMA, which is wrong because single precision does not have enough precision to avoid double rounding.

Example, round to even: 0x1.400p+8 * 0x1.008p+7 + 0x1.000p-24
Precise result: 0x1.40a0000002p+15
Half-precision FMA: 0x1.40cp+15
Single-precision FMA: 0x1.40a000p+15
(clang) Single-precision FMA -> half-precision: 0x1.408p+15

Another example: 0x1.eb8p-12 * 0x1.9p-11 - 0x1p-11

To produce the correct result, a single-precision multiplication followed by a double-precision addition seems to be enough.
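For anyone who wants to reproduce the two examples without `_Float16` support, here is a minimal sketch in portable C++. The helper names (`round_to_half_precision`, `fma_via_float`, `fma_via_float_mul_double_add`) are made up for illustration and are not LLVM or libc functions. The rounding helper only models the 11-bit significand of binary16 with ties-to-even under the default rounding mode; it deliberately ignores overflow and subnormal results, which is enough for these two inputs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: round a finite double to the nearest value with an
// 11-bit significand (binary16 precision), ties to even.
// Assumption: the result is in the normal binary16 range (no overflow,
// no subnormals), and the default rounding mode is in effect.
double round_to_half_precision(double x) {
    assert(std::isfinite(x));
    if (x == 0.0) return x;
    int exp = 0;
    double frac = std::frexp(x, &exp);        // x = frac * 2^exp, 0.5 <= |frac| < 1
    double scaled = std::ldexp(frac, 11);     // 11 significant bits left of the point
    double rounded = std::nearbyint(scaled);  // ties-to-even in the default mode
    return std::ldexp(rounded, exp - 11);
}

// Models the behaviour described in this issue: one single-precision FMA,
// then a second rounding down to half precision.
double fma_via_float(float a, float b, float c) {
    float fused = std::fma(a, b, c);          // rounds once, to 24 bits
    return round_to_half_precision(fused);    // rounds again, to 11 bits
}

// Models the suggested alternative: the float product of two half values is
// exact, the double addition is exact for these inputs, so only the final
// narrowing rounds.
double fma_via_float_mul_double_add(float a, float b, float c) {
    float product = a * b;
    double sum = static_cast<double>(product) + static_cast<double>(c);
    return round_to_half_precision(sum);
}
```

With the first example's inputs (320, 128.25 and 2^-24) the first function should give 0x1.408p+15 and the second 0x1.40cp+15. The second example behaves the same way: the float FMA lands exactly on a half-precision tie, and the second rounding then goes the wrong way.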

llvmbot · 2025-02-24T08:16:08Z

@llvm/issue-subscribers-clang-codegen

Author: SEt (SEt-t)

akshaykumars614 · 2025-03-14T01:59:18Z

I want to work on this issue. Any input or suggestions on where to start would be really helpful!

beetrees · 2025-03-15T14:06:52Z

This bug is a duplicate of #98389.

@akshaykumars614 AFAICT this is where the half FMA operation is lowered incorrectly:

llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

Lines 3396 to 3414 in 5cc2ae0

    
    SDValue DAGTypeLegalizer::SoftPromoteHalfRes_FMAD(SDNode *N) {
      EVT OVT = N->getValueType(0);
      EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), OVT);
      SDValue Op0 = GetSoftPromotedHalf(N->getOperand(0));
      SDValue Op1 = GetSoftPromotedHalf(N->getOperand(1));
      SDValue Op2 = GetSoftPromotedHalf(N->getOperand(2));
      SDLoc dl(N);

      // Promote to the larger FP type.
      auto PromotionOpcode = GetPromotionOpcode(OVT, NVT);
      Op0 = DAG.getNode(PromotionOpcode, dl, NVT, Op0);
      Op1 = DAG.getNode(PromotionOpcode, dl, NVT, Op1);
      Op2 = DAG.getNode(PromotionOpcode, dl, NVT, Op2);

      SDValue Res = DAG.getNode(N->getOpcode(), dl, NVT, Op0, Op1, Op2);

      // Convert back to FP16 as an integer.
      return DAG.getNode(GetPromotionOpcode(NVT, OVT), dl, MVT::i16, Res);
    }

RalfJung · 2025-03-16T11:15:32Z

> To produce correct result single-precision multiplication, then double-precision addition seems to be enough.

Are you sure this will always be enough for all inputs?

beetrees · 2025-03-16T11:50:07Z

Yes. Building off this paper, 16-bit floats have an 11-bit significand (including the implicit bit), meaning that the result of the multiplication requires 22 bits to store losslessly (0b111_1111_1111 * 0b111_1111_1111 = 0b11_1111_1111_0000_0000_0001). (EDIT: The following sentence is only true when the multiplication result is less than the number being added: see #128450 (comment) for details.) The addition therefore requires 22 + 11 + 1 = 34 bits of precision in order to prevent double rounding problems. 64-bit floats have 53-bit significands, whereas 32-bit floats only have 24-bit significands.
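The 22-bit claim for the product is small enough to check by brute force. The following is a rough sketch, with function names invented for this comment. It walks over every pair of 11-bit significands, records the widest product, and confirms that a `float` multiplication reproduces each product exactly. It only looks at significands, so exponents, signs and special values are out of scope.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to write v in binary (0 for v == 0).
int bits_needed(std::uint32_t v) {
    int bits = 0;
    while (v != 0) {
        ++bits;
        v >>= 1;
    }
    return bits;
}

// Widest product over all pairs of 11-bit significands (1..2047, with the
// implicit leading bit included).
int max_product_bits() {
    int widest = 0;
    for (std::uint32_t a = 1; a < 2048; ++a) {
        for (std::uint32_t b = a; b < 2048; ++b) {
            int bits = bits_needed(a * b);
            if (bits > widest) widest = bits;
        }
    }
    assert(widest <= 24);  // must fit in a float's 24-bit significand
    return widest;
}

// True if single-precision multiplication reproduces every product exactly.
bool float_products_are_exact() {
    for (std::uint32_t a = 1; a < 2048; ++a) {
        for (std::uint32_t b = a; b < 2048; ++b) {
            float product = static_cast<float>(a) * static_cast<float>(b);
            if (static_cast<double>(product) != static_cast<double>(a * b)) {
                return false;
            }
        }
    }
    return true;
}
```

`max_product_bits()` should report 22, which is why the multiplication step can stay in single precision while the addition cannot.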

RalfJung · 2025-03-16T12:12:22Z

Okay so single precision is enough for the multiplication (24 >= 22) and then double-precision is enough for the total operation (53 >= 34)?

beetrees · 2025-03-16T12:38:14Z

Yes. (EDIT: see #128450 (comment))

SEt-t · 2025-03-16T13:49:34Z

> The addition therefore requires 22 + 11 + 1 = 34 bits of precision in order to prevent double rounding problems.

You are wrong about the number of bits the addition needs (see my example, which needs 40 bits), but the conclusion that double is enough for it is correct.

beetrees · 2025-03-16T14:36:58Z

Ah yes, I see where I messed up.

beetrees · 2025-03-16T16:53:08Z

Corrected explanation for `fmaf16(a, b, c) == fma(a, b, c)`:

As 16-bit floats only have 11-bit significands (64-bit floats have 53-bit significands), the result of a * b (henceforth referred to as x) will always be exactly representable in 11 + 11 = 22 bits. In most cases (since 16-bit floats only have a small exponent range), the result of the addition/subtraction of x and c will also be exactly representable; there are two scenarios where this is not the case:

`c > x`, such that their exponents differ by more than 53 - 22 = 31

As c is only 11 bits in size, my previous comment is still correct for this case: the digits required to separate the two numbers will be in the correct place to prevent double rounding from being an issue when the final result is rounded to a 64-bit float and then to a 16-bit float.

`x > c`, such that their exponents differ by more than 53 - 11 = 42

As c is a regular 16-bit float it has a minimum exponent of -14, meaning that x must have an exponent of at least -14 + 42 + 1 = 29. The result's exponent will be larger than the maximum 16-bit float exponent of 15, meaning that the result will always be rounded to infinity when converting back to a 16-bit float (for reference, this means that the intermediate float type needs at least a 11 + 14 + 15 = 40 bit significand; this means that results where x > c will either be exactly representable or round to infinity when converted to a 16-bit float).

llvmbot added the clang label Feb 24, 2025

frederick-vs-ja added clang:codegen and miscompilation labels and removed the clang label Feb 24, 2025

RKSimon added llvm:codegen and removed clang:codegen labels Feb 24, 2025

akshaykumars614 self-assigned this Mar 14, 2025

RKSimon mentioned this issue Mar 15, 2025

LLVM miscompiles consecutive half operations by using too much precision on several backends #97975

Open

11 tasks

giordano mentioned this issue Mar 15, 2025

Slightly inaccurate emulated fma on Float16 JuliaLang/julia#57784

Open

EugeneZelenko added llvm:SelectionDAG and removed llvm:codegen labels Mar 15, 2025

beetrees mentioned this issue Mar 16, 2025

llvm.fma.bf16 intrinsic is expanded incorrectly #131531

Open
