Skip to content

[clang][X86] Wrong result for __builtin_elementwise_fma on _Float16 #128450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
SEt-t opened this issue Feb 24, 2025 · 10 comments
Open

[clang][X86] Wrong result for __builtin_elementwise_fma on _Float16 #128450

SEt-t opened this issue Feb 24, 2025 · 10 comments
Assignees
Labels
llvm:SelectionDAG SelectionDAGISel as well miscompilation

Comments

@SEt-t
Copy link

SEt-t commented Feb 24, 2025

Godbolt: https://godbolt.org/z/Ydj17K17b

Clang uses single-precision FMA to emulate half-precision FMA, what is wrong as it doesn't have enough precision.

Example, round to even: 0x1.400p+8 * 0x1.008p+7 + 0x1.000p-24
Precise result: 0x1.40a0000002p+15
Half-precision FMA: 0x1.40cp+15
Single-precision FMA: 0x1.40a000p+15
(clang) Single-precision FMA -> half-precision: 0x1.408p+15

Another example: 0x1.eb8p-12 * 0x1.9p-11 - 0x1p-11

To produce correct result single-precision multiplication, then double-precision addition seems to be enough.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Feb 24, 2025
@frederick-vs-ja frederick-vs-ja added clang:codegen IR generation bugs: mangling, exceptions, etc. miscompilation and removed clang Clang issues not falling into any other category labels Feb 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Feb 24, 2025

@llvm/issue-subscribers-clang-codegen

Author: SEt (SEt-t)

Godbolt: https://godbolt.org/z/Ydj17K17b

Clang uses single-precision FMA to emulate half-precision FMA, what is wrong as it doesn't have enough precision.

Example, round to even: 0x1.400p+8 * 0x1.008p+7 + 0x1.000p-24
Precise result: 0x1.40a0000002p+15
Half-precision FMA: 0x1.40cp+15
Single-precision FMA: 0x1.40a000p+15
(clang) Single-precision FMA -> half-precision: 0x1.408p+15

Another example: 0x1.eb8p-12 * 0x1.9p-11 - 0x1p-11

To produce correct result single-precision multiplication, then double-precision addition seems to be enough.

@RKSimon RKSimon added llvm:codegen and removed clang:codegen IR generation bugs: mangling, exceptions, etc. labels Feb 24, 2025
@akshaykumars614
Copy link
Contributor

I want to work on this issue. Any input or suggestions on where to start would be really helpful!

@beetrees
Copy link
Contributor

This bug is a duplicate of #98389.

@akshaykumars614 AFAICT this is where the half FMA operation is lowered incorrectly:

SDValue DAGTypeLegalizer::SoftPromoteHalfRes_FMAD(SDNode *N) {
EVT OVT = N->getValueType(0);
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), OVT);
SDValue Op0 = GetSoftPromotedHalf(N->getOperand(0));
SDValue Op1 = GetSoftPromotedHalf(N->getOperand(1));
SDValue Op2 = GetSoftPromotedHalf(N->getOperand(2));
SDLoc dl(N);
// Promote to the larger FP type.
auto PromotionOpcode = GetPromotionOpcode(OVT, NVT);
Op0 = DAG.getNode(PromotionOpcode, dl, NVT, Op0);
Op1 = DAG.getNode(PromotionOpcode, dl, NVT, Op1);
Op2 = DAG.getNode(PromotionOpcode, dl, NVT, Op2);
SDValue Res = DAG.getNode(N->getOpcode(), dl, NVT, Op0, Op1, Op2);
// Convert back to FP16 as an integer.
return DAG.getNode(GetPromotionOpcode(NVT, OVT), dl, MVT::i16, Res);
}

@EugeneZelenko EugeneZelenko added llvm:SelectionDAG SelectionDAGISel as well and removed llvm:codegen labels Mar 15, 2025
@RalfJung
Copy link
Contributor

To produce correct result single-precision multiplication, then double-precision addition seems to be enough.

Are you sure this will always be enough for all inputs?

@beetrees
Copy link
Contributor

beetrees commented Mar 16, 2025

Yes. Building off this paper, 16-bit floats have an 11-bit significand (including the implicit bit), meaning that the result of the multiplication requires 22 bits to store losslessly (0b111_1111_1111 * 0b111_1111_1111 = 0b11_1111_1111_0000_0000_0001). (EDIT: The following sentence if only true when the multiplication result is less than the number being added: see #128450 (comment) for details) The addition therefore requires 22 + 11 + 1 = 34 bits of precision in order to prevent double rounding problems. 64-bit floats have 53-bit significands, whereas 32-bit floats only have 24-bit significands.

@RalfJung
Copy link
Contributor

Okay so single precision is enough for the multiplication (24 >= 22) and then double-precision is enough for the total operation (53 >= 34)?

@beetrees
Copy link
Contributor

beetrees commented Mar 16, 2025

Yes EDIT: see #128450 (comment)

@SEt-t
Copy link
Author

SEt-t commented Mar 16, 2025

The addition therefore requires 22 + 11 + 1 = 34 bits of precision in order to prevent double rounding problems.

You are wrong about addition bits (see my example of 40 bits), but conclusion that double is enough for it is correct.

@beetrees
Copy link
Contributor

Ah yes, I see where I messed up.

@beetrees
Copy link
Contributor

Corrected explanation for fmaf16(a, b, c) == fma(a, b, c):

As 16-bit floats only have 11-bit significands (64-bit floats have 53-bit significands), the result of a * b (henceforth referred to as x) will always be exactly representable in 11 + 11 = 22 bits. In most cases (since 16-bit floats only have small exponent range), the result of the addition/subtraction of x and c will also be exactly representable; there are two scenarios where this is not the case:

c > x, such that their exponents differ by more than 53 - 22 = 32

As c is only 11-bits in size, my previous comment is still correct for this case as the digits required to separate the two numbers will be in the correct place to prevent double rounding from being an issue when the final result is rounded to a 64-bit float and then to a 16-bit float.

x > c, such that their exponents differ by more than 53 - 11 = 42

As c is a regular 16-bit float it has a minimum exponent of -14, meaning that x must have an exponent of at least -14 + 42 + 1 = 29. The result's exponent will be larger than the maximum 16-bit float exponent of 15, meaning that the result will always be rounded to infinity when converting back to a 16-bit float (for reference, this means that the intermediate float type needs at least a 11 + 14 + 15 = 40 bit significand; this means that results where x > c will either be exactly representable or round to infinity when converted to a 16-bit float).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well miscompilation
Projects
None yet
Development

No branches or pull requests

8 participants