Building ExecuTorch on RPi5 with Clang 14.0.6 fails due to bfloat incompatibility #8924

spalatinate · 2025-03-04T13:56:10Z

🐛 Describe the bug

As discussed with @kirklandsign in Issue #8508, I am opening a separate one here.

I was trying to build ExecuTorch locally on my RPi5. With the Clang compiler (version 14.0.6) this worked fine on the release/0.4 branch. On the release/0.5 and main branches, I now run into the error below. I suspect it is related to Clang, because building ExecuTorch with g++/gcc works just fine.


[ 56%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-prod.dir/src/qd8-f16-qc8w-igemm/gen/qd8-f16-qc8w-igemm-1x8c2s4-minmax-neonfp16arith-mlal.c.o
  [ 56%] Building CXX object kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_addmm.cpp.o
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:80:29: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
  f32_dot_bf16(float32x4_t a, bfloat16x8_t b, bfloat16x8_t c) {
                              ^~~~~~~~~~~~
                              float16x8_t
  /usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
  typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                         ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:80:45: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
  f32_dot_bf16(float32x4_t a, bfloat16x8_t b, bfloat16x8_t c) {
                                              ^~~~~~~~~~~~
                                              float16x8_t
  /usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
  typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                         ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:79:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
  ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE float32x4_t
  ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
    __attribute__((target("arch=armv8.2-a+bf16")))
                          ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:81:10: error: use of undeclared identifier 'vbfdotq_f32'
    return vbfdotq_f32(a, b, c);
           ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:84:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
  ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE void
  ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
    __attribute__((target("arch=armv8.2-a+bf16")))
                          ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:90:9: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
    const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
          ^~~~~~~~~~~~
          float16x8_t
  /usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
  typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                         ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:90:68: error: __bf16 is not supported on this target
    const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                                                     ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:90:34: error: use of undeclared identifier 'vld1q_bf16'
    const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                   ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:92:9: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
    const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
          ^~~~~~~~~~~~
          float16x8_t
  /usr/lib/llvm-14/lib/clang/14.0.6/include/arm_neon.h:75:56: note: 'float16x8_t' declared here
  typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                         ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:92:68: error: __bf16 is not supported on this target
    const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                                                     ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:92:34: error: use of undeclared identifier 'vld1q_bf16'
    const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                   ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:119:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
  ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE void
  ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
    __attribute__((target("arch=armv8.2-a+bf16")))
                          ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:150:3: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
    ET_TARGET_ARM_BF16_ATTRIBUTE ET_INLINE void operator()(const Func& f) const {
    ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
    __attribute__((target("arch=armv8.2-a+bf16")))
                          ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:159:3: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
    ET_TARGET_ARM_BF16_ATTRIBUTE ET_INLINE void operator()(const Func& f) const {
    ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
    __attribute__((target("arch=armv8.2-a+bf16")))
                          ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:167:1: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
  ET_TARGET_ARM_BF16_ATTRIBUTE float
  ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
    __attribute__((target("arch=armv8.2-a+bf16")))
                          ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:176:33: warning: unknown architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
              ET_INLINE_ATTRIBUTE ET_TARGET_ARM_BF16_ATTRIBUTE {
                                  ^
  /home/executorch_05/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
    __attribute__((target("arch=armv8.2-a+bf16")))
                          ^
  7 warnings and 9 errors generated.
  gmake[3]: *** [kernels/optimized/CMakeFiles/cpublas.dir/build.make:121: kernels/optimized/CMakeFiles/cpublas.dir/blas/BlasKernel.cpp.o] Error 1
  gmake[2]: *** [CMakeFiles/Makefile2:1238: kernels/optimized/CMakeFiles/cpublas.dir/all] Error 2
  gmake[2]: *** Waiting for unfinished jobs....
  [ 56%] Building CXX object kernels/portable/CMakeFiles/portable_kernels.dir/cpu/op_alias_copy.cpp.

Versions

Collecting environment information...
PyTorch version: 2.7.0.dev20250131+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 12 (bookworm) (aarch64)
GCC version: (Debian 12.2.0-14) 12.2.0
Clang version: 14.0.6
CMake version: version 3.31.6
Libc version: glibc-2.36

Python version: 3.10.0 (default, Mar 3 2022, 09:51:40) [GCC 10.2.0] (64-bit runtime)
Python platform: Linux-6.6.74+rpt-rpi-v8-aarch64-with-glibc2.36
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Cortex-A76
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 1
Stepping: r4p1
CPU(s) scaling MHz: 100%
CPU max MHz: 2400,0000
CPU min MHz: 1500,0000
BogoMIPS: 108,00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache: 256 KiB (4 instances)
L1i cache: 256 KiB (4 instances)
L2 cache: 2 MiB (4 instances)
L3 cache: 2 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+542480c
[pip3] numpy==2.2.3
[pip3] torch==2.7.0.dev20250131+cpu
[pip3] torchao==0.10.0+git7d879462
[pip3] torchaudio==2.6.0.dev20250131
[pip3] torchgen==0.0.1
[pip3] torchsr==1.0.4
[pip3] torchvision==0.22.0.dev20250131
[conda] executorch 0.6.0a0+542480c pypi_0 pypi
[conda] numpy 2.2.3 pypi_0 pypi
[conda] torch 2.7.0.dev20250131+cpu pypi_0 pypi
[conda] torchao 0.10.0+git7d879462 pypi_0 pypi
[conda] torchaudio 2.6.0.dev20250131 pypi_0 pypi
[conda] torchgen 0.0.1 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.22.0.dev20250131 pypi_0 pypi

cc @larryliu0820 @lucylq


mergennachin · 2025-03-04T14:36:08Z

cc @swolchok @digantdesai - do you know?

swolchok · 2025-03-04T16:50:03Z

There's conditional compilation for this in the PyTorch version of this file. The two copies need to be put back in sync or, ideally, refactored and shared, since we now have support for sharing code with PyTorch core. (And per @malfet, we should try to detect at CMake time whether the compiler actually supports this, rather than hardcoding compiler versions.)
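For readers following along, the conditional compilation mentioned here could look roughly like the sketch below. It is not the PyTorch or ExecuTorch code: the `EXAMPLE_*` macros and function names are hypothetical, and the compiler version cutoffs are assumptions (the log in this issue only establishes that Clang 14.0.6 rejects `target("arch=armv8.2-a+bf16")`). The bf16 intrinsic path is deliberately left out so the sketch builds on any host; only the guard and a portable scalar fallback are shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical guard macro, not taken from ExecuTorch or PyTorch.
// Assumption: the version cutoffs are illustrative. The build log only
// shows that Clang 14.0.6 does not accept the bf16 target string.
#if defined(__aarch64__) && defined(__clang__) && (__clang_major__ >= 16)
#define EXAMPLE_HAS_BF16_TARGET 1
#elif defined(__aarch64__) && !defined(__clang__) && defined(__GNUC__) && \
    (__GNUC__ >= 10)
#define EXAMPLE_HAS_BF16_TARGET 1
#else
#define EXAMPLE_HAS_BF16_TARGET 0
#endif

// The target attribute is only spelled out when the guard is on, so an
// older compiler never sees the string it cannot parse. Code using
// bfloat16x8_t or vbfdotq_f32 would sit behind the same guard.
#if EXAMPLE_HAS_BF16_TARGET
#define EXAMPLE_TARGET_ARM_BF16 __attribute__((target("arch=armv8.2-a+bf16")))
#else
#define EXAMPLE_TARGET_ARM_BF16
#endif

// Widen one bfloat16 value, given as raw bits, to float. A bf16 value is
// the upper 16 bits of an IEEE-754 binary32 value.
static float example_bf16_to_float(std::uint16_t bits) {
  const std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &widened, sizeof(out));
  return out;
}

// Portable dot product over raw bf16 bits, accumulating in float. This is
// the path that remains when the intrinsic path is compiled out.
float example_dot_bf16(const std::uint16_t* a, const std::uint16_t* b,
                       std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    sum += example_bf16_to_float(a[i]) * example_bf16_to_float(b[i]);
  }
  return sum;
}

// Reports which branch the preprocessor selected for this compiler.
bool example_bf16_target_enabled() { return EXAMPLE_HAS_BF16_TARGET != 0; }
```

A version-based guard like this is exactly the hardcoding that the parenthetical above argues against; a configure-time probe that tries to compile a small bf16 snippet would set the same macro more reliably. Also note that the CPU flags in the report do not list bf16, so on this board the scalar path would presumably be the one that runs even with a newer compiler, assuming the dispatch checks CPU features at run time.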

swolchok · 2025-03-04T16:50:25Z

I am busy with other things right now, but I am very likely the person to fix this.

spalatinate · 2025-03-12T10:41:21Z

@swolchok I assume this issue might be related to #8508 ("Build executorch using android ndk with optimized kernels shows unsupported architecture 'armv8.2-a+bf16' and unknown type name 'bfloat16x8_t'"): "For older Arm GCC (<13.1 IIRC) we need to use __fp16 and include <arm_fp16.h>, but for newer Arm GCC, _Float16 is available."
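To make the quoted remark about `__fp16` and `_Float16` concrete, here is a minimal sketch of selecting a half-precision type by compiler. Everything named `example_*` is hypothetical, the GCC 13 cutoff simply follows the "<13.1 IIRC" recollection in the quote, and restricting the native types to `__aarch64__` is an assumption made so that the sketch still compiles on other hosts. Note that this concerns fp16, not bf16, so it may or may not be the same root cause as the Clang failure reported here.

```cpp
#include <cstddef>

// Hypothetical type selection, not taken from ExecuTorch.
// Assumption: the GCC 13 cutoff mirrors the "<13.1 IIRC" remark quoted above.
#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 13)
#include <arm_fp16.h>
using example_half = __fp16;    // older GCC on AArch64
#elif defined(__aarch64__)
using example_half = _Float16;  // newer GCC, or Clang, on AArch64
#else
using example_half = float;     // other hosts: keeps the sketch compilable
#endif

// Sum half-precision values, widening each one to float before adding.
float example_sum_as_float(const example_half* values, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    sum += static_cast<float>(values[i]);
  }
  return sum;
}
```

Code written against the alias rather than a concrete type name does not need to change when the compiler is upgraded.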

spalatinate changed the title ~~Building ExecuTorch on RPi5 with Clang 14.0.6 fails due to bfloat incompatbility~~ Building ExecuTorch on RPi5 with Clang 14.0.6 fails due to bfloat incompatibility Mar 4, 2025

swolchok self-assigned this Mar 4, 2025

iseeyuan added the "module: build/install" and "triaged" labels Mar 4, 2025

