Build executorch using android ndk with optimized kernels shows unsupported architecture 'armv8.2-a+bf16' and unknown type name 'bfloat16x8_t' #8508
Comments
Hi @elegracer
You probably need a newer NDK version. Older versions ship an older Clang that doesn't support bf16.
Please feel free to re-open if you still have the issue with NDK 27 or 28.
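The NDK version can be checked before configuring a build. Below is a minimal sketch, assuming the NDK records its version in the Pkg.Revision field of the source.properties file at the NDK root. The helper names are hypothetical, and the threshold of 27 is an assumption taken only from the versions recommended in this thread, not a documented minimum.

```python
from pathlib import Path

# Assumption: 27 is used as the threshold only because NDK 27/28 are the
# versions recommended in this thread; earlier releases may also work.
ASSUMED_MIN_NDK_MAJOR = 27


def parse_ndk_revision(properties_text: str) -> tuple[int, ...]:
    """Extract Pkg.Revision from the text of an NDK source.properties file."""
    for line in properties_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "Pkg.Revision":
            # Drop suffixes such as "-beta1" before splitting into integers.
            numeric = value.strip().split("-")[0]
            return tuple(int(part) for part in numeric.split("."))
    raise ValueError("Pkg.Revision not found")


def ndk_is_recent_enough(ndk_root: str,
                         min_major: int = ASSUMED_MIN_NDK_MAJOR) -> bool:
    """Return True if the NDK under ndk_root has at least the given major version."""
    text = (Path(ndk_root) / "source.properties").read_text()
    return parse_ndk_revision(text)[0] >= min_major
```

Calling ndk_is_recent_enough with the path stored in the ANDROID_NDK environment variable, for example, would flag an r21-series install as too old under this assumed threshold.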
@kirklandsign Actually, I am running into a similar issue. I have been working on executorch for a while. When I tried to reinstall executorch via
Hi @spalatinate Please
Also, please use a newer Android NDK version. Please re-open if it's still not working.
With NDK 27c, the compilation succeeded!
@kirklandsign Sorry, I didn't describe my problem precisely. I was trying to build executorch locally on my RPi5. It worked fine with the Clang compiler (version 14.0.6) on the release/0.4 branch. Now, on the release/0.5 and main branches, I run into the error above. I suspect it is related to the Clang compiler, because building executorch works fine when I switch to g++/gcc. Just to be clear: I am not building anything on or for Android. :) PS: I do not have permission to reopen this issue. Shall I open a new one? It could be relevant to others as well.
@spalatinate Please feel free to open a new one, and include your environment. Thank you! Honestly, I haven't tried building it on an RPi. For Arm, I have only tried macOS.
🐛 Describe the bug
For the core problem, please skip the background and trial sections and go directly to the Main issue section.
A little background
I originally intended to test inference performance with executorch on my Android device. For my custom ResNet-based model, the average on-device inference time with TorchScript C++ on CPU is 10 ms. I heard that TorchScript is no longer under active development and that executorch is its replacement, so I built executorch with Android NDK r21b and tested my model.
First trial: build executorch and test inference time
I followed this page to successfully build and run executorch in python mode (build from source). https://pytorch.org/executorch/stable/getting-started-setup.html
Then I followed this page to convert my nn model to a .pte file. https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html
The export steps are as follows, with only the model variable and example_args adapted to my model.
Then I followed this page to build examples/portable/executor_runner/executor_runner.cpp with Android NDK r21b, using the following cmake commands. https://pytorch.org/executorch/stable/demo-apps-android.html
I modified executor_runner.cpp slightly to measure the time taken by the line Error status = method->execute(); using the std::chrono utilities.
Then I pushed the executable and the .pte model to my Android device and ran it to test inference time. It showed an average inference time of 80 ms.
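The measurement described here (time one call, repeat, average) can be sketched as follows. This is a Python analogue for illustration only, not the modified executor_runner.cpp from this report: average_latency_ms and run_once are hypothetical names, run_once stands in for whatever performs a single inference, and the warm-up count is an assumption added so that one-time setup costs do not skew the average.

```python
import time
from statistics import mean


def average_latency_ms(run_once, iterations=100, warmup=10):
    """Return the mean wall-clock time of run_once() in milliseconds."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")

    # Untimed warm-up calls, so caches and lazy initialization settle first.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)
```

Comparing the average with and without warm-up iterations is a quick way to see whether a figure such as 80 ms is dominated by first-run overhead.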
Second trial: convert model with xnnpack and build executorch c++ executable with xnnpack backend
For the model conversion part, I just uncommented the delegation line above.
For the C++ executor part, I changed the configuration to the following (i.e., added -DEXECUTORCH_BUILD_XNNPACK=ON), then ran xnn_executor_runner on my Android device.
But the inference time was still 80 ms.
Main issue
So I tried building the optimized kernels by adding -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON.
By the way, building XNNPACK itself is OK, so I didn't run into this issue: #6924
Adding -DXNNPACK_ENABLE_ARM_BF16=OFF doesn't change the result.
It's the optimized kernel part that gives the compile error.
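Both diagnostics in the issue title (unsupported architecture 'armv8.2-a+bf16' and unknown type name 'bfloat16x8_t') come from the compiler itself, so the compiler can be tested in isolation, outside the executorch build. Below is a minimal sketch, assuming that asking the compiler to syntax-check a one-line source file with -march=armv8.2-a+bf16 reproduces the same failure. The function names are hypothetical, and the --target option is only understood by Clang.

```python
import subprocess

# Tiny translation unit that only compiles if the NEON bf16 vector type exists.
PROBE_SOURCE = "#include <arm_neon.h>\nbfloat16x8_t probe;\n"


def build_probe_command(compiler, target=None):
    """Build the command that syntax-checks PROBE_SOURCE read from stdin."""
    cmd = [compiler]
    if target:
        # Clang-only option, needed when the compiler is not already
        # configured for an AArch64 target.
        cmd.append(f"--target={target}")
    cmd += ["-march=armv8.2-a+bf16", "-fsyntax-only", "-x", "c", "-"]
    return cmd


def compiler_supports_bf16(compiler, target=None):
    """Return True if the compiler accepts the bf16 architecture and type."""
    try:
        result = subprocess.run(
            build_probe_command(compiler, target),
            input=PROBE_SOURCE,
            text=True,
            capture_output=True,
            check=False,
        )
    except OSError:
        # Compiler binary missing or not executable.
        return False
    return result.returncode == 0
```

Pointing compiler at the clang binary inside the NDK's toolchains/llvm/prebuilt directory, with an AArch64 Android target triple, would show whether a given NDK can compile bf16 code at all.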
Cmake configuration summary (full txt file is uploaded below):
Main compiling error (full building output is uploaded below):
PS
Thank you for developing executorch, which I think is just brilliant! I'm new to this and have tried many ways to fix this error, but there is nothing else I can think of to try. Please help me T_T
Versions
Collecting environment information...
PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: 21.0.0 (++20250203053216+6dfe20dbbd65-1~exp1~20250203173345.1303)
CMake version: version 3.31.4
Libc version: glibc-2.31
Python version: 3.10.0 (default, Feb 14 2025, 22:44:04) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 572.42
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 96
Model name: AMD Ryzen 5 PRO 4650G with Radeon Graphics
Stepping: 1
CPU MHz: 3692.984
BogoMIPS: 7385.96
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB
L1i cache: 192 KiB
L2 cache: 3 MiB
L3 cache: 4 MiB
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip rdpid
Versions of relevant libraries:
[pip3] executorch==0.5.0a0+1bc0699
[pip3] numpy==2.0.0
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] torch==2.6.0+cu126
[pip3] torchao==0.8.0+gitebc43034
[pip3] torchaudio==2.6.0+cu126
[pip3] torchsr==1.0.4
[pip3] torchvision==0.21.0+cu126
[pip3] triton==3.2.0
[conda] Could not collect
cc @larryliu0820 @lucylq