Build executorch using android ndk with optimized kernels shows unsupported architecture 'armv8.2-a+bf16' and unknown type name 'bfloat16x8_t' #8508

elegracer · 2025-02-15T05:09:22Z

🐛 Describe the bug

For the core problem, please skip the background and trial sections and go directly to the Main issue section.

A little background

I originally intended to test inference performance with executorch on my Android device. For my custom ResNet-based model, TorchScript (C++, CPU) averages 10 ms per inference on-device. I heard TorchScript is no longer under active development and executorch is its replacement, so I built executorch with Android NDK r21b and tested my model.

First trial: build executorch and test inference time

I followed this page to build executorch from source and successfully ran it from Python. https://pytorch.org/executorch/stable/getting-started-setup.html

Then I followed this page to convert my nn model to a .pte file. https://pytorch.org/executorch/stable/tutorials/export-to-executorch-tutorial.html

The export steps are as follows; in my real script, only the model and example_args are changed to fit my model.

import torch
from torch.export import export, export_for_inference, ExportedProgram
import executorch.exir as exir
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.param = torch.nn.Parameter(torch.rand(3, 4))
        self.linear = torch.nn.Linear(4, 5)

    def forward(self, x):
        return self.linear(x + self.param).clamp(min=0.0, max=1.0)


example_args = (torch.randn(3, 4),)

pre_autograd_aten_dialect = export_for_inference(M(), example_args).module()

aten_dialect: ExportedProgram = export(pre_autograd_aten_dialect, example_args)

edge_program: exir.EdgeProgramManager = exir.to_edge(aten_dialect)

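# Optionally do delegation. to_backend() returns a new EdgeProgramManager, so
# reassign it; otherwise to_executorch() below still exports the undelegated program.
# edge_program = edge_program.to_backend(XnnpackPartitioner())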

executorch_program: exir.ExecutorchProgramManager = edge_program.to_executorch()

with open("model.pte", "wb") as file:
    file.write(executorch_program.buffer)

Then I followed this page to build examples/portable/executor_runner/executor_runner.cpp with Android NDK r21b, using the CMake commands below. https://pytorch.org/executorch/stable/demo-apps-android.html

I modified executor_runner.cpp slightly to time the statement "Error status = method->execute();" with std::chrono.

cmake -S . -B build_android -DCMAKE_INSTALL_PREFIX=build_android -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake -DANDROID_ABI="arm64-v8a" -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON -DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
cmake --build build_android -j`nproc` --target install

I then pushed the executable and the .pte model to my Android device and ran it: the average inference time was 80 ms.
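A minimal sketch of this kind of measurement, assuming a few warm-up calls followed by an average over several timed runs. The helper name average_ms and the run counts are hypothetical; this is not the exact code used in the modified executor_runner.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of ExecuTorch): calls `fn` `warmup_runs` times
// without timing, then returns the mean wall-clock time per call, in
// milliseconds, over `timed_runs` further calls.
double average_ms(const std::function<void()>& fn, int warmup_runs,
                  int timed_runs) {
  assert(timed_runs > 0);
  for (int i = 0; i < warmup_runs; ++i) {
    fn();  // keep one-time costs out of the average
  }
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < timed_runs; ++i) {
    fn();
  }
  const auto end = std::chrono::steady_clock::now();
  // steady_clock is monotonic, so the difference is never negative.
  const std::chrono::duration<double, std::milli> total = end - start;
  return total.count() / timed_runs;
}
```

In executor_runner.cpp the callable would wrap method->execute() (and should still check the returned Error). Warm-up runs matter because the first calls often include one-time costs, such as first-touch memory allocation, that would otherwise inflate the average.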

Second trial: convert model with xnnpack and build executorch c++ executable with xnnpack backend

For the model conversion part, I just uncommented the delegation line above.

For the C++ executor part, I changed the configuration to the following (i.e., added -DEXECUTORCH_BUILD_XNNPACK=ON), then ran xnn_executor_runner on my Android device.

cmake -S . -B build_android -DCMAKE_INSTALL_PREFIX=build_android -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake -DANDROID_ABI="arm64-v8a" -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON -DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON -DEXECUTORCH_BUILD_XNNPACK=ON

But the inference time was still 80 ms.

Main issue

So I tried building the optimized kernels by adding -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON:

cmake -S . -B build_android -DCMAKE_INSTALL_PREFIX=build_android -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake -DANDROID_ABI="arm64-v8a" -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON -DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON

By the way, building XNNPACK itself works, so I did not run into issue #6924.
Adding -DXNNPACK_ENABLE_ARM_BF16=OFF doesn't change the result either.

It's the optimized kernels that fail to compile.

CMake configuration summary (the full text file is attached below):

-- ******** Summary ********
--   CMAKE_BUILD_TYPE              : Release
--   CMAKE_CXX_STANDARD            : 17
--   CMAKE_CXX_COMPILER_ID         : Clang
--   CMAKE_TOOLCHAIN_FILE          : [ANDROID_NDK_ROOT]/android-ndk-r21b/build/cmake/android.toolchain.cmake
--   BUCK2                         : [HOME]/codes/executorch/buck2-bin/buck2-3bbde7daa94987db468d021ad625bc93dc62ba7fcb16945cb09b64aab077f284
--   PYTHON_EXECUTABLE             : python3
--   FLATC_EXECUTABLE              : flatc
--   EXECUTORCH_ENABLE_LOGGING              : OFF
--   EXECUTORCH_ENABLE_PROGRAM_VERIFICATION : OFF
--   EXECUTORCH_LOG_LEVEL                   : Info
--   EXECUTORCH_BUILD_ANDROID_JNI           : OFF
--   EXECUTORCH_BUILD_ARM_BAREMETAL         : OFF
--   EXECUTORCH_BUILD_COREML                : OFF
--   EXECUTORCH_BUILD_KERNELS_CUSTOM        : OFF
--   EXECUTORCH_BUILD_EXECUTOR_RUNNER       : ON
--   EXECUTORCH_BUILD_EXTENSION_DATA_LOADER : ON
--   EXECUTORCH_BUILD_EXTENSION_MODULE      : ON
--   EXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL : ON
--   EXECUTORCH_BUILD_EXTENSION_TENSOR      : ON
--   EXECUTORCH_BUILD_EXTENSION_TRAINING      : OFF
--   EXECUTORCH_BUILD_FLATC                 : ON
--   EXECUTORCH_BUILD_GFLAGS                : ON
--   EXECUTORCH_BUILD_GTESTS                :
--   EXECUTORCH_BUILD_HOST_TARGETS          : ON
--   EXECUTORCH_BUILD_MPS                   : OFF
--   EXECUTORCH_BUILD_PYBIND                : OFF
--   EXECUTORCH_BUILD_QNN                   : OFF
--   EXECUTORCH_BUILD_KERNELS_OPTIMIZED     : ON
--   EXECUTORCH_BUILD_KERNELS_QUANTIZED     : OFF
--   EXECUTORCH_BUILD_DEVTOOLS              : OFF
--   EXECUTORCH_BUILD_SIZE_TEST             : OFF
--   EXECUTORCH_BUILD_XNNPACK               : ON
--   EXECUTORCH_BUILD_VULKAN                : OFF
--   EXECUTORCH_BUILD_PTHREADPOOL           : ON
--   EXECUTORCH_BUILD_CPUINFO               : ON
-- Configuring done (7.0s)
-- Generating done (0.7s)
-- Build files have been written to: [HOME]/codes/executorch/build_android

Main compile error (the full build output is attached below):

[ 79%] Building CXX object kernels/optimized/CMakeFiles/cpublas.dir/blas/BlasKernel.cpp.o
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:80:29: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
f32_dot_bf16(float32x4_t a, bfloat16x8_t b, bfloat16x8_t c) {
                            ^~~~~~~~~~~~
                            float16x8_t
[ANDROID_NDK_ROOT]/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:65:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                       ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:80:45: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
f32_dot_bf16(float32x4_t a, bfloat16x8_t b, bfloat16x8_t c) {
                                            ^~~~~~~~~~~~
                                            float16x8_t
[ANDROID_NDK_ROOT]/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:65:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                       ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:79:1: warning: unsupported architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE float32x4_t
^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
  __attribute__((target("arch=armv8.2-a+bf16")))
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:81:10: error: use of undeclared identifier 'vbfdotq_f32'
  return vbfdotq_f32(a, b, c);
         ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:84:1: warning: unsupported architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE void
^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
  __attribute__((target("arch=armv8.2-a+bf16")))
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:90:9: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
  const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
        ^~~~~~~~~~~~
        float16x8_t
[ANDROID_NDK_ROOT]/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:65:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                       ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:90:68: error: unknown type name '__bf16'; did you mean '__be16'?
  const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                                                   ^~~~~~
                                                                   __be16
[ANDROID_NDK_ROOT]/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/linux/types.h:27:25: note: '__be16' declared here
typedef __u16 __bitwise __be16;
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:90:34: error: use of undeclared identifier 'vld1q_bf16'
  const bfloat16x8_t temp_vec1 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                 ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:92:9: error: unknown type name 'bfloat16x8_t'; did you mean 'float16x8_t'?
  const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
        ^~~~~~~~~~~~
        float16x8_t
[ANDROID_NDK_ROOT]/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/9.0.8/include/arm_neon.h:65:56: note: 'float16x8_t' declared here
typedef __attribute__((neon_vector_type(8))) float16_t float16x8_t;
                                                       ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:92:68: error: unknown type name '__bf16'; did you mean '__be16'?
  const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                                                   ^~~~~~
                                                                   __be16
[ANDROID_NDK_ROOT]/android-ndk-r21b/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/linux/types.h:27:25: note: '__be16' declared here
typedef __u16 __bitwise __be16;
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:92:34: error: use of undeclared identifier 'vld1q_bf16'
  const bfloat16x8_t temp_vec2 = vld1q_bf16(reinterpret_cast<const __bf16*>(
                                 ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:119:1: warning: unsupported architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE static ET_INLINE void
^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
  __attribute__((target("arch=armv8.2-a+bf16")))
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:150:3: warning: unsupported architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
  ET_TARGET_ARM_BF16_ATTRIBUTE ET_INLINE void operator()(const Func& f) const {
  ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
  __attribute__((target("arch=armv8.2-a+bf16")))
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:159:3: warning: unsupported architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
  ET_TARGET_ARM_BF16_ATTRIBUTE ET_INLINE void operator()(const Func& f) const {
  ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
  __attribute__((target("arch=armv8.2-a+bf16")))
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:167:1: warning: unsupported architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
ET_TARGET_ARM_BF16_ATTRIBUTE float
^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
  __attribute__((target("arch=armv8.2-a+bf16")))
                        ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:176:33: warning: unsupported architecture 'armv8.2-a+bf16' in the 'target' attribute string; 'target' attribute ignored [-Wignored-attributes]
            ET_INLINE_ATTRIBUTE ET_TARGET_ARM_BF16_ATTRIBUTE {
                                ^
[HOME]/codes/executorch/kernels/optimized/blas/BlasKernel.cpp:78:25: note: expanded from macro 'ET_TARGET_ARM_BF16_ATTRIBUTE'
  __attribute__((target("arch=armv8.2-a+bf16")))
                        ^
7 warnings and 9 errors generated.
make[2]: *** [kernels/optimized/CMakeFiles/cpublas.dir/build.make:121: kernels/optimized/CMakeFiles/cpublas.dir/blas/BlasKernel.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....

PS

Thank you for developing executorch, which I think is just brilliant! I'm new to this and have tried many ways to fix this error, but I've run out of ideas. Please help me T_T!

Versions

Collecting environment information...
PyTorch version: 2.6.0+cu126
Is debug build: False
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu120.04.2) 9.4.0
Clang version: 21.0.0 (++20250203053216+6dfe20dbbd65-1exp1~20250203173345.1303)
CMake version: version 3.31.4
Libc version: glibc-2.31

Python version: 3.10.0 (default, Feb 14 2025, 22:44:04) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 572.42
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 96
Model name: AMD Ryzen 5 PRO 4650G with Radeon Graphics
Stepping: 1
CPU MHz: 3692.984
BogoMIPS: 7385.96
Virtualization: AMD-V
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 192 KiB
L1i cache: 192 KiB
L2 cache: 3 MiB
L3 cache: 4 MiB
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip rdpid

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+1bc0699
[pip3] numpy==2.0.0
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] torch==2.6.0+cu126
[pip3] torchao==0.8.0+gitebc43034
[pip3] torchaudio==2.6.0+cu126
[pip3] torchsr==1.0.4
[pip3] torchvision==0.21.0+cu126
[pip3] triton==3.2.0
[conda] Could not collect

cc @larryliu0820 @lucylq


elegracer · 2025-02-15T05:09:58Z

cmake_configuration_output.txt

cmake_building_output.txt

kirklandsign · 2025-02-27T19:09:12Z

Hi @elegracer

So I built executorch with android ndk r21b and test my model.

You probably need the latest NDK version. Older NDK releases ship an older Clang that lacks bf16 support (r21b bundles Clang 9.0.8, as the include paths in the log show).
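To check a toolchain before running a full build, a small probe file can mirror the f32_dot_bf16 pattern from BlasKernel.cpp shown in the log: the same target("arch=armv8.2-a+bf16") attribute, the bfloat16x8_t type, and the vbfdotq_f32 intrinsic. This is a sketch under that assumption. The names probe_bfdot, bf16_bits_to_float and dot_bf16_scalar are hypothetical and not ExecuTorch code. The scalar helpers compute the same overall dot product in portable C++, relying on the fact that a bfloat16 value is the upper 16 bits of an IEEE float32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__aarch64__)
#include <arm_neon.h>
// Same constructs as BlasKernel.cpp: a compiler without BF16 support should
// reject this the same way (unknown bfloat16x8_t, undeclared vbfdotq_f32).
__attribute__((target("arch=armv8.2-a+bf16"))) float32x4_t probe_bfdot(
    float32x4_t acc, bfloat16x8_t a, bfloat16x8_t b) {
  return vbfdotq_f32(acc, a, b);
}
#endif

// Hypothetical portable helper: widen raw bfloat16 bits to a float32 by
// placing them in the high half of the 32-bit pattern.
inline float bf16_bits_to_float(std::uint16_t bits) {
  const std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &widened, sizeof(out));
  return out;
}

// Hypothetical scalar reference: dot product of two bf16 arrays, accumulated
// in float32. Needs no special compiler support.
inline float dot_bf16_scalar(const std::uint16_t* a, const std::uint16_t* b,
                             std::size_t n) {
  assert(n == 0 || (a != nullptr && b != nullptr));
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    acc += bf16_bits_to_float(a[i]) * bf16_bits_to_float(b[i]);
  }
  return acc;
}
```

The probe could be compiled on its own, for example with $ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=aarch64-linux-android21 -std=c++17 -c bf16_probe.cpp (the file name is hypothetical). With r21b it is expected to reproduce the errors above; later in this thread, NDK 27c is reported to build successfully.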

kirklandsign · 2025-02-27T19:10:12Z

Please feel free to re-open if you still have issues with NDK 27 or 28.

spalatinate · 2025-02-28T15:32:10Z

@kirklandsign Actually, I am running into a similar issue. I have been working with executorch for a while. When I tried to reinstall executorch via install_requirements.sh --pybind xnnpack, I ran into the same error as above. This has not happened before. Thank you!

kirklandsign · 2025-02-28T18:11:08Z

Hi @spalatinate

Please

rm -rf pip-out cmake-out*
./install_requirements.sh
./install_executorch.sh --pybind xnnpack

Also please use a new Android NDK version.

Please re-open if it's still not working.

elegracer · 2025-03-01T09:00:07Z

With NDK 27c, the compilation succeeded!

spalatinate · 2025-03-03T15:16:07Z

@kirklandsign Sorry, I didn't describe my problem precisely... I was trying to build executorch locally on my RPi5. It worked fine with Clang 14.0.6 on the release/0.4 branch. Now, with the release/0.5 and main branches, I am running into the error above. I suspect it is related to Clang, because when I switch to g++/gcc, executorch builds just fine.

Just to be sure: I do not build anything on or for Android... :)

PS I do not have permission to reopen this issue. Shall I open a new one? It could be relevant to others as well.

kirklandsign · 2025-03-03T19:56:04Z

@spalatinate Please feel free to open a new one. Thank you!

And please include your environment. Honestly, I haven't tried building it on an RPi; for Arm I have only tried macOS.

jackzhxng assigned kirklandsign Feb 18, 2025

jackzhxng added the labels module: build/install (issues related to the cmake and buck2 builds, and to installing ExecuTorch) and triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Feb 18, 2025

kirklandsign closed this as completed Feb 27, 2025

spalatinate mentioned this issue Mar 4, 2025

Building ExecuTorch on RPi5 with Clang 14.0.6 fails due to bfloat incompatibility #8924

Open
