
[XNNPACK Backend] Modified MobileBERT model outputs original shape on Android despite successful desktop inference #8956


Closed
qxuan512 opened this issue Mar 5, 2025 · 0 comments


qxuan512 commented Mar 5, 2025

Problem Description
After fine-tuning and modifying the structure of the MobileBERT model, I successfully exported it as a .pte file using the XNNPACK backend. When tested with xnn_executor_runner, the model produced the expected output shape [1, 12]. However, when invoked in the Android application, the model still outputs the original MobileBERT format (e.g., [1, 512]). There is an inconsistency between the desktop and mobile environments.
The Android application is a modified version of the ExecutorchDemo example app.


Steps to Reproduce

  1. Model Modification and Export

    • Modified the classification head of MobileBERT (e.g., adjusted the output dimension to 12 classes).
    • Used the aot_compiler toolchain to export the model as mobilebert_xnnpack_fp32.pte.
    • Verified the output format using the following command:
      ./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=mobilebert_xnnpack_fp32.pte
      # Output 0: tensor(sizes=[1, 12])
    [Screenshot: xnn_executor_runner output showing Output 0: tensor(sizes=[1, 12])]
  2. Android Integration

    • Added the .pte file to the assets directory of the Android project.
    • Loaded the model using standard JNI calls:
      // Load the exported model from app storage
      mModule = Module.load(assetFilePath(this, MODEL_FILE));
      // Run inference; note this takes the EValue at index 1 of the results
      outputTensor = mModule.forward(
              new EValue[] {
                  EValue.from(inputs[0]),  // input_ids
                  EValue.from(inputs[1]),  // attention_mask
              })[1].toTensor();
    • During inference, the output shape remains the original [1, 512].
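
One thing worth ruling out with this symptom: the loading step above assumes the freshly exported .pte actually reaches the runtime. In the ExecutorchDemo pattern, assetFilePath-style helpers typically copy the asset into app-private storage only when no copy exists yet, so a previously installed model file can silently shadow the updated one across app updates. This is not confirmed as the cause here; the sketch below (plain java.io, names hypothetical) just illustrates the copy-once behavior:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

public class AssetCopyDemo {
    // Mimics the copy-once pattern used by assetFilePath-style helpers:
    // if a non-empty file with the same name already exists in the
    // destination directory, it is returned as-is and the new bytes
    // are never written.
    static File copyIfAbsent(byte[] assetBytes, File destDir, String name) throws IOException {
        File dest = new File(destDir, name);
        if (dest.exists() && dest.length() > 0) {
            return dest; // stale copy wins
        }
        try (OutputStream os = new FileOutputStream(dest)) {
            os.write(assetBytes);
        }
        return dest;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("pte-demo").toFile();

        // First install: the "old" model lands in app-private storage.
        copyIfAbsent("old-model".getBytes(), dir, "model.pte");

        // App update ships new bytes, but the copy-once helper keeps the old file.
        File f = copyIfAbsent("new-model".getBytes(), dir, "model.pte");

        System.out.println(new String(Files.readAllBytes(f.toPath())));
        // prints "old-model"
    }
}
```

If this is the cause, clearing the app's data (or renaming the .pte asset) forces a fresh copy and the device output should match the desktop runner.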

Expected Behavior
The Android application should produce an output shape of [1, 12], consistent with the desktop xnn_executor_runner results.

Actual Behavior
The Android output retains the original model shape [1, 512], indicating that the modification has not taken effect.
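
A shape guard at the call site would make this mismatch fail loudly on device instead of producing wrong results downstream. A minimal sketch in plain Java (the shapes are the ones from this report; checkShape is a hypothetical helper, fed with the shape of the tensor returned by forward):

```java
import java.util.Arrays;

public class ShapeCheck {
    // Fails fast when the on-device output shape does not match the
    // shape already verified on desktop with xnn_executor_runner.
    static void checkShape(long[] actual, long[] expected) {
        if (!Arrays.equals(actual, expected)) {
            throw new IllegalStateException(
                    "Unexpected output shape " + Arrays.toString(actual)
                    + ", expected " + Arrays.toString(expected)
                    + " -- is the app loading the updated .pte?");
        }
    }

    public static void main(String[] args) {
        checkShape(new long[] {1, 12}, new long[] {1, 12}); // modified head: passes
        try {
            checkShape(new long[] {1, 512}, new long[] {1, 12}); // reported mismatch
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```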
