[CI] FileSettingsRoleMappingUpgradeIT testRoleMappingsAppliedOnUpgrade {upgradedNodes=3} failing #118311

Closed
elasticsearchmachine opened this issue Dec 10, 2024 · 5 comments
Labels
needs:risk Requires assignment of a risk label (low, medium, blocker) :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) Team:Security Meta label for security team >test-failure Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Dec 10, 2024

Build Scans:

Reproduction Line:

./gradlew ":qa:rolling-upgrade:v8.5.3#bwcTest" -Dtests.class="org.elasticsearch.upgrades.FileSettingsRoleMappingUpgradeIT" -Dtests.method="testRoleMappingsAppliedOnUpgrade {upgradedNodes=3}" -Dtests.seed=839510D5116047CB -Dtests.bwc=true -Dtests.locale=pa-Arab-PK -Dtests.timezone=Africa/Asmera -Druntime.java=23

Applicable branches:
8.x

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.RuntimeException: An error occurred while checking cluster 'test-cluster' status.

Issue Reasons:

  • [8.x] 6 consecutive failures in step 8.5.3_bwc
  • [8.x] 6 failures in test testRoleMappingsAppliedOnUpgrade {upgradedNodes=3} (4.3% fail rate in 140 executions)
  • [8.x] 6 failures in step 8.5.3_bwc (100.0% fail rate in 6 executions)
  • [8.x] 4 failures in pipeline elasticsearch-periodic (40.0% fail rate in 10 executions)
  • [8.x] 2 failures in pipeline elasticsearch-pull-request (25.0% fail rate in 8 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine elasticsearchmachine added :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) >test-failure Triaged test failures from CI Team:Security Meta label for security team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Dec 10, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-security (Team:Security)

@jfreden
Contributor

jfreden commented Dec 11, 2024

I see this in the logs:

[2024-12-10T23:52:01,874][ERROR][o.e.r.s.FileSettingsService] [test-cluster-0] Error processing operator settings json file java.lang.IllegalStateException: Error processing state change request for file_settings, errors: Security index is not on the current version - the native realm will not be operational until the upgrade API is run on the security index
	at [email protected]/org.elasticsearch.reservedstate.service.ReservedClusterStateService.checkAndReportError(ReservedClusterStateService.java:248)
	at [email protected]/org.elasticsearch.reservedstate.service.ReservedClusterStateService$1.onFailure(ReservedClusterStateService.java:223)

This happens because of a race condition in older versions of operator-defined role mappings, where the FileSettingsService tries to add role mappings to the security index before it has been initialized.

The test checks oldClusterHasFeature("gte_v8.7.0") so it shouldn't even run for 8.5.3. I wonder if it happens because the cluster is started even though the test shouldn't run (static class variable).

@jfreden
Contributor

jfreden commented Dec 11, 2024

Same as: #110884

@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.x

Mute Reasons:

  • [8.x] 6 consecutive failures in step 8.5.3_bwc
  • [8.x] 6 failures in test testRoleMappingsAppliedOnUpgrade {upgradedNodes=3} (4.3% fail rate in 140 executions)
  • [8.x] 6 failures in step 8.5.3_bwc (100.0% fail rate in 6 executions)
  • [8.x] 4 failures in pipeline elasticsearch-periodic (40.0% fail rate in 10 executions)
  • [8.x] 2 failures in pipeline elasticsearch-pull-request (25.0% fail rate in 8 executions)

Build Scans:

elasticsearchmachine added a commit that referenced this issue Dec 11, 2024
elasticsearchmachine pushed a commit that referenced this issue Dec 12, 2024
Fixes: #118311
#118310
#118309

Same issue that was fixed in:
#110963

`@BeforeClass` is executed after the test rules. This means it creates
the clusters for all the invalid versions, which sometimes doesn't work.

Change it to a rule which definitely evaluates before the clusters are
created. This will also speed up this test in CI.
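
The ordering described in the commit message above can be illustrated with a small plain-Java model. This is only a sketch: it does not use JUnit or the Elasticsearch test framework, and every name in it (`RuleOrderSketch`, `Statement`, `Rule`, `clusterRule`, `versionGuard`) is hypothetical. It assumes that class-level rules wrap the `@BeforeClass` methods, so a cluster rule starts its cluster before a version check placed in `@BeforeClass` can skip the test, whereas a guard placed outside the cluster rule skips before anything is started.

```java
import java.util.ArrayList;
import java.util.List;

class RuleOrderSketch {
    /** Hypothetical stand-in for a unit of test work. */
    interface Statement { void evaluate(); }

    /** Hypothetical stand-in for a rule: wraps a statement in extra behaviour. */
    interface Rule { Statement apply(Statement base); }

    /** Signals that the test was skipped (models a failed assumption). */
    static class SkipException extends RuntimeException {}

    /** Models a cluster rule: start the cluster, run the inner statement, stop the cluster. */
    static Rule clusterRule(List<String> events) {
        return base -> () -> {
            events.add("start cluster");
            try {
                base.evaluate();
            } finally {
                events.add("stop cluster");
            }
        };
    }

    /** Models the version check: skip when the old cluster version is unsupported. */
    static Rule versionGuard(boolean supported, List<String> events) {
        return base -> () -> {
            if (!supported) {
                events.add("skip");
                throw new SkipException();
            }
            base.evaluate();
        };
    }

    static Statement tests(List<String> events) {
        return () -> events.add("run tests");
    }

    /** Check inside the cluster rule (the @BeforeClass position): cluster starts first. */
    static List<String> guardInBeforeClass(boolean supported) {
        List<String> events = new ArrayList<>();
        Statement inner = versionGuard(supported, events).apply(tests(events));
        return run(clusterRule(events).apply(inner), events);
    }

    /** Check outside the cluster rule: the skip happens before any cluster is created. */
    static List<String> guardAsOuterRule(boolean supported) {
        List<String> events = new ArrayList<>();
        Statement inner = clusterRule(events).apply(tests(events));
        return run(versionGuard(supported, events).apply(inner), events);
    }

    static List<String> run(Statement statement, List<String> events) {
        try {
            statement.evaluate();
        } catch (SkipException e) {
            // A skipped test is not a failure; the recorded events show what ran.
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(guardInBeforeClass(false)); // [start cluster, skip, stop cluster]
        System.out.println(guardAsOuterRule(false));   // [skip]
    }
}
```

In this model the unsupported-version case still pays for cluster start-up when the check sits in the `@BeforeClass` position, which is where a flaky start on an old version could fail the build. With the guard as the outer rule, no cluster is created at all.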
@jfreden
Contributor

jfreden commented Dec 12, 2024

Fixed in: 68d38a6

@jfreden jfreden closed this as completed Dec 12, 2024
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this issue Dec 16, 2024
…18455)

Fixes: elastic#118311
elastic#118310
elastic#118309

Same issue that was fixed in:
elastic#110963

`@BeforeClass` is executed after the test rules. This means it creates
the clusters for all the invalid versions, which sometimes doesn't work.

Change it to a rule which definitely evaluates before the clusters are
created. This will also speed up this test in CI.