Skip to content

Fix flaky test: wait until indexes are updated #6552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dsabsay
Copy link
Contributor

@dsabsay dsabsay commented Jan 25, 2025

What this PR does:
Fixes flaky index loader test. Indexes are updates after the loadAttempts counter is incremented. The test only waited until loadAttempts changed. This means the test fails if the test goroutine continues before the index update method completes. Fixed by checking the indexes before continuing.

Reproducer: go test -count=100 -run TestLoader_ShouldCacheIndexNotFoundOnBackgroundUpdates ./pkg/storage/tsdb/bucketindex

I validated locally by running the reproducer 4 times (all failed) then this fix 4 times (all passed).

Which issue(s) this PR fixes:
Fixes #6551

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Indexes are updates after the loadAttempts counter is incremented. This
means the test can fail if the test goroutine continues before the index
update method completes.

Signed-off-by: Daniel Sabsay <[email protected]>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@CharlieTLe
Copy link
Member

Hmm, shouldn't the cortex_bucket_index_loaded metric still be 1 in this case since we're trying to test that the ErrIndexNotFound result is cached?

The PR that introduced this test is here: #3717

Keep cached the "bucket index not found" error during background checks

I think the problem is that countLoadedIndexesMetric() is not counting indexes that are not found.

if idx.index != nil {

Perhaps we should check that index is not nil or that the error is ErrIndexNotFound when counting how many indexes are loaded in the in-memory cache.

if idx.index != nil || errors.Is(idx.err, ErrIndexNotFound) {
    count++
}

I think the reason why it's flaky is because the UpdateOnErrorInterval is only 1 second in this test case, so there was a chance it would delete the ErrIndexNotFound index result on the next check interval which would set the cortex_bucket_index_loaded metric to 0.

@dsabsay
Copy link
Contributor Author

dsabsay commented Feb 13, 2025

Counting "not found" indexes as "not loaded" seems to be an intentional choice when this was implemented:

// We expect the bucket index is not considered loaded because of the error.

My main goal with this change is to eliminate the flaky test (to reduce burden on all of us). I'd like to treat that as the priority here. Perhaps we can create an issue to discuss changing the index cache in a separate thread. WDYT? (I also acknowledge I am working on code I have limited knowledge of so want to hear your thoughts)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: TestLoader_ShouldCacheIndexNotFoundOnBackgroundUpdates
3 participants