
[Query-Frontend] Add dynamic query vertical sharding #6678


Open
wants to merge 8 commits into master

Conversation

afhassan
Contributor

@afhassan commented on Mar 28, 2025

What this PR does:

  • Refactor dynamic query splitting to reduce complexity
    • The main source of complexity in the previous implementation was that the duration of the first split can differ from that of all subsequent splits, which had to be accounted for when calculating the total duration of data the query will fetch from storage.
    • The solution is to ignore the first interval when calculating the duration fetched. The duration fetched therefore becomes an estimate, but the error introduced by ignoring the first interval is small compared to the complexity it removes.
  • Add dynamic query vertical sharding
    • The goal of dynamic query splitting is to protect store-gateways by limiting the number of read requests sent per query, while still providing enough parallelism for the query to execute efficiently.
    • In some scenarios, a smaller vertical shard size allows more horizontal splitting while keeping store-gateways protected, which leads to more efficient query execution.
    • The new change calculates the combination of vertical sharding and horizontal splitting that results in the largest number of total shards, without exceeding any of the configured limits.

Example: the query sum(rate(metric[1h])) by (pod) over a 100-day range with the following configuration:

  • max_shards_per_query: 100
  • max_fetched_data_duration_per_query: 6000h // 250 days
  • query_vertical_shard_size: 3

Option one: 1 vertical shard (no vertical sharding)

  • A 1-day split interval results in 100 splits (100 shards) and 200 days of total duration fetched

Option two: 2 vertical shards

  • A 5-day split interval results in 20 splits (40 shards) and 240 days of total duration fetched

Option three: 3 vertical shards

  • A 100-day split interval results in no horizontal splitting (3 shards) and 303 days of total duration fetched

Previously, we would always choose option three. After this change, we would choose option one in this scenario, skipping vertical sharding entirely to allow 100 horizontal splits.
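For illustration, here is a minimal Go sketch (not the PR's code) of the arithmetic behind these three options, assuming each split fetches one extra 1-day base interval for the rate(metric[1h]) selector:

package main

import "fmt"

func main() {
    const queryRangeDays = 100
    const selectorDaysPerSplit = 1 // extra base interval fetched per split for rate(metric[1h])

    // (vertical shards, split interval in days) for the three options above.
    options := [][2]int{{1, 1}, {2, 5}, {3, 100}}

    for _, opt := range options {
        verticalShards, intervalDays := opt[0], opt[1]
        splits := (queryRangeDays + intervalDays - 1) / intervalDays // ceiling division
        totalShards := splits * verticalShards
        durationFetchedDays := verticalShards * (queryRangeDays + splits*selectorDaysPerSplit)
        fmt.Printf("%d vertical shard(s), %d-day interval: %d splits, %d total shards, %d days fetched\n",
            verticalShards, intervalDays, splits, totalShards, durationFetchedDays)
    }
    // Prints 100, 40, and 3 total shards (200, 240, and 303 days fetched);
    // the new logic picks the option with the most total shards within the limits.
}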

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@afhassan changed the title from "[Query-Frontend] Refactor dynamic query splitting to reduce complexity" to "[Query-Frontend] Add dynamic query vertical sharding" on Apr 3, 2025
@afhassan marked this pull request as ready for review on April 10, 2025 01:53
expectedDurationFetchedBySelectors: 4 * day,
expectedDurationFetchedByLookbackDelta: 0 * day,
expectedDurationFetchedByRange: 40 * day,
expectedDurationFetchedBySelectors: 4 * day,
},
Contributor


Should we add a test about the edge case you described in the PR description?

Contributor Author


I added separate test cases for dynamic splitting + vertical sharding scenarios. I also added an example in the PR description to explain the change better.

Signed-off-by: Ahmed Hassan <[email protected]>
Comment on lines +184 to +205
for candidateVerticalShardSize := 1; candidateVerticalShardSize <= maxVerticalShardSize; candidateVerticalShardSize++ {
    maxSplitsFromMaxShards := getMaxSplitsFromMaxQueryShards(dynamicSplitCfg.MaxShardsPerQuery, candidateVerticalShardSize)
    maxSplitsFromDurationFetched := getMaxSplitsFromDurationFetched(dynamicSplitCfg.MaxFetchedDataDurationPerQuery, candidateVerticalShardSize, queryExpr, r.GetStart(), r.GetEnd(), r.GetStep(), baseInterval, lookbackDelta)

    // Use the more restrictive max splits limit
    var maxSplits int
    switch {
    case dynamicSplitCfg.MaxShardsPerQuery > 0 && dynamicSplitCfg.MaxFetchedDataDurationPerQuery > 0:
        maxSplits = min(maxSplitsFromMaxShards, maxSplitsFromDurationFetched)
    case dynamicSplitCfg.MaxShardsPerQuery > 0:
        maxSplits = maxSplitsFromMaxShards
    case dynamicSplitCfg.MaxFetchedDataDurationPerQuery > 0:
        maxSplits = maxSplitsFromDurationFetched
    }

    candidateInterval := getIntervalFromMaxSplits(r, baseInterval, maxSplits)
    if candidateTotalShards := getExpectedTotalShards(r.GetStart(), r.GetEnd(), candidateInterval, candidateVerticalShardSize); candidateTotalShards > totalShards {
        interval = candidateInterval
        verticalShardSize = candidateVerticalShardSize
        totalShards = candidateTotalShards
    }
}
Contributor Author


This is the main change that introduces dynamic vertical sharding. We do the same calculations to determine the split interval, but now inside a loop that iterates over candidate vertical shard sizes from 1 to the configured maximum.

The best vertical shard size is the one expected to result in the greatest number of total shards, as computed by getExpectedTotalShards().
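For reference, the expected total shard count is roughly the number of horizontal splits multiplied by the candidate vertical shard size. A minimal sketch of that idea (illustrative only; the PR's getExpectedTotalShards may align splits to base interval boundaries and differ in detail):

// Illustrative sketch only; assumes "math" and "time" imports.
func expectedTotalShardsSketch(startMs, endMs int64, interval time.Duration, verticalShardSize int) int {
    if endMs <= startMs || interval <= 0 {
        return verticalShardSize
    }
    splits := int(math.Ceil(float64(endMs-startMs) / float64(interval.Milliseconds())))
    return splits * verticalShardSize
}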


// Set number of vertical shards to be used in shard_by middleware
if isShardable && maxVerticalShardSize > 1 {
    ctx = tripperware.InjectVerticalShardSizeToContext(ctx, verticalShardSize)
Contributor Author


The chosen value for vertical sharding is injected into the request context here, and then used by the shard_by.go middleware.
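For context, the shard_by middleware then reads this value back from the request context before fanning out vertically sharded sub-queries. A hedged sketch of that side, assuming a matching extraction helper whose exact name and signature are not shown in this excerpt and may differ in the PR:

// Hypothetical sketch; the extraction helper's name and signature are assumed, not taken from the PR.
func resolveVerticalShards(ctx context.Context, configured int) int {
    if dynamic, ok := tripperware.VerticalShardSizeFromContext(ctx); ok && dynamic > 0 {
        return dynamic // value injected by the dynamic split middleware above
    }
    return configured // fall back to the statically configured query_vertical_shard_size
}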

Comment on lines -284 to -311
// Return the fixed base duration fetched by the query regardless of the number of splits, and the duration that is fetched once for every split
func getDurationFetchedByQuerySplitting(expr parser.Expr, queryStart int64, queryEnd int64, queryStep int64, baseInterval time.Duration, lookbackDelta time.Duration) (fixedDurationFetched time.Duration, perSplitDurationFetched time.Duration) {
    // First analyze the query using original start-end time. Duration fetched by lookbackDelta here only reflects the start time of first split
    durationFetchedByRange, durationFetchedBySelectors, durationFetchedByLookbackDeltaFirstSplit := analyzeDurationFetchedByQueryExpr(expr, queryStart, queryEnd, baseInterval, lookbackDelta)

    fixedDurationFetched += durationFetchedByRange         // Duration fetched by the query range is constant regardless of how many splits the query has
    perSplitDurationFetched += durationFetchedBySelectors  // Duration fetched by selectors is fetched once for every query split

    // Next analyze the query using the next split start time to find the duration fetched by lookbackDelta for splits other than first one
    nextIntervalStart := nextIntervalBoundary(queryStart, queryStep, baseInterval) + queryStep
    _, _, durationFetchedByLookbackDeltaOtherSplits := analyzeDurationFetchedByQueryExpr(expr, nextIntervalStart, queryEnd, baseInterval, lookbackDelta)

    // Handle different cases for lookbackDelta
    if durationFetchedByLookbackDeltaFirstSplit > 0 && durationFetchedByLookbackDeltaOtherSplits > 0 {
        // lookbackDelta is fetching additional duration for all splits
        perSplitDurationFetched += durationFetchedByLookbackDeltaOtherSplits
    } else if durationFetchedByLookbackDeltaOtherSplits > 0 {
        // lookbackDelta is fetching additional duration for all splits except first one
        perSplitDurationFetched += durationFetchedByLookbackDeltaOtherSplits
        fixedDurationFetched -= durationFetchedByLookbackDeltaOtherSplits
    } else if durationFetchedByLookbackDeltaFirstSplit > 0 {
        // lookbackDelta is fetching additional duration for first split only
        fixedDurationFetched += durationFetchedByLookbackDeltaFirstSplit
    }

    return fixedDurationFetched, perSplitDurationFetched
}

Contributor Author

@afhassan Apr 17, 2025


To simplify the calculation of the expected duration fetched by a query, we no longer calculate lookbackDelta separately from matrix selectors, which makes this function obsolete.

Comment on lines -344 to +327
// Adjust start and end time based on matrix selectors and/or subquery selector and increment total duration fetched, this excludes lookbackDelta
start, end := util.GetTimeRangesForSelector(queryStart, queryEnd, 0, n, path, evalRange)
startIntervalIndex := floorDiv(start, baseIntervalMillis)
endIntervalIndex := floorDiv(end, baseIntervalMillis)
totalDurationFetchedCount += int(endIntervalIndex-startIntervalIndex) + 1

// Increment duration fetched by lookbackDelta
startLookbackDelta := start - util.DurationMilliseconds(lookbackDelta)
startLookbackDeltaIntervalIndex := floorDiv(startLookbackDelta, baseIntervalMillis)
if evalRange == 0 && startLookbackDeltaIntervalIndex < startIntervalIndex {
    durationFetchedByLookbackDeltaCount += int(startIntervalIndex - startLookbackDeltaIntervalIndex)
}
// Adjust start time based on matrix selectors and/or subquery selectors and calculate additional lookback duration fetched
start, end := util.GetTimeRangesForSelector(queryStart, queryEnd, lookbackDelta, n, path, evalRange)
durationFetchedBySelectors := (end - start) - (queryEnd - queryStart)
durationFetchedBySelectorsCount += int(ceilDiv(durationFetchedBySelectors, baseIntervalMillis))

Contributor Author


Also simplified because we no longer need to calculate duration fetched by lookbackDelta separately. It is all included in durationFetchedBySelectors.
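To give a concrete sense of what gets folded in: for an instant selector (evalRange of 0), the selector start time is extended by the lookback delta, so the extra data lands in durationFetchedBySelectors and is rounded up to whole base intervals. A self-contained sketch of that arithmetic with hypothetical values (not the PR's code):

package main

import (
    "fmt"
    "time"
)

func main() {
    // Hypothetical values: instant selector, 5m lookback delta, 1-day base interval, 30-day query range.
    lookbackDeltaMs := (5 * time.Minute).Milliseconds()
    baseIntervalMs := (24 * time.Hour).Milliseconds()
    queryStart, queryEnd := int64(0), 30*baseIntervalMs

    // Selector start is pulled back by the lookback delta.
    start, end := queryStart-lookbackDeltaMs, queryEnd
    durationFetchedBySelectors := (end - start) - (queryEnd - queryStart)
    extraIntervalsPerSplit := (durationFetchedBySelectors + baseIntervalMs - 1) / baseIntervalMs // ceilDiv

    fmt.Println(durationFetchedBySelectors, extraIntervalsPerSplit) // 300000 (5m in ms), 1 extra base interval per split
}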

Contributor

@justinjung04 left a comment


Nice, thank you
