Improve HTTPRoute processing at scale #1122

Open
kate-osborn opened this issue Oct 9, 2023 · 1 comment
Labels
area/performance Performance related backlog Currently unprioritized work. May change with user feedback or as the product progresses.

Comments

Contributor

kate-osborn commented Oct 9, 2023

It takes ~7 hours to run the HTTPRoute scale test, which creates 1000 HTTPRoutes sequentially.

This test waits for the previously created HTTPRoute to be configured (available in NGINX) plus 2 seconds before creating the next HTTPRoute. Testing revealed that the longer we wait before creating the next HTTPRoute, the faster NGF processes the HTTPRoute. See this graph for more details.

Some contributing factors to the long processing times are:

This means that if you have 99 HTTPRoutes configured and you create one more, NGF updates the configuration with the new route and then sequentially updates the status of all 100 HTTPRoutes in the graph. NGF then re-queues all 100 HTTPRoutes and processes them again, which results in no configuration changes.

This situation can intensify if more HTTPRoutes are created while NGF is processing the last event batch or writing statuses. A new HTTPRoute can end up at the end of a large event batch that's full of no-op status changes.

Acceptance Criteria:

  • Investigate why the processing times for the HTTPRoute scale test are so long. The contributing factors listed above may not be the only factors.
  • Reduce the time it takes to process HTTPRoutes at scale. This can be measured by running the HTTPRoute scale test and comparing the results to the 1.0.0 results.
### Tasks
- [ ] https://github.com./nginxinc/nginx-gateway-fabric/issues/1013
- [ ] https://github.com./nginxinc/nginx-gateway-fabric/issues/825
- [ ] https://github.com./nginxinc/nginx-gateway-fabric/issues/1014
@brianehlert

This looks pretty significant and has the unintended consequence of slowing down change processing, or at least giving that appearance.
Customers expect configuration changes to take effect within milliseconds to seconds, no matter when those changes are submitted, and they expect the status updates to align.

ja20222 added the backlog and area/performance labels on Nov 21, 2023