Release cluster state bytes earlier in PublicationTransportHandler #127054

Open, wants to merge 1 commit into base: main

original-brownbear
Member

The cluster state publication can be quite large, especially when a node initially joins a cluster. By forking in the handler itself instead of leaving the forking to the transport layer, we can release the network buffer before actually processing the publication. This can save up to O(100M) of heap in some situations and improves stability quite a bit when small nodes join clusters with large existing states.
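
To illustrate the idea, here is a minimal, self-contained sketch. It is not the code in this PR: `RefCountedBuffer` and `ForkingPublishHandler` are hypothetical stand-ins for the ref-counted network buffer and the publication handler, and no Elasticsearch or Netty classes are used. The handler takes its own reference, forks, reads the bytes on the forked thread, and drops the reference before the (potentially slow) processing step.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ref-counted network buffer (assumption, not a real ES/Netty type).
final class RefCountedBuffer {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds one reference
    private volatile byte[] bytes;

    RefCountedBuffer(byte[] bytes) { this.bytes = bytes; }

    void incRef() {
        refs.updateAndGet(r -> {
            if (r <= 0) throw new IllegalStateException("already released");
            return r + 1;
        });
    }

    void decRef() {
        int r = refs.decrementAndGet();
        if (r < 0) throw new IllegalStateException("released too often");
        if (r == 0) bytes = null; // last reference gone: let the memory be reclaimed
    }

    boolean isReleased() { return refs.get() <= 0; }

    byte[] bytes() {
        byte[] b = bytes;
        if (b == null) throw new IllegalStateException("already released");
        return b;
    }
}

// Hypothetical handler that forks by itself instead of relying on the transport layer.
final class ForkingPublishHandler {
    private final Executor executor;
    volatile boolean releasedBeforeProcessing; // recorded only for demonstration
    volatile String processed;

    ForkingPublishHandler(Executor executor) { this.executor = executor; }

    // Called on the (simulated) network thread; the caller drops its own reference on return.
    void handle(RefCountedBuffer request) {
        request.incRef(); // keep the bytes alive until the forked task has read them
        try {
            executor.execute(() -> {
                String state;
                try {
                    // "Deserialize": copy what we need out of the network buffer.
                    state = new String(request.bytes(), StandardCharsets.UTF_8);
                } finally {
                    request.decRef(); // release before the expensive processing below
                }
                releasedBeforeProcessing = request.isReleased();
                processed = "applied:" + state; // stands in for applying the cluster state
            });
        } catch (RuntimeException e) {
            // Task was rejected and will never run, so drop our reference here.
            // This assumes the executor never runs the task inline before throwing.
            request.decRef();
            throw e;
        }
    }
}
```

In this sketch the buffer is already gone by the time the processing step runs, which is where the heap saving would come from. The rejection branch is the awkward part: it is only safe as long as the release cannot happen on both paths.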

PS: I admit this looks a little hacky, but IMO we can separately create a nicer primitive for ensuring a single dec-ref under all failure conditions. We have a couple of similar-looking spots that could benefit from a cleanup as well.
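
As a rough sketch of what such a primitive could look like (purely illustrative; `ReleaseOnce` and `SingleDecRefDemo` are hypothetical names and not part of this PR or of Elasticsearch), the release action can be wrapped so that it runs at most once no matter which path triggers it:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: runs the wrapped release action at most once.
final class ReleaseOnce implements Runnable {
    private final AtomicBoolean done = new AtomicBoolean();
    private final Runnable release;

    ReleaseOnce(Runnable release) { this.release = release; }

    @Override
    public void run() {
        // Only the first caller wins the CAS and performs the release.
        if (done.compareAndSet(false, true)) {
            release.run();
        }
    }
}

final class SingleDecRefDemo {
    // Forks 'work' to 'executor' and ensures 'decRef' runs exactly once:
    // after the work completed, after the work threw, or after the executor rejected the task.
    static void forkWithSingleRelease(Executor executor, Runnable work, Runnable decRef) {
        ReleaseOnce releaseOnce = new ReleaseOnce(decRef);
        try {
            executor.execute(() -> {
                try {
                    work.run(); // e.g. reading the bytes out of the network buffer
                } finally {
                    releaseOnce.run();
                }
            });
        } catch (RuntimeException e) {
            // Covers rejection, and also executors that run the task inline and rethrow:
            // the second call is a no-op if the task already released.
            releaseOnce.run();
            throw e;
        }
    }
}
```

With a wrapper like this, the failure branch no longer has to reason about whether the forked task already ran.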

See the following screenshot: with the recent transport layer fixes, this does indeed free the actual underlying Netty buffer now.

image

@original-brownbear added the >non-issue, :Distributed Coordination/Cluster Coordination, v8.19.0, and v9.1.0 labels on Apr 18, 2025
@elasticsearchmachine added the Team:Distributed Coordination label on Apr 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@original-brownbear
Member Author

Hmm, fixing that one failing test (testIncludesLastCommittedFieldsInDiffSerialization) is a little tricky: we play some tricks there with the way the transport request is handled, and those break once we move the forking into the handler itself. Not entirely sure yet how to fix this while preserving what we are actually trying to test. Any tips are welcome, but hopefully I'll be able to figure this out eventually :D
