Feature Name | Tracking Issue | Date | Author |
---|---|---|---|
Region Migration Procedure | | 2023-11-03 | Xu Wenkang <[email protected]> |
This RFC proposes a procedure that enables the Meta Server to move Regions between Datanodes.
Typically, we need this ability in the following scenarios:
- Migrate hot-spot Regions to an idle Datanode
- Move failed Regions to an available Datanode
```mermaid
flowchart TD
style Start fill:#85CB90,color:#fff
style End fill:#85CB90,color:#fff
style SelectCandidate fill:#F38488,color:#fff
style OpenCandidate fill:#F38488,color:#fff
style UpdateMetadataDown fill:#F38488,color:#fff
style UpdateMetadataUp fill:#F38488,color:#fff
style UpdateMetadataRollback fill:#F38488,color:#fff
style DowngradeLeader fill:#F38488,color:#fff
style UpgradeCandidate fill:#F38488,color:#fff
Start[Start]
SelectCandidate[Select Candidate]
UpdateMetadataDown["`Update Metadata(Down)
1. Downgrade Leader
`"]
DowngradeLeader["`Downgrade Leader
1. Become Follower
2. Return **last_entry_id**
`"]
UpgradeCandidate["`Upgrade Candidate
1. Replay to **last_entry_id**
2. Become Leader
`"]
UpdateMetadataUp["`Update Metadata(Up)
1. Switch Leader
2.1. Remove Old Leader(Opt.)
2.2. Move Old Leader to Follower(Opt.)
`"]
UpdateMetadataRollback["`Update Metadata(Rollback)
1. Upgrade old Leader
`"]
End
AnyCandidate{Available?}
OpenCandidate["Open Candidate"]
CloseOldLeader["Close Old Leader"]
Start
--> SelectCandidate
--> AnyCandidate
--> |Yes| UpdateMetadataDown
--> I1["Invalid Frontend Cache"]
--> DowngradeLeader
--> UpgradeCandidate
--> UpdateMetadataUp
--> I2["Invalid Frontend Cache"]
--> End
UpgradeCandidate
--> UpdateMetadataRollback
--> I3["Invalid Frontend Cache"]
--> End
I2
--> CloseOldLeader
--> End
AnyCandidate
--> |No| OpenCandidate
--> UpdateMetadataDown
```
Only the red nodes persist state, and they do so after succeeding. The other nodes do not persist state (excluding the Start and End nodes).
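
The persistence rule can be sketched as a small Rust enum. This is only an illustration under stated assumptions: the `Step` type and the `persists_state` method are hypothetical names that mirror the flowchart nodes, not the actual implementation, and the cache invalidation nodes are omitted for brevity.

```rust
/// Steps of the migration flowchart (hypothetical names mirroring the diagram).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Start,
    SelectCandidate,
    OpenCandidate,
    UpdateMetadataDown,
    DowngradeLeader,
    UpgradeCandidate,
    UpdateMetadataUp,
    UpdateMetadataRollback,
    CloseOldLeader,
    End,
}

impl Step {
    /// Returns true for the red nodes, which persist state once they succeed.
    pub fn persists_state(self) -> bool {
        matches!(
            self,
            Step::SelectCandidate
                | Step::OpenCandidate
                | Step::UpdateMetadataDown
                | Step::DowngradeLeader
                | Step::UpgradeCandidate
                | Step::UpdateMetadataUp
                | Step::UpdateMetadataRollback
        )
    }
}
```

A procedure runner built this way would write its state to storage only when `persists_state()` returns true for the step that just succeeded.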
The persistent context: It is shared across all steps and remains available after recovery. It is updated and stored only after a red node has succeeded.

Values:
- `region_id`: The target leader region.
- `peer`: The target datanode.
- `close_old_leader`: Indicates whether to close the old leader region.
- `leader_may_unreachable`: Used to support the failover procedure.

The Volatile context: It is shared across all steps and available while the procedure is executing (including retries). It is dropped if the procedure runner crashes.
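
To make the difference between the two contexts concrete, here is a minimal Rust sketch. The four persistent field names come from this RFC; the field types, the `VolatileContext::retry_count` field, and the `Procedure::recover` constructor are assumptions made for illustration only.

```rust
/// Shared by every step and restored after recovery (field names follow the RFC).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PersistentContext {
    /// The target leader region (a plain u64 id is an assumption).
    pub region_id: u64,
    /// The target datanode (a plain u64 datanode id is an assumption).
    pub peer: u64,
    /// Whether the old leader region should be closed.
    pub close_old_leader: bool,
    /// Supports the failover procedure.
    pub leader_may_unreachable: bool,
}

/// Lives only while the procedure is executing; lost if the runner crashes.
#[derive(Debug, Default)]
pub struct VolatileContext {
    /// Hypothetical example of per-run scratch data.
    pub retry_count: u32,
}

pub struct Procedure {
    pub persistent: PersistentContext,
    pub volatile: VolatileContext,
}

impl Procedure {
    /// Rebuilds a procedure after a crash: only the persistent part survives,
    /// the volatile part starts from its default value.
    pub fn recover(persistent: PersistentContext) -> Self {
        Procedure {
            persistent,
            volatile: VolatileContext::default(),
        }
    }
}
```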
In the Select Candidate step, the Persistent state is the selected Candidate Region.

In the Update Metadata(Down) step, the Persistent context stores:
- The (latest/updated) `version` of `TableRouteValue`. It will be used in the `Update Metadata(Up)` step.

The Downgrade Leader step sends an instruction via heartbeat, which:
- Downgrades the leader region.
- Retrieves the `last_entry_id` (if available).

If the target leader region is not found:
- Sets `close_old_leader` to true.
- Sets `leader_may_unreachable` to true.

If the target Datanode is unreachable:
- Waits for the region lease to expire.
- Sets `close_old_leader` to true.
- Sets `leader_may_unreachable` to true.

The Persistent context: None

The Persistent state:
- `last_entry_id` (passed to the next step)
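
The branches of the Downgrade Leader step can be sketched in Rust as follows. This is an illustrative sketch, not the implementation: `DowngradeReply`, `Flags`, `on_downgrade_reply`, and the `wait_for_lease_expiry` hook are hypothetical names, and only the flag names are taken from this RFC.

```rust
/// Possible results of the downgrade instruction (hypothetical type).
pub enum DowngradeReply {
    /// The leader was downgraded; `last_entry_id` may be absent.
    Downgraded { last_entry_id: Option<u64> },
    RegionNotFound,
    DatanodeUnreachable,
}

/// The two persistent-context flags touched by this step.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Flags {
    pub close_old_leader: bool,
    pub leader_may_unreachable: bool,
}

/// Applies the rules of the step and returns the `last_entry_id` (if any)
/// to hand to the next step. `wait_for_lease_expiry` is a caller-supplied
/// hook standing in for the real wait on the region lease.
pub fn on_downgrade_reply(
    reply: DowngradeReply,
    flags: &mut Flags,
    wait_for_lease_expiry: impl FnOnce(),
) -> Option<u64> {
    match reply {
        DowngradeReply::Downgraded { last_entry_id } => last_entry_id,
        DowngradeReply::RegionNotFound => {
            flags.close_old_leader = true;
            flags.leader_may_unreachable = true;
            None
        }
        DowngradeReply::DatanodeUnreachable => {
            // The old leader cannot confirm the downgrade, so wait until its lease expires.
            wait_for_lease_expiry();
            flags.close_old_leader = true;
            flags.leader_may_unreachable = true;
            None
        }
    }
}
```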
The Upgrade Candidate step sends an instruction via heartbeat, which:
- Replays the WAL up to the latest entry (`last_entry_id`).
- Upgrades the candidate region.

If the target region is not found:
- Rolls back.
- Notifies the failover detector if `leader_may_unreachable` == true.
- Exits the procedure.

If the target Datanode is unreachable:
- Rolls back.
- Notifies the failover detector if `leader_may_unreachable` == true.
- Exits the procedure.

The Persistent context: None
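
Both failure cases of the Upgrade Candidate step lead to the same outcome, which the following Rust sketch makes explicit. `UpgradeReply`, `NextAction`, and `on_upgrade_reply` are hypothetical names chosen for illustration; the real procedure may model this differently.

```rust
/// Possible results of the upgrade instruction (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum UpgradeReply {
    Upgraded,
    RegionNotFound,
    DatanodeUnreachable,
}

/// What the procedure does after the reply arrives (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum NextAction {
    /// Continue with Update Metadata(Up).
    UpdateMetadataUp,
    /// Roll back the metadata, optionally notify the failover detector, then exit.
    RollbackAndExit { notify_failover_detector: bool },
}

/// Maps the reply to the next action, following the rules listed for this step.
pub fn on_upgrade_reply(reply: UpgradeReply, leader_may_unreachable: bool) -> NextAction {
    match reply {
        UpgradeReply::Upgraded => NextAction::UpdateMetadataUp,
        UpgradeReply::RegionNotFound | UpgradeReply::DatanodeUnreachable => {
            NextAction::RollbackAndExit {
                notify_failover_detector: leader_may_unreachable,
            }
        }
    }
}
```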
The Update Metadata(Up) step:
- Switches the Leader.
- Removes the old Leader (optional).
- Moves the old Leader to follower (optional).

The current `TableRouteValue` version should equal the `TableRouteValue` `version` stored in the Persistent context. Otherwise, the step verifies whether the `TableRouteValue` has already been updated.

The Persistent context: None
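
The version check in Update Metadata(Up) behaves like a compare-and-swap. The Rust sketch below shows one possible reading of it and rests on several assumptions: `TableRouteValue` is reduced to a version and a single leader id, the version is incremented on every successful update, and "already updated" is taken to mean that the route already points at the new leader (for example, after a retry). None of these details are confirmed by this RFC.

```rust
/// Simplified stand-in for the real `TableRouteValue` (assumption: one region only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TableRouteValue {
    pub version: u64,
    /// Datanode id of the current leader.
    pub leader: u64,
}

/// Result of the metadata update (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum UpdateOutcome {
    Updated,
    AlreadyUpdated,
    Conflict,
}

/// Switches the leader only if the stored version matches the one recorded
/// in the persistent context during Update Metadata(Down).
pub fn update_metadata_up(
    current: &mut TableRouteValue,
    expected_version: u64,
    new_leader: u64,
) -> UpdateOutcome {
    if current.version == expected_version {
        current.leader = new_leader;
        current.version += 1; // assumption: every update bumps the version
        UpdateOutcome::Updated
    } else if current.leader == new_leader {
        // Version moved on, but the switch is already in place (e.g. a retried step).
        UpdateOutcome::AlreadyUpdated
    } else {
        UpdateOutcome::Conflict
    }
}
```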
The Close Old Leader step sends a close region instruction via heartbeat.

If the target leader region is not found:
- Ignore.

If the target Datanode is unreachable:
- Ignore.

The Open Candidate step sends an open region instruction via heartbeat and waits for conditions to be met (typically, that the `last_entry_id` of the Candidate Region is very close to that of the Leader Region, or is the latest).

If the target Datanode is unreachable:
- Exits the procedure.
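
The wait condition for the Open Candidate step could be expressed as a simple lag check. In this Rust sketch, `candidate_caught_up` and the `max_lag` threshold are hypothetical; the RFC does not specify how "very close" is measured.

```rust
/// Returns true when the candidate's `last_entry_id` is within `max_lag`
/// entries of the leader's. `max_lag` is a hypothetical tuning parameter.
pub fn candidate_caught_up(
    candidate_last_entry_id: u64,
    leader_last_entry_id: u64,
    max_lag: u64,
) -> bool {
    // saturating_sub avoids underflow if the candidate is reported ahead of the leader.
    leader_last_entry_id.saturating_sub(candidate_last_entry_id) <= max_lag
}
```

With `max_lag` set to 0, the check only passes once the candidate has reached the leader's latest entry.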