WIP initial steps for distributing compaction coordination #6217
Draft
keith-turner wants to merge 13 commits into apache:main
There is still only one coordinator, but the TGW and compactors can now both talk to multiple coordinators.
The set of shutting down tservers was an in-memory set, which forced system fate operations to run on the primary manager. This gave fate different code paths for user vs system fate, which in turn caused problems when trying to distribute compaction coordination. To fix this, moved the set from memory into ZooKeeper. The set is now managed by fate operations, which simplifies the existing code: only fate operations add to and remove from the set, and fate keys are used to ensure only one fate operation runs at a time for a tserver instance. The previous in-memory set had a lot of code to try to keep it in sync with reality, and that code had many bugs in the past; it is all gone now. After this change, fate can be simplified in a follow-on commit to remove all specialization for the primary manager. The monitor can also now access this set directly instead of making an RPC to the manager; will open a follow-on issue for this.
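To make the keyed add/remove idea concrete, here is a minimal sketch. It is not the code in this PR: `ShuttingDownSetSketch`, `seedShutdown`, and `completeShutdown` are hypothetical names, an in-memory set stands in for the set stored in ZooKeeper, and a plain map stands in for fate keys.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: in-memory stand-ins for the ZooKeeper set and for fate keys. */
public class ShuttingDownSetSketch {
  // Stand-in for the set of shutting down tservers that the PR keeps in ZooKeeper.
  private final Set<String> shuttingDown = ConcurrentHashMap.newKeySet();
  // Stand-in for fate keys: at most one active operation id per tserver instance.
  private final Map<String, Long> activeOpByKey = new ConcurrentHashMap<>();
  private long nextOpId = 1;

  /** Seeds a shutdown operation, or returns empty if one is already active for this tserver. */
  public synchronized Optional<Long> seedShutdown(String tserverInstance) {
    if (activeOpByKey.containsKey(tserverInstance)) {
      return Optional.empty();
    }
    long opId = nextOpId++;
    activeOpByKey.put(tserverInstance, opId);
    shuttingDown.add(tserverInstance);
    return Optional.of(opId);
  }

  /** Removes the tserver from the set, but only when called by the operation that added it. */
  public synchronized void completeShutdown(String tserverInstance, long opId) {
    if (activeOpByKey.remove(tserverInstance, opId)) {
      shuttingDown.remove(tserverInstance);
    }
  }

  /** Read-only view that any process (for example a monitor) could consult. */
  public boolean isShuttingDown(String tserverInstance) {
    return shuttingDown.contains(tserverInstance);
  }
}
```

Because the set is only mutated through the keyed operation in this sketch, there is no separate reconciliation logic to keep it in sync.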
After this change, meta fate and user fate are treated mostly the same in the managers. One difference is in assignment: the entire meta fate range is assigned to a single manager, while user fate is spread across all managers. Both are now assigned out by the primary manager using the same RPCs; the primary manager used to start a meta fate instance directly. Was able to remove the extension of FateEnv from the manager class in this change, which caused a ripple of test changes, but now there are no longer two different implementations of FateEnv.
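The assignment difference described above could look roughly like the following sketch. Everything here is an assumption for illustration: the class name, the string labels, the choice of the first manager for the meta range, and the round-robin split of user fate partitions are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: one hypothetical way a primary manager could hand out fate ranges. */
public class FateAssignmentSketch {

  /**
   * Assigns the whole meta fate range to one manager and spreads user fate
   * partitions across all managers (round-robin is an assumption).
   */
  public static Map<String, List<String>> assign(List<String> managers, int userPartitions) {
    if (managers.isEmpty()) {
      throw new IllegalArgumentException("at least one manager is required");
    }
    Map<String, List<String>> assignments = new LinkedHashMap<>();
    for (String manager : managers) {
      assignments.put(manager, new ArrayList<>());
    }
    // The entire meta fate range goes to a single manager (here, arbitrarily, the first).
    assignments.get(managers.get(0)).add("META:all");
    // User fate partitions are spread across every manager.
    for (int p = 0; p < userPartitions; p++) {
      assignments.get(managers.get(p % managers.size())).add("USER:" + p);
    }
    return assignments;
  }
}
```

In this sketch both kinds of assignment come out of the same method, mirroring the point that the primary manager assigns meta and user fate through the same path.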
Before this change, a fate client was only available on the primary manager. Now fate clients are available on all managers. The primary manager publishes fate assignment locations in ZooKeeper. These locations are used by managers to send notifications to other managers when they seed a fate operation.
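A rough sketch of the seed-then-notify flow, under stated assumptions: `SeedNotificationSketch` and its methods are hypothetical, a map stands in for the locations published in ZooKeeper, a list stands in for the RPC notifications, and skipping the notification when the seeding manager owns the partition is a guess rather than confirmed behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: a map and a list stand in for ZooKeeper and manager RPCs. */
public class SeedNotificationSketch {
  // Stand-in for fate assignment locations published by the primary manager.
  private final Map<Integer, String> publishedLocations = new ConcurrentHashMap<>();
  // Stand-in for notifications sent to other managers.
  private final List<String> sentNotifications = new ArrayList<>();

  /** Called by the (hypothetical) primary manager to publish who owns a partition. */
  public void publish(int partition, String managerAddress) {
    publishedLocations.put(partition, managerAddress);
  }

  /**
   * Called by any manager after it seeds a fate operation in the given partition.
   * Returns the owning manager, or empty if no location has been published yet.
   */
  public synchronized Optional<String> seed(String seedingManager, int partition) {
    String owner = publishedLocations.get(partition);
    if (owner == null) {
      return Optional.empty();
    }
    // Assumption: only a different manager needs to be notified.
    if (!owner.equals(seedingManager)) {
      sentNotifications.add(owner);
    }
    return Optional.of(owner);
  }

  /** Returns a copy of the managers that have been notified so far, in order. */
  public synchronized List<String> notifications() {
    return new ArrayList<>(sentNotifications);
  }
}
```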
After merging in the changes from #6232, a basic test of compaction coordination spread across multiple managers is now working; it was failing before. There are still a lot of loose ends and refactoring needed, but the basic functionality seems to be working now.
Some initial steps for distributing compaction coordinator. Currently only contains the following.
Still need to do the following
Plan to continue experimenting with this.