HDDS-14670. SCM should not finalize unless it is out of safemode #9974

Open
sodonnel wants to merge 1 commit into apache:HDDS-14496-zdu from sodonnel:HDDS-14670-safemode

@sodonnel
Contributor

What changes were proposed in this pull request?

This change adds a safemode check to the SCM's FinalizationManagerImpl, so that an attempt to finalize from the CLI fails with an error while SCM is in safemode.
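
As a rough illustration of the kind of guard described above, here is a minimal sketch. It is not the code from this patch: the class name `SafemodeGuardSketch`, the `BooleanSupplier` safemode probe, and the use of `IllegalStateException` are all assumptions made for the example, and the real FinalizationManagerImpl may differ on each point.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the SCM finalization manager; this is NOT the
// real FinalizationManagerImpl API.
final class SafemodeGuardSketch {
    // Assumed probe that reports whether SCM is currently in safemode.
    private final BooleanSupplier inSafemode;
    private boolean finalized;

    SafemodeGuardSketch(BooleanSupplier inSafemode) {
        this.inSafemode = inSafemode;
    }

    // Rejects the request while safemode is on; otherwise marks finalized.
    // The exception type is an assumption chosen for this sketch.
    void finalizeUpgrade() {
        if (inSafemode.getAsBoolean()) {
            throw new IllegalStateException(
                "SCM is in safemode; finalization is not allowed");
        }
        finalized = true;
    }

    boolean isFinalized() {
        return finalized;
    }
}
```

Passing the safemode state as a supplier keeps the check easy to unit test with safemode forced on or off.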

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14670

How was this patch tested?

Added new unit tests to validate that the command fails while safemode is on. Existing tests ensure that the change does not break current behavior.

@github-actions bot added the zdu label (Pull requests for Zero Downtime Upgrade (ZDU), https://issues.apache.org/jira/browse/HDDS-14496) on Mar 25, 2026
@errose28 self-requested a review on March 26, 2026 13:15
Contributor

@dombizita left a comment

Thanks for working on this, looks good to me! How come that this was not needed previously? Or was it just found now that it'd be useful to have a check for this?

@sodonnel
Contributor Author

> Thanks for working on this, looks good to me! How come that this was not needed previously? Or was it just found now that it'd be useful to have a check for this?

I think this was just missed before, but I am not sure. @errose28 probably has better context as he suggested we make this addition.

@errose28
Contributor

Previously we did not need this because finalization blocked until at least one pipeline of three finalized datanodes was created, which could happen while SCM was in safemode.

The interim code we have on the branch is now blocking SCM until all datanodes are finalized. The problem is that SCM does not know how many datanodes there are until it exits safemode, so the condition will give different results depending on when it is run. Looking at this again, once we remove that blocking code we probably don't need this. We only need to make sure we do not advertise to the OM that it should finalize while SCM is in safemode. SCM itself can finalize and send finalize commands to datanodes while in safemode.
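
To make the timing problem concrete, here is a small hypothetical sketch. It is not taken from the branch: `DatanodeViewSketch` and its methods are invented for illustration, and they model only the idea that an "all datanodes finalized" check can see just the datanodes that have registered so far.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of SCM's view of datanodes; names are illustrative
// only and do not correspond to real Ozone classes.
final class DatanodeViewSketch {
    // Maps a datanode ID to whether that datanode has finalized.
    private final Map<String, Boolean> finalizedByNode = new HashMap<>();

    // Records a datanode once it has registered with SCM.
    void register(String datanodeId, boolean finalized) {
        finalizedByNode.put(datanodeId, finalized);
    }

    // True when no registered datanode is unfinalized. Datanodes that have
    // not registered yet are invisible here, so the answer can change as
    // more of them report in. With no registrations at all it is also true.
    boolean allKnownDatanodesFinalized() {
        return !finalizedByNode.containsValue(Boolean.FALSE);
    }
}
```

If only one finalized datanode has registered, the check passes. Once a second, unfinalized datanode registers, the same check fails, which is the inconsistency described above.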

Perhaps we should leave this as a draft or just close it for now, and revisit once the blocking code is removed and we are looking at notifying OM to finalize.
