HDDS-14868. Avoid full scan of container list during refreshAndValidate of ContainerSafemodeRule. by sadanand48 · Pull Request #9953 · apache/ozone

sadanand48 · 2026-03-20T10:35:29Z

What changes were proposed in this pull request?

Periodic refresh: run the safemode rule refresh on a configurable schedule (roughly every 5s) instead of on every applyTransaction / refresh(false) path.
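As a rough illustration of the scheduling idea, here is a minimal sketch using the JDK's `ScheduledExecutorService`. The class `PeriodicRefreshSketch`, the `Refreshable` interface, and the method `startPeriodicRefresh` are hypothetical names chosen for this example; they are not the classes or configuration keys used in the PR.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; these are not the actual SCM classes. */
public class PeriodicRefreshSketch {

  /** Stand-in for whatever a safemode rule does when it is refreshed. */
  interface Refreshable {
    void refresh();
  }

  /**
   * Starts a single daemon thread that calls rule.refresh() at a fixed delay.
   * In a real service the interval would come from configuration.
   */
  static ScheduledExecutorService startPeriodicRefresh(Refreshable rule, long intervalMillis) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "safemode-refresh-sketch");
      t.setDaemon(true);
      return t;
    });
    executor.scheduleWithFixedDelay(
        rule::refresh, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    return executor;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger refreshCount = new AtomicInteger();
    // A short interval keeps the demo quick; the PR discusses roughly 5s.
    ScheduledExecutorService executor =
        startPeriodicRefresh(refreshCount::incrementAndGet, 50);
    Thread.sleep(300);
    executor.shutdownNow();
    System.out.println("refresh calls: " + refreshCount.get());
  }
}
```

With this shape, the number of refresh calls depends only on elapsed time and the interval, not on how many transactions are applied.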

https://issues.apache.org/jira/browse/HDDS-14868


...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java

szetszwo · 2026-03-26T11:17:37Z

@sadanand48 , thanks for working on this!

How about refreshing the safemode rules every 5s, instead of doing it in applyTransactions?

sadanand48 · 2026-03-26T16:55:45Z

How about refreshing the safemode rules every 5s, instead of doing it in applyTransactions?

Thanks @szetszwo for the input. We could make this behaviour configurable, i.e. either periodic or based on applyTransaction. I'm suggesting this because smaller clusters, or clusters without any pending logs, may be impacted by redundant refresh calls.

szetszwo · 2026-03-26T17:22:32Z

... smaller clusters, or clusters without any pending logs, may be impacted by redundant refresh calls.

Refreshing the safemode rules in applyTransaction is actually a big mistake: applyTransaction is on the critical path of the StateMachine, and adding unnecessary operations there is going to slow down everything.

In contrast, refreshing the safemode rules every 5s is not going to have any measurable performance impact. Hypothetically, if refreshing every 5s is not okay, then refreshing in applyTransaction is definitely much worse, since there are thousands of applyTransaction ops per second.
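To make the comparison concrete, the small sketch below counts refresh calls over a fixed window under both approaches. The rate of 1000 transactions per second is an assumed figure standing in for "thousands of applyTransaction ops per second"; the class and method names are hypothetical.

```java
/** Hypothetical back-of-the-envelope comparison; the rates are assumptions. */
public class RefreshRateSketch {

  /** Refresh calls in the window if every applied transaction triggers a refresh. */
  static long refreshesOnTransactionPath(long txPerSecond, long windowSeconds) {
    return txPerSecond * windowSeconds;
  }

  /** Refresh calls in the window if a timer fires once per interval. */
  static long periodicRefreshes(long windowSeconds, long intervalSeconds) {
    return windowSeconds / intervalSeconds;
  }

  public static void main(String[] args) {
    long txPerSecond = 1000;   // assumed rate, not a measured value
    long windowSeconds = 60;
    long intervalSeconds = 5;  // the interval discussed in this thread

    long perTx = refreshesOnTransactionPath(txPerSecond, windowSeconds);
    long periodic = periodicRefreshes(windowSeconds, intervalSeconds);

    System.out.println("per-transaction refreshes in window: " + perTx);
    System.out.println("periodic refreshes in window: " + periodic);
    System.out.println("ratio: " + (perTx / periodic));
  }
}
```

Under these assumed numbers the per-transaction approach performs 60000 refreshes in a minute against 12 for the periodic one, and the gap grows linearly with the transaction rate.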

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java

szetszwo

@sadanand48 , thanks for the update

Since the current code in SCMStateMachine uses SCMSafeModeManager to refresh, it is better to do the refresh in SCMSafeModeManager.
When periodic refresh is enabled, SCMStateMachine should not refresh.
During refreshing, if SCM is NOT in safemode, we can stop the executor. Then we don't need any stop method.
It is better to create a non-mock test using MiniOzoneCluster.
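The "stop the executor from inside the refresh" point can be sketched as follows. This is only an illustration of the pattern under stated assumptions: `SelfStoppingRefreshSketch`, `refreshOnce`, and `awaitStopped` are hypothetical names, and the safemode check and rule refresh are passed in as plain callbacks rather than taken from SCMSafeModeManager.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a periodic refresher that stops itself. */
public class SelfStoppingRefreshSketch {
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "self-stopping-refresh-sketch");
        t.setDaemon(true);
        return t;
      });
  private final BooleanSupplier inSafeMode; // stand-in for the manager's safemode check
  private final Runnable refreshRules;      // stand-in for refreshing all rules

  SelfStoppingRefreshSketch(BooleanSupplier inSafeMode, Runnable refreshRules) {
    this.inSafeMode = inSafeMode;
    this.refreshRules = refreshRules;
  }

  void start(long intervalMillis) {
    executor.scheduleWithFixedDelay(
        this::refreshOnce, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  /** One tick: refresh while in safemode, otherwise shut the executor down. */
  private void refreshOnce() {
    if (!inSafeMode.getAsBoolean()) {
      // By default, shutdown() cancels the periodic task, so no external
      // stop method is needed.
      executor.shutdown();
      return;
    }
    refreshRules.run();
  }

  /** Waits until the executor has terminated or the timeout elapses. */
  boolean awaitStopped(long timeoutMillis) throws InterruptedException {
    return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    // Pretend safemode exits after three refreshes.
    AtomicInteger remaining = new AtomicInteger(3);
    AtomicInteger refreshes = new AtomicInteger();
    SelfStoppingRefreshSketch sketch = new SelfStoppingRefreshSketch(
        () -> remaining.get() > 0,
        () -> {
          refreshes.incrementAndGet();
          remaining.decrementAndGet();
        });
    sketch.start(10);
    boolean stopped = sketch.awaitStopped(5000);
    System.out.println("stopped=" + stopped + ", refreshes=" + refreshes.get());
  }
}
```

Because the task runs on a single thread and checks the safemode state at the start of each tick, the executor winds down on the first tick after safemode exits, without any caller having to track its lifecycle.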

See https://issues.apache.org/jira/secure/attachment/13081501/9953_review.patch

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java

sadanand48 · 2026-03-31T13:02:52Z

Thanks @szetszwo for the review, updated as per your patch

it is better to do refresh in SCMSafeModeManager.

With this, all the safemode rules will have the same behaviour; I guess that should be okay. I will add a non-mock test.

szetszwo

@sadanand48 , thanks for the update!

Quick question:

Would it work if we don't make the changes in AbstractContainerSafeModeRule and other code logic changes such as isScmRatisApplyCaughtUpToCommit?

If it works, this PR should only change the refresh timing (i.e. periodic refreshing instead of doing it in SCMStateMachine). Other code logic changes/improvements can be done in a separate PR.

sadanand48 added 2 commits March 20, 2026 14:40

HDDS-14868. Avoid full scan of container list during refreshAndValidate of ContainerSafemodeRule.

30f4081

fix 1

30290d6

sumitagrawl reviewed Mar 25, 2026

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

refresh every 5s

2a9aadd

sadanand48 requested a review from szetszwo March 27, 2026 07:58

szetszwo reviewed Mar 28, 2026

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

sadanand48 added 3 commits March 30, 2026 14:48

refresh only when transactions pending

d74022e

revert new map additions

66163f1

compile

efdefc9

szetszwo reviewed Mar 30, 2026

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

...ver-scm/src/main/java/org/apache/hadoop/hdds/scm/safemode/AbstractContainerSafeModeRule.java (outdated)

sadanand48 added 2 commits March 31, 2026 18:23

address comments

34603b8

code cleanup

e2bfca4

szetszwo reviewed Mar 31, 2026


sadanand48 added 3 commits April 1, 2026 12:03

checkstyle

cafe82a

add to default xml

7e9a8f7

add tests

c3fb5a5

sadanand48 wants to merge 11 commits into apache:master from sadanand48:HDDS-14868


3 participants
