HDDS-14926. Allow QUASI_CLOSED containers in DiskBalancer with improved debug logs for containers #10022

Draft
Gargi-jais11 wants to merge 1 commit into apache:master from Gargi-jais11:HDDS-14926

@Gargi-jais11
Contributor

What changes were proposed in this pull request?

Currently DiskBalancer can move only CLOSED containers. If a user also wants to move QUASI_CLOSED containers, we should support that as well.

1st Improvement:
The scenario below can happen in practice. Here DiskBalancer can attempt only the 10 CLOSED containers. Even if it moves all 10 successfully, disk A might drop only from 90% to 85%, which is nowhere near balance. The QUASI_CLOSED containers occupy the bulk of the space, and DiskBalancer is completely blind to them.

Disk A: 90% utilized 
├── 600 containers: QUASI_CLOSED 
└── 10 containers: CLOSED 
Disk B: 10% utilized 
Disk C: 11% utilized

Added a config hdds.datanode.disk.balancer.include.non.standard.containers (default: false). If true, the balancer includes non-standard states, i.e., QUASI_CLOSED, so both CLOSED and QUASI_CLOSED containers are eligible for a move. If false (default), the balancer moves only CLOSED containers.
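To illustrate the intended behaviour of the flag, here is a minimal, self-contained sketch. It is not the code in this patch: the class `MoveEligibility`, its nested `State` enum, and the method names are hypothetical stand-ins, and the enum only mirrors the container states discussed here rather than the real Ozone type.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the eligibility rule; names are illustrative only. */
final class MoveEligibility {

  /** Stand-in for container lifecycle states (assumption, not the Ozone enum). */
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  private MoveEligibility() { }

  /**
   * Returns the states that may be moved.
   * includeNonStandard models the value of
   * hdds.datanode.disk.balancer.include.non.standard.containers.
   */
  static Set<State> eligibleStates(boolean includeNonStandard) {
    return includeNonStandard
        ? EnumSet.of(State.CLOSED, State.QUASI_CLOSED)
        : EnumSet.of(State.CLOSED);
  }

  /** True if a container in the given state may be picked for a move. */
  static boolean isEligible(State state, boolean includeNonStandard) {
    return eligibleStates(includeNonStandard).contains(state);
  }
}
```

With the flag off, `MoveEligibility.isEligible(MoveEligibility.State.QUASI_CLOSED, false)` is false, so the 600 QUASI_CLOSED containers in the scenario above stay where they are. With the flag on, the same call returns true.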

2nd Improvement:
We need to add debug logs to the chooseContainer method, because a user might not understand why no container is moved even though over-utilised and under-utilised volumes exist. This part needs more clarification. In escalations, balancer debug logs for skipped containers helped a lot in identifying what state the container or volume was in.
I suggest adding these logs for containers that are not chosen:

// UsedBytes less than 0
 LOG.debug("Skipping container {} from volume {}: bytes used is {}", containerId, src.getStorageDir().getPath(), containerData.getBytesUsed());
              
//  Skip containers already in progress
LOG.debug("Skipping container {} from volume {}: disk balancer move already in progress", containerId, src.getStorageDir().getPath());

// only closed and quasi closed containers should be moved
LOG.debug("Skipping container {} from volume {}: state is {}. Requires {}", containerId, src.getStorageDir().getPath(), containerData.getState(), phase.getEligibilityCriteria());

// skipping container move as its size is more than destination available space.
LOG.debug("Skipping container {} ({}B) from volume {}: exceeds destination {} usable space {}B", containerId, containerSize, src.getStorageDir().getPath(), dst.getStorageDir().getPath(), usableSpace);

// skipping container move as it would make the destination over-utilised after the move.
LOG.debug("Skipping container {} ({}B) from volume {}: moving to {} would result in utilization {} exceeding upper threshold {}", containerId, containerSize, src.getStorageDir().getPath(), dst.getStorageDir().getPath(),
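
The last two skip reasons depend on the destination volume. As a rough sketch of how such checks could be ordered (an assumption for illustration, not the logic in this patch), the hypothetical `DestinationCheck` class below takes plain numbers instead of real volume objects. How the upper threshold is derived is left to the caller.

```java
/** Hypothetical sketch of the destination-side skip checks; names are illustrative. */
final class DestinationCheck {

  /** Outcome of checking one container against one destination volume. */
  enum Result { OK, EXCEEDS_USABLE_SPACE, EXCEEDS_UPPER_THRESHOLD }

  private DestinationCheck() { }

  /**
   * @param containerSize  size of the candidate container in bytes
   * @param dstUsed        bytes already used on the destination volume
   * @param dstCapacity    total capacity of the destination volume in bytes
   * @param dstUsable      bytes the destination can still accept
   * @param upperThreshold maximum allowed utilization as a fraction (0.0 to 1.0)
   */
  static Result check(long containerSize, long dstUsed, long dstCapacity,
      long dstUsable, double upperThreshold) {
    if (dstCapacity <= 0) {
      throw new IllegalArgumentException("dstCapacity must be positive");
    }
    // The container does not fit into the destination's usable space.
    if (containerSize > dstUsable) {
      return Result.EXCEEDS_USABLE_SPACE;
    }
    // The container fits, but the move would push the destination too high.
    double projected = (double) (dstUsed + containerSize) / dstCapacity;
    if (projected > upperThreshold) {
      return Result.EXCEEDS_UPPER_THRESHOLD;
    }
    return Result.OK;
  }
}
```

Each non-OK result corresponds to one of the two debug messages above, so a reader of the log can tell whether the container simply did not fit or whether the move would have over-filled the destination.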

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14926

How was this patch tested?

Added unit test.

@Gargi-jais11 Gargi-jais11 changed the title HDDS-14926. Allow QUASI_CLOSED containers in DiskBalancer with improved debug logging for containers HDDS-14926. Allow QUASI_CLOSED containers in DiskBalancer with improved debug logs for containers Apr 1, 2026