
HDDS-14921. Improve space accounting in SCM with In-Flight container allocation tracking.#10000

Draft
ashishkumar50 wants to merge 7 commits into apache:master from ashishkumar50:HDDS-14921

Conversation

@ashishkumar50 (Contributor)

What changes were proposed in this pull request?

Maintain space accounting during container allocation in SCM. A more detailed description is in the Jira.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14921

How was this patch tested?

Unit and integration tests. (More test scenarios TBD.)

@ashishkumar50 ashishkumar50 marked this pull request as draft March 30, 2026 04:24
@rakeshadr (Contributor) left a comment


Thanks @ashishkumar50 for providing the patch. I've added a few comments; please take care of them.

if (!alreadyOnDn && getContainerManager() instanceof ContainerManagerImpl) {
  ((ContainerManagerImpl) getContainerManager())
      .getPendingContainerTracker()
      .removePendingAllocation(dd, id);

Say the DN is healthy, all containers are confirmed, and there are no new allocations → that DN's bucket never rolls even though heartbeats arrive every 30 seconds, right?

t=0    Container C1 allocated → pending recorded in tracker

t=60-120  FCR arrives from DN
          → cid = C1
          → alreadyInDn = expectedContainersInDatanode.remove(C1) = FALSE
          → !alreadyInDn = TRUE → removePendingAllocation called → rollIfNeeded fires ✓
          → C1 added to NM DN-set

How about rolling on every processHeartbeat, i.e. every 30 seconds, regardless of container state changes?
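The reviewer's alternative, driving the roll from the heartbeat path so an idle datanode's bucket still ages out, could look roughly like the sketch below. All names and the fixed counts here are hypothetical, not the PR's actual API:

```java
// Illustrative only: roll the pending-allocation bucket from the heartbeat
// handler (~every 30s) instead of only from container-report processing.
public class RollOnHeartbeat {
  static long lastRollMillis = 0L;
  static final long ROLL_INTERVAL_MILLIS = 10 * 60 * 1000L; // assumed 10 min
  static int pendingCount = 3; // stand-in for unconfirmed allocations

  // Invoked on every heartbeat; cheap no-op until the interval elapses.
  static void onHeartbeat(long nowMillis) {
    if (nowMillis - lastRollMillis >= ROLL_INTERVAL_MILLIS) {
      pendingCount = 0; // stand-in for shifting current -> previous window
      lastRollMillis = nowMillis;
    }
  }

  public static void main(String[] args) {
    onHeartbeat(30_000L);             // 30s: interval not elapsed, no roll
    System.out.println(pendingCount); // 3
    onHeartbeat(ROLL_INTERVAL_MILLIS);
    System.out.println(pendingCount); // 0
  }
}
```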

}

// Cleanup empty buckets to prevent memory leak
if (bucket.isEmpty()) {

This can potentially hit a concurrency issue. Say two threads enter this block:

Thread-1 (removePendingAllocation): bucket.isEmpty(), returns true

Thread-2 (recordPendingAllocationForDatanode): computeIfAbsent(uuid) returns same bucket
reference (key still exists), calls bucket.add(containerID) and now the bucket will be non-empty

Thread-1: datanodeBuckets.remove(uuid, bucket) then removes the now non-empty bucket, and the containerID ends up in a detached bucket object, right?

I think we need to add synchronization here to avoid the detached bucket object.
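One race-free way to do the cleanup, as a minimal sketch: use ConcurrentHashMap.compute so the add and the empty-check-plus-removal each run under the entry's lock. Class and method names here are illustrative, not the PR's actual code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BucketCleanup {
  // Per-datanode buckets of pending container IDs (Long used as a
  // stand-in for the real ContainerID type).
  private final ConcurrentHashMap<String, Set<Long>> datanodeBuckets =
      new ConcurrentHashMap<>();

  void add(String dnUuid, long containerId) {
    // compute() holds the entry's lock, so a concurrent removal cannot
    // slip in between the bucket lookup and the add.
    datanodeBuckets.compute(dnUuid, (k, bucket) -> {
      if (bucket == null) {
        bucket = ConcurrentHashMap.newKeySet();
      }
      bucket.add(containerId);
      return bucket;
    });
  }

  void remove(String dnUuid, long containerId) {
    // Returning null from compute() removes the mapping atomically, so
    // no thread can add into a detached bucket.
    datanodeBuckets.compute(dnUuid, (k, bucket) -> {
      if (bucket == null) {
        return null;
      }
      bucket.remove(containerId);
      return bucket.isEmpty() ? null : bucket;
    });
  }

  int size(String dnUuid) {
    Set<Long> bucket = datanodeBuckets.get(dnUuid);
    return bucket == null ? 0 : bucket.size();
  }

  public static void main(String[] args) {
    BucketCleanup c = new BucketCleanup();
    c.add("dn1", 1L);
    c.remove("dn1", 1L);
    System.out.println(c.size("dn1")); // 0, and the key is gone
  }
}
```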

}

@Test
public void testRemoveFromBothWindows() {

Do we have a test scenario covering rollover?

The two-window rolling behavior (a container in previousWindow rolls off after 2× the interval). Say, add C1 to currentWindow, roll so C1 moves to previousWindow, then wait for it to roll over and age out.
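The suggested test could be sketched as below, with a minimal stand-in for the PR's TwoWindowBucket that takes time as a parameter so the test can fast-forward instead of sleeping. The constructor and method signatures are assumptions, not the PR's actual API:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class TwoWindowBucketRollTest {
  static class TwoWindowBucket {
    private Set<Long> currentWindow = ConcurrentHashMap.newKeySet();
    private Set<Long> previousWindow = ConcurrentHashMap.newKeySet();
    private long lastRollMillis;
    private final long rollIntervalMillis;

    TwoWindowBucket(long rollIntervalMillis, long nowMillis) {
      this.rollIntervalMillis = rollIntervalMillis;
      this.lastRollMillis = nowMillis;
    }

    synchronized void add(long containerId, long nowMillis) {
      rollIfNeeded(nowMillis);
      currentWindow.add(containerId);
    }

    synchronized void rollIfNeeded(long nowMillis) {
      if (nowMillis - lastRollMillis >= rollIntervalMillis) {
        previousWindow = currentWindow;          // current -> previous
        currentWindow = ConcurrentHashMap.newKeySet();
        lastRollMillis = nowMillis;
      }
    }

    synchronized int getCount(long nowMillis) {
      rollIfNeeded(nowMillis);
      return currentWindow.size() + previousWindow.size();
    }
  }

  public static void main(String[] args) {
    long interval = 10L * 60 * 1000;             // assumed 10-minute interval
    TwoWindowBucket bucket = new TwoWindowBucket(interval, 0);
    bucket.add(1L, 0);                           // C1 lands in currentWindow

    // After one interval, C1 has moved to previousWindow but still counts.
    if (bucket.getCount(interval) != 1) {
      throw new AssertionError("C1 should survive the first roll");
    }
    // After 2x the interval, C1 has aged out entirely.
    if (bucket.getCount(2 * interval) != 0) {
      throw new AssertionError("C1 should age out after 2x interval");
    }
    System.out.println("C1 aged out after 2x interval");
  }
}
```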

@sumitagrawl (Contributor) left a comment


Thanks @ashishkumar50 for working on this; I have a few review comments.

)
private int transactionToDNsCommitMapLimit = 5000000;

@Config(key = "hdds.scm.container.pending-allocation.roll-interval",

We can have the config as hdds.scm.container.pending.allocation.roll.interval (dots instead of hyphens).

* @param pipeline The pipeline where container is allocated
* @param containerID The container being allocated
*/
public void recordPendingAllocation(Pipeline pipeline, ContainerID containerID) {

This needs to be part of SCMNodeManager, more specifically SCMNodeStat. Reasons:

  1. We need to handle events like the stale node / dead node handlers for cleanup.
  2. We may need to report this to the CLI as part of the available space on the DN.
  3. It will be used by the pipeline allocation policy, where the container manager does not come into play.

  This is datanode space; we are just trying to identify already-allocated space. It also needs to be part of the committed space at SCM when reporting to the CLI or in other breakdowns.

final ContainerInfo container;
try {
// Check if container is already known to this DN before adding
boolean alreadyOnDn = false;

Do we really need to check whether the container exists, or can we just remove it if it exists in a single call?

processContainerReplica(dd, container, replicaProto, publisher, detailsForLogging);

// Remove from pending tracker when container is added to DN
if (!alreadyOnDn && getContainerManager() instanceof ContainerManagerImpl) {

Please check whether the node report is also sent in the ICR; the reason is that node information should be updated at the same time as the ICR.

// (1*5GB) + (2*5GB) = 15GB → actually 3 containers
long totalCapacity = 0L;
long effectiveAllocatableSpace = 0L;
for (StorageReportProto report : storageReports) {

Instead of calculating all the available space and then subtracting, we can do it progressively, like:

required = pending + newAllocation
for each report:
    required = required - volumeUsage (rounded off)
    if required <= 0:
        return true

But we also need to reserve: we can add first and then check, and if there is not enough space, remove the containerId.

Or, another way: when handling the DN storage report, the consolidated total could also be kept in memory to avoid looping on every call.
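The progressive check above can be sketched as follows; the method name, the array of per-volume available bytes, and the example sizes are all illustrative assumptions:

```java
public class ProgressiveSpaceCheck {
  // Subtract each volume's usable space from the total demand and stop
  // as soon as the demand is covered, instead of summing everything first.
  static boolean hasSpace(long[] volumeAvailable, long pendingBytes,
      long newAllocationBytes) {
    long required = pendingBytes + newAllocationBytes;
    for (long avail : volumeAvailable) {
      required -= avail;
      if (required <= 0) {
        return true; // early exit: remaining volumes need not be scanned
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024 * 1024;
    long[] volumes = {10 * gb, 20 * gb};          // hypothetical reports
    System.out.println(hasSpace(volumes, 15 * gb, 5 * gb)); // true
    System.out.println(hasSpace(volumes, 28 * gb, 5 * gb)); // false
  }
}
```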

bucket.rollIfNeeded();

// Each pending container assumes max size
return (long) bucket.getCount() * maxContainerSize;

This is a costly operation: it first combines the two sets and then counts them.
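One way to avoid merging the sets on every call is to keep a running counter, assuming the add/remove/roll paths keep the two windows disjoint. This is a sketch under that assumption, not the PR's implementation:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CountedTwoWindowBucket {
  // Invariant: currentWindow and previousWindow stay disjoint, so one
  // running counter tracks the total pending count in O(1).
  private Set<Long> currentWindow = ConcurrentHashMap.newKeySet();
  private Set<Long> previousWindow = ConcurrentHashMap.newKeySet();
  private int count;

  synchronized void add(long containerId) {
    if (previousWindow.remove(containerId)) {
      currentWindow.add(containerId); // refresh: move forward, count unchanged
      return;
    }
    if (currentWindow.add(containerId)) {
      count++;
    }
  }

  synchronized void remove(long containerId) {
    if (currentWindow.remove(containerId) || previousWindow.remove(containerId)) {
      count--;
    }
  }

  synchronized void roll() {
    count -= previousWindow.size();   // aged-out entries leave the count
    previousWindow = currentWindow;
    currentWindow = ConcurrentHashMap.newKeySet();
  }

  synchronized int getCount() {
    return count;                     // O(1): no set union, no recount
  }

  public static void main(String[] args) {
    CountedTwoWindowBucket b = new CountedTwoWindowBucket();
    b.add(1L);
    b.add(2L);
    System.out.println(b.getCount()); // 2
    b.roll();                         // both move to previousWindow
    System.out.println(b.getCount()); // 2
    b.roll();                         // previous window ages out
    System.out.println(b.getCount()); // 0
  }
}
```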

* Contains current and previous window sets, plus last roll timestamp.
*/
private static class TwoWindowBucket {
private Set<ContainerID> currentWindow = ConcurrentHashMap.newKeySet();

If we use synchronized on all methods, then the sets need not be thread-safe.

"After 2x this interval, allocations that haven't been confirmed via " +
"container reports will automatically age out. Default is 10 minutes."
)
private Duration pendingContainerAllocationRollInterval = Duration.ofMinutes(10);

Is the rolling period 5 min or 10 min? That is, should the previous bucket stay for an additional 5 min to capture the in-progress container list?

// Remove from pending tracker when container is added to DN
// This container was just confirmed for the first time on this DN
// No need to remove on subsequent reports (it's already been removed)
if (container != null && getContainerManager() instanceof ContainerManagerImpl) {

The code handling differs between ICR and FCR; it can be the same.
