Skip to content

KEP-5732: Remove desiredCount from the TAS KEP#5949

Open
44past4 wants to merge 2 commits intokubernetes:masterfrom
44past4:remove-desired-count
Open

KEP-5732: Remove desiredCount from the TAS KEP#5949
44past4 wants to merge 2 commits intokubernetes:masterfrom
44past4:remove-desired-count

Conversation

@44past4
Copy link
Contributor

@44past4 44past4 commented Mar 9, 2026

  • One-line PR description: Removes adding desiredCount field to Basic and Gang PodGroup scheduling policies from the scope of TAS KEP.

  • Issue link: KEP-5732: Topology-aware workload scheduling #5732

  • Other comments:

    • After some discussions there is no clarity when it comes to expected semantics of the desiredCount field which was planned to be added to Basic and Gang scheduling policies on PodGroup. Because of this desiredCount will be removed from the alpha scope of the TAS KEP.

/sig scheduling

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Mar 9, 2026
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Scheduling Mar 9, 2026
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Mar 9, 2026
@k8s-ci-robot k8s-ci-robot requested review from dom4ha and macsko March 9, 2026 10:23
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 44past4
Once this PR has been reviewed and has the lgtm label, please assign macsko for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 9, 2026
```go
// BasicSchedulingPolicy indicates that standard Kubernetes
// scheduling behavior should be used.
type BasicSchedulingPolicy struct {
Copy link
Contributor

@mm4tt mm4tt Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know that TAS for non-gangs is tricky due to edge cases (e.g., the scheduler making placement decisions based on a partial view of the pods). I treat this as an (unsuccessful) attempt to fix the problem. While removing it now makes sense until we have a more thorough design, I think we shouldn't just drop the context entirely.

Instead of simply removing it, should we add a short paragraph to the KEP outlining the current TAS limitations for basic policies, noting that this will be addressed in future releases? We could also mention scheduling gates as a currently available (albeit imperfect) mitigation.

The 'Phase 2' algorithm section might be a good place for this. We could expand on point '3. If all pods fit, the Placement is marked Feasible' to explain why this works perfectly for gangs (waiting for MinCount in pre-enqueue), but might behave inconsistently for basic policies (where the scheduler's decision depends on the number of observed pods).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of simply removing it, should we add a short paragraph to the KEP outlining the current TAS limitations for basic policies, noting that this will be addressed in future releases? We could also mention scheduling gates as a currently available (albeit imperfect) mitigation.

+1, but I think only the problem of "not-ready" pods is worth addressing as a part of TAS as it makes the feature working sub-optimally.

Addressing the desired future number of pods is rather an independent extension to TAS and IMHO deserves a separate KEP.

Copy link
Contributor Author

@44past4 44past4 Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated Phase 2 algorithm description with a point explaining the limitations of using TAS with Basic Scheduling Policy. Please take a look.

@44past4 44past4 force-pushed the remove-desired-count branch from 7bf0cc3 to 54508a8 Compare March 9, 2026 10:36
Copy link
Member

@dom4ha dom4ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```go
// BasicSchedulingPolicy indicates that standard Kubernetes
// scheduling behavior should be used.
type BasicSchedulingPolicy struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of simply removing it, should we add a short paragraph to the KEP outlining the current TAS limitations for basic policies, noting that this will be addressed in future releases? We could also mention scheduling gates as a currently available (albeit imperfect) mitigation.

+1, but I think only the problem of "not-ready" pods is worth addressing as a part of TAS as it makes the feature working sub-optimally.

Addressing the desired future number of pods is rather an independent extension to TAS and IMHO deserves a separate KEP.

@dom4ha
Copy link
Member

dom4ha commented Mar 9, 2026

FTR, dropped PR with the discussion: kubernetes/kubernetes#137137

Add warning about using the Basic Scheduling Policy with TAS.
- **Potential Optimization:** Pre-filtering can check aggregate resources
requested by PodGroup Pods before running the full simulation.

- **Basic Scheduling Policy Handling:** The current algorithm may exhibit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies to Gang Policy if more pods than minCount are created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

Status: Needs Triage

Development

Successfully merging this pull request may close these issues.

4 participants