Add MIG profile support for ml.p6-b300.48xlarge (Blackwell Ultra) #398
Status: Open
Add `ml.p6-b300.48xlarge` to `INSTANCE_TYPE_MIG_PROFILES` in `constants.py` with the correct B300 MIG profiles, derived from the NVIDIA GPU Operator v25.3.0 upstream ConfigMap (device-filter `0x318210DE`):

- `mig-1g.34gb`, `mig-1g.67gb`, `mig-2g.67gb`
- `mig-3g.135gb`, `mig-4g.135gb`, `mig-7g.269gb`

Also add the corresponding uniform and mixed MIG partition profiles to the Helm chart's `default-mig-config.yaml` ConfigMap, following the same pattern used for existing GPU types (H100, H200, B200).

The B300 GPU (288 GB HBM3e, ~269 GB usable) was already registered in `INSTANCE_RESOURCES` but had no MIG profile mapping, so HyperPod MIG validation rejected accelerator partition requests on this instance type.
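A minimal sketch of the new mapping and of the kind of gate it unblocks. The dict shape (instance type mapped to a list of profile names) is inferred from the test plan; `check_mig_partition` and its error messages are hypothetical stand-ins, not the CLI's actual `_validate_accelerator_partition_parameters()`.

```python
from typing import Dict, List

# Hypothetical stand-in for INSTANCE_TYPE_MIG_PROFILES in constants.py.
# Only the new B300 entry is shown; the real dict has other instance types.
INSTANCE_TYPE_MIG_PROFILES: Dict[str, List[str]] = {
    "ml.p6-b300.48xlarge": [
        "mig-1g.34gb",
        "mig-1g.67gb",
        "mig-2g.67gb",
        "mig-3g.135gb",
        "mig-4g.135gb",
        "mig-7g.269gb",
    ],
}


def check_mig_partition(partition_type: str, instance_type: str) -> None:
    """Hypothetical gate: raise ValueError unless the profile is allowed.

    Mirrors the two failure modes described in this PR: the instance type
    has no MIG mapping at all, or the requested profile is not in it.
    """
    profiles = INSTANCE_TYPE_MIG_PROFILES.get(instance_type)
    if profiles is None:
        # Before this change, ml.p6-b300.48xlarge ended up in this branch.
        raise ValueError(f"{instance_type} has no MIG profile mapping")
    if partition_type not in profiles:
        raise ValueError(
            f"{partition_type} is not supported on {instance_type}; "
            f"allowed: {profiles}"
        )


if __name__ == "__main__":
    check_mig_partition("mig-3g.135gb", "ml.p6-b300.48xlarge")  # accepted
    print("mig-3g.135gb accepted on ml.p6-b300.48xlarge")
```

Keeping the lookup keyed by the full `ml.`-prefixed instance type is what makes a missing or mis-keyed entry fail closed rather than silently allowing any profile.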
KeitaW added a commit to KeitaW/sagemaker-hyperpod-cli that referenced this pull request on Mar 28, 2026:
Covers `ml.p6-b300.48xlarge` MIG profile support added in PR aws#398:
- Profile presence in `INSTANCE_TYPE_MIG_PROFILES`
- Complete profile list verification (6 profiles)
- All profiles in `ALLOWED_ACCELERATOR_PARTITION_TYPES`
- GPU slice extraction for all B300 profiles (1g→1, 2g→2, ..., 7g→7)
- CPU/memory default calculation for each profile at max instances
- Validation acceptance for valid B300 profiles
- Validation rejection for invalid profiles on the B300 instance type
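The "GPU slice extraction" case amounts to reading the leading `<n>g` from a profile name. A sketch assuming every profile follows the `mig-<slices>g.<memory>gb` pattern; `parse_mig_profile` is a hypothetical helper, not the CLI's own function.

```python
import re
from typing import List, Tuple

# Hypothetical parser for names like "mig-3g.135gb"; the CLI's real
# extraction logic may differ.
_MIG_PROFILE_RE = re.compile(r"mig-(\d+)g\.(\d+)gb")

B300_PROFILES: List[str] = [
    "mig-1g.34gb",
    "mig-1g.67gb",
    "mig-2g.67gb",
    "mig-3g.135gb",
    "mig-4g.135gb",
    "mig-7g.269gb",
]


def parse_mig_profile(profile: str) -> Tuple[int, int]:
    """Return (gpu_slices, memory_gb) for a MIG profile name."""
    match = _MIG_PROFILE_RE.fullmatch(profile)
    if match is None:
        raise ValueError(f"Not a MIG profile name: {profile!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for name in B300_PROFILES:
        slices, memory_gb = parse_mig_profile(name)
        print(f"{name}: {slices} slice(s), {memory_gb} GB")
```

Using `fullmatch` rather than `match` means trailing junk such as `mig-1g.34gbx` is rejected instead of being parsed as a valid prefix.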
Summary
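- Add `ml.p6-b300.48xlarge` to `INSTANCE_TYPE_MIG_PROFILES` in `constants.py` with the B300 MIG profiles: `mig-1g.34gb`, `mig-1g.67gb`, `mig-2g.67gb`, `mig-3g.135gb`, `mig-4g.135gb`, `mig-7g.269gb`
- Add the corresponding uniform and mixed MIG partition profiles to the `default-mig-config.yaml` ConfigMap

Relationship to #396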
PR #396 ("Added profiles for B300") was merged on 2026-03-23 and added 2 ConfigMap profiles (`all-1g.67gb` and `mixed-2-1g.34gb-1-2g.67gb-1-3g.135gb`). However, it left two critical gaps:

1. `constants.py` was not updated, so MIG requests on B300 are rejected before the ConfigMap is ever consulted. `_validate_accelerator_partition_parameters()` in `accelerator_partition_util.py` checks `INSTANCE_TYPE_MIG_PROFILES` at line 26 as a gate. Because `ml.p6-b300.48xlarge` is absent from that dict, the CLI rejects the request with an error. This blocks all MIG usage on B300: `HyperPodPyTorchJob` submissions, inference endpoints with `acceleratorPartitionType`, and `hyp list-accelerator-partition-type`. The ConfigMap profiles from #396 are therefore unreachable.
2. 15 of 17 ConfigMap profiles are missing. Cross-referencing against the NVIDIA GPU Operator v25.3.0 upstream ConfigMap (B300 section, device-filter `0x318210DE`) and the NVIDIA MIG product page (Blackwell Ultra: 7x34GB, 4x69GB, 2x139GB, 1x279GB) gives these uniform profiles (MIG instances per GPU in parentheses):
   - `all-1g.34gb` (x7)
   - `all-1g.67gb` (x4)
   - `all-2g.67gb` (x3)
   - `all-3g.135gb` (x2)
   - `all-4g.135gb` (x1)
   - `all-7g.269gb` (x1)

Profile Source
MIG profiles are derived from the NVIDIA GPU Operator upstream ConfigMap (v25.3.0), which defines B300 profiles under the `# B300` comment section with `all-balanced` device-filter `0x318210DE`. The NVIDIA MIG User Guide (r580) has not yet been updated for B300.

Additional Note
The existing `p6-b200.48xlarge` key in `INSTANCE_TYPE_MIG_PROFILES` is missing the `ml.` prefix (unlike all other entries). To keep its scope focused, this PR does not address that issue, but it may warrant a separate fix.

Test plan
- `INSTANCE_TYPE_MIG_PROFILES['ml.p6-b300.48xlarge']` returns the correct 6 profiles
- `ALLOWED_ACCELERATOR_PARTITION_TYPES` includes all B300 MIG types (`mig-1g.34gb`, `mig-1g.67gb`, `mig-2g.67gb`, `mig-3g.135gb`, `mig-4g.135gb`, `mig-7g.269gb`)
- `default-mig-config.yaml` parses as valid YAML
- `_validate_accelerator_partition("mig-1g.34gb", ..., "ml.p6-b300.48xlarge")` passes validation
- … `ml.p6-b300.48xlarge` and `nvidia.com/mig.config: all-1g.34gb`
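On the ConfigMap side, the GPU Operator's MIG manager applies the named entry from its config when a node carries the `nvidia.com/mig.config` label (for example `all-1g.34gb`). The sketch below builds the six uniform B300 entries using the instance counts from the cross-reference above. Its field layout (`version`, `mig-configs`, `device-filter`, `devices`, `mig-enabled`, `mig-devices`) is modelled on the upstream mig-parted config format and is an assumption to check against the chart's actual `default-mig-config.yaml`; `build_uniform_mig_configs` is a hypothetical helper. It emits JSON, which YAML 1.2 parsers also accept.

```python
import json
from typing import Any, Dict

# Device filter for B300, as cited from the upstream GPU Operator ConfigMap.
B300_DEVICE_FILTER = "0x318210DE"

# Uniform B300 layouts: MIG device type -> instances per GPU.
B300_UNIFORM_COUNTS: Dict[str, int] = {
    "1g.34gb": 7,
    "1g.67gb": 4,
    "2g.67gb": 3,
    "3g.135gb": 2,
    "4g.135gb": 1,
    "7g.269gb": 1,
}


def build_uniform_mig_configs() -> Dict[str, Any]:
    """Build hypothetical mig-parted style 'all-<profile>' entries for B300."""
    configs: Dict[str, Any] = {}
    for device_type, count in B300_UNIFORM_COUNTS.items():
        configs[f"all-{device_type}"] = [
            {
                "device-filter": [B300_DEVICE_FILTER],
                "devices": "all",
                "mig-enabled": True,
                "mig-devices": {device_type: count},
            }
        ]
    return {"version": "v1", "mig-configs": configs}


if __name__ == "__main__":
    # The entry a node labelled nvidia.com/mig.config=all-1g.34gb would select.
    entry = build_uniform_mig_configs()["mig-configs"]["all-1g.34gb"]
    print(json.dumps(entry, indent=2))
```

Generating the uniform entries from one count table keeps the ConfigMap and the `INSTANCE_TYPE_MIG_PROFILES` list easy to cross-check; the mixed profiles would still need to be written out individually.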