wsl: report a single "all" device to kubelet #1671
Open
elezar wants to merge 1 commit into NVIDIA:main from
On WSL, all GPUs are accessed through /dev/dxg. Replace the per-GPU wslDevice (which reported one device per physical GPU with individual UUIDs) with a stateless wslAllGPUsDevice that always returns UUID "all" and path "/dev/dxg".

This causes the device map to collapse to a single entry per resource, so kubelet sees exactly one GPU device on WSL. When allocated, this flows naturally through all strategy paths (envvar, CDI, volume mounts) to set NVIDIA_VISIBLE_DEVICES=all, which is what nvidia-container-runtime on WSL expects.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
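The collapse described in the commit message can be illustrated with a minimal Go sketch. This is not the code from the PR: the `device` interface and the `buildDeviceMap` helper are hypothetical stand-ins, and only the `wslAllGPUsDevice` name, the "all" UUID, and the /dev/dxg path come from the commit message. The sketch assumes the device map is keyed by UUID.

```go
package main

import "fmt"

// device is a hypothetical, simplified stand-in for the plugin's device
// abstraction; the real interface in the plugin may differ.
type device interface {
	GetUUID() (string, error)
	GetPaths() ([]string, error)
}

// wslAllGPUsDevice holds no state: on WSL every GPU is reached through
// /dev/dxg, so every instance reports the same UUID and path.
type wslAllGPUsDevice struct{}

func (wslAllGPUsDevice) GetUUID() (string, error) { return "all", nil }

func (wslAllGPUsDevice) GetPaths() ([]string, error) {
	return []string{"/dev/dxg"}, nil
}

// buildDeviceMap (hypothetical) keys devices by UUID, so devices that share
// a UUID collapse into a single entry.
func buildDeviceMap(devices []device) (map[string][]string, error) {
	m := make(map[string][]string)
	for _, d := range devices {
		uuid, err := d.GetUUID()
		if err != nil {
			return nil, err
		}
		paths, err := d.GetPaths()
		if err != nil {
			return nil, err
		}
		m[uuid] = paths
	}
	return m, nil
}

func main() {
	// Simulate a WSL host with three physical GPUs.
	devices := []device{wslAllGPUsDevice{}, wslAllGPUsDevice{}, wslAllGPUsDevice{}}
	m, err := buildDeviceMap(devices)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(m), m["all"]) // prints: 1 [/dev/dxg]
}
```

Under these assumptions, the number of physical GPUs no longer affects what is advertised: the map always ends up with the single "all" entry.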
On WSL, there is no isolation between the GPUs on a system, because they are all accessed through the same /dev/dxg device. The CDI spec generated by the NVIDIA Container Toolkit reflects this by always defining a single "all" device.

This is incompatible with the device plugin when using a CDI-based device list strategy, since the device name reported by the plugin includes the device UUID or index.

The change in this PR ensures that the device plugin always reports a single device whose UUID and index are both "all", so that it is compatible with the generated CDI spec.
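The mismatch can be sketched in Go as follows. This is an illustration under stated assumptions, not the plugin's code: the helpers `qualifiedName` and `unresolved` are hypothetical, the CDI kind is assumed to be nvidia.com/gpu, and the GPU UUID is a placeholder. The sketch relies only on the fact that fully qualified CDI device names take the form vendor/class=name.

```go
package main

import "fmt"

// qualifiedName (hypothetical helper) builds a fully qualified CDI device
// name of the form vendor/class=name from a kind (vendor/class) and a name.
func qualifiedName(kind, name string) string {
	return fmt.Sprintf("%s=%s", kind, name)
}

// unresolved (hypothetical helper) returns the requested device names that
// are not defined in the given set of spec devices.
func unresolved(spec map[string]struct{}, requested []string) []string {
	var out []string
	for _, r := range requested {
		if _, ok := spec[r]; !ok {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	const kind = "nvidia.com/gpu" // assumed CDI kind for this sketch

	// On WSL the generated CDI spec defines only the "all" device.
	spec := map[string]struct{}{qualifiedName(kind, "all"): {}}

	// Before the change: the plugin requests a per-GPU name (placeholder UUID),
	// which the spec does not define.
	perGPU := []string{qualifiedName(kind, "GPU-00000000-0000-0000-0000-000000000000")}
	fmt.Println("unresolved before:", unresolved(spec, perGPU))

	// After the change: the plugin requests "all", which the spec defines.
	single := []string{qualifiedName(kind, "all")}
	fmt.Println("unresolved after:", unresolved(spec, single))
}
```

With a per-GPU name the request cannot be resolved against the WSL spec, whereas the single "all" name matches the only device the spec defines.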