[code not in mergeable state yet] Add MI325X DeepSeek-R1 FP8 disaggregated inference with Broadcom Thor 2 IBGDA #985
JordanNanos wants to merge 13 commits into main from jordan/mi325x-disagg-bnxt
Conversation
…or 2 IBGDA) Port the MI355X disagg recipe to MI325X (gfx942/CDNA3) on a Vultr Slurm cluster with Broadcom BCM5760X Thor 2 NICs, using IBGDA for GPU-Direct RDMA via MoRI.

Container image: ghcr.io/jordannanos/sgl-mi325x-mori:v0.5.9-bnxt

Built from akao-amd/sglang rocm.Dockerfile with:
- GPU_ARCH=gfx942, ENABLE_MORI=1, NIC_BACKEND=ibgda
- Broadcom bnxt_rocelib (bcm5760x_231.2.63.0a) for RDMA userspace
- MoRI pinned to HEAD (c0eccaf2) for bundled bnxt headers + dlopen
- smg-wasm pinned to =1.0.0 (v1.0.1 breaks sgl-model-gateway v0.5.9 API)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
functionstackx left a comment:
thanks for the PR. can you add the sweep-enabled PR label or run the /sweep command to get your PR into a mergeable state, so you can merge your first line of code into the main repo?
.github/configs/amd-master.yaml (outdated)
dsr1-fp8-mi325x-sglang-disagg:
  image: ghcr.io/jordannanos/sgl-mi325x-mori:v0.5.9-bnxt
what is this built with? can you add a permalink to the Dockerfile and your docker build commands?
it's described in the PR description
can you add the exact git clone command (which sglang hash?), the wget for the Broadcom drivers, and the docker build? exact commands, so it's reproducible?
it's not a wget. I had to manually download it from the Broadcom site and then copy the tarball over to the cluster, matching the exact firmware version installed on the cluster. this applies to all Thor 2 NICs
the build command is included in the sbatch and/or the shell script: https://github.com/JordanNanos/sglang/blob/main/docker/build-sglang-bnxt.sh and https://github.com/JordanNanos/sglang/blob/main/docker/build-sglang-bnxt.sbatch

do you want this build command in this repo?
yes, maybe in utils/? and add a README too about how to manually download the tarball from Broadcom?

that way we can get an AMD engineer to read your Dockerfile & build command and fix the upstream builds
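A hypothetical helper for that README, sketching how a utils/ script might confirm the staged Broadcom package matches the NIC firmware (all names illustrative; on a real node the firmware string would come from `ethtool -i <iface>`):

```shell
# Illustrative utils/ helper: check that the staged Broadcom driver zip
# matches the firmware version the NIC actually reports.
pkg_version() {
    # bcm5760x_231.2.63.0a.zip -> 231.2.63.0
    echo "$1" | sed -n 's/^bcm5760x_\([0-9.]*\)[a-z]*\.zip$/\1/p'
}

check_fw_match() {
    # $1 = driver zip filename, $2 = firmware version reported by the NIC
    [ "$(pkg_version "$1")" = "$2" ] && echo match || echo MISMATCH
}

# e.g. check_fw_match bcm5760x_231.2.63.0a.zip \
#        "$(ethtool -i "$IFACE" | awk '/firmware-version/{print $2}')"
```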
- "DECODE_NODES=2"
- "DECODE_MTP_SIZE=0"

# "Low concurrency" (1 prefill worker at TP4, 1 decode worker at TP8)
are you sure TP4 is on the Pareto frontier here? do you have a graph?
you only have the TP4 curve, and you have "hide non-optimal" enabled? can you run the rest of the 24 datapoints?
@@ -357,7 +360,7 @@ exec sudo docker run --rm \
     --privileged \
     -v ${MODEL_DIR}:/models \
     -v \$HOME/.ssh:/root/.ssh \
-    -v $(which nicctl):/usr/sbin/nicctl \
+    $(command -v nicctl &>/dev/null && echo "-v $(which nicctl):/usr/sbin/nicctl") \
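The changed line conditionally injects the bind mount only when the binary exists on the host; the pattern as a standalone sketch:

```shell
# Sketch of the conditional bind-mount pattern from the diff: emit a
# "-v src:dst" docker flag only when the named binary exists on the host,
# so `docker run` silently skips the mount otherwise.
optional_mount() {
    # $1 = binary name, $2 = destination path inside the container
    if command -v "$1" >/dev/null 2>&1; then
        echo "-v $(command -v "$1"):$2"
    fi
}

# e.g. docker run ... $(optional_mount nicctl /usr/sbin/nicctl) ...
```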
can you verify whether these changes break MI355 disagg? cc @Oseltamivir
the check for nicctl was breaking on this cluster. MoRI needs it to enforce QoS; it's disabled for now since it's not installed on these nodes or in the container I built, and it seems unnecessary
- "DECODE_MTP_SIZE=1"

dsr1-fp8-mi325x-sglang-disagg:
you're missing the perf-changelog.yaml entry too
- Add dsr1-fp8-mi325x-sglang-disagg-mtp config with MTP=1/2 across all curve points (top/middle/bottom/low-conc) for both 1k/1k and 8k/1k
- Expand concurrency lists to cover the full Pareto frontier, including non-optimal points
- Update image tag to v0.5.9-bnxt-good (the pushed image)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsr1-fp8-mi325x-sglang-disagg dsr1-fp8-mi325x-sglang-disagg-mtp

@JordanNanos Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23812520838

holy

/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsr1-fp8-mi325x-sglang-disagg dsr1-fp8-mi325x-sglang-disagg-mtp

@JordanNanos Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23825672906
- Add chi-mi325x* hostname detection in env.sh for RDMA QoS config (MORI_RDMA_TC=104, MORI_RDMA_SL=3, derived from DCB DSCP AF31 -> prio 3), since nicctl is not available on Vultr/CPE MI325X hosts
- Wrap sudo rm -rf calls with timeout 30s in launch_mi325x-amd.sh and job.slurm to prevent indefinite hangs on stale NFS locks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
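The TC/SL values in this commit follow the usual RoCE convention: the RDMA traffic class is the full 8-bit ToS byte, with the DSCP in its upper six bits, so TC = DSCP << 2. A quick sketch of the derivation (assuming that convention is what the commit used):

```shell
# Derivation of the QoS values hardcoded in env.sh, assuming the standard
# RoCE mapping where traffic class = DSCP shifted into the upper 6 bits
# of the ToS byte.
DSCP_AF31=26                       # AF31 = 0b011010 = 26
MORI_RDMA_TC=$((DSCP_AF31 << 2))   # 26 << 2 = 104
MORI_RDMA_SL=3                     # DCB maps DSCP AF31 -> priority 3
echo "TC=${MORI_RDMA_TC} SL=${MORI_RDMA_SL}"
```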
force-pushed from 5d0ae14 to 6abdf85 (compare)
Pre-stage model weights from NFS/shared storage to local NVMe before the inference server starts. Reduces model load time for large models (e.g., DeepSeek-R1, ~340GB FP8) from NFS read speeds to NVMe speeds.

- utils/setup_local_nvme.sh: one-time NVMe setup script for compute nodes (format, mount, fstab entry). Supports a single drive or RAID-0.
- utils/cache_model_locally.sh: standalone/sourceable model caching utility using rsync with parallel blob sync for the HF hub cache layout.
- job.slurm: when LOCAL_MODEL_CACHE_DIR is set, runs srun-based parallel rsync on all nodes before Docker starts. Idempotent (skips if already cached). Falls back to shared storage if caching fails.
- launch_mi325x-amd.sh: enable local caching at /local-nvme/models for the MI325X cluster (8x 3.5TB NVMe per node).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use rclone sync with --transfers 32 --checkers 32 --links for high-parallelism model pre-staging from NFS to local NVMe. rclone is now installed on all MI325X compute nodes (v1.73.3).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsr1-fp8-mi325x-sglang-disagg dsr1-fp8-mi325x-sglang-disagg-mtp

@JordanNanos Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23836119825
Add spec-decoding: mtp search space entries alongside the existing non-MTP entries for both 1k/1k and 8k/1k sequence length configs. This provides a single-node MTP baseline for comparison with the disaggregated multi-node MTP results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Separate dsr1-fp8-mi325x-sglang-mtp from the base config so it can be swept independently. Full sweeps still cover both via their respective config keys.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsr1-fp8-mi325x-sglang-mtp

@JordanNanos Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23837499645
/sweep test-config --config-files .github/configs/amd-master.yaml --config-keys dsr1-fp8-mi325x-sglang-mtp dsr1-fp8-mi325x-sglang-disagg dsr1-fp8-mi325x-sglang-disagg-mtp

@JordanNanos Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23837617318
@claude Run an e2e test-config on the relevant configs for this PR
Claude finished @cquil11's task in 1m 1s

E2E test-config results for MI325X configs

Config keys tested:

Workflow run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23855060559

Results summary: overall, 1 success, 15 failures, 8 cancelled out of 24 benchmark jobs (plus 3 infra jobs that succeeded).

Successful results:
| Conc | Out tok/s/gpu | Median TTFT (ms) | Median TPOT (ms) | Median ITL (ms) |
|---|---|---|---|---|
| 1 | 9.55 | 634.5 | 12.68 | 27.75 |
| 2 | 17.23 | 663.4 | 12.30 | 29.85 |
| 4 | 32.01 | 664.0 | 14.81 | 31.55 |
| 8 | 54.99 | 664.9 | 16.89 | 35.26 |
| 16 | 92.25 | 957.2 | 19.62 | 40.94 |
| 32 | 138.85 | 3,327.3 | 24.16 | 49.86 |
| 64 | 185.88 | 10,566.9 | 27.78 | 58.07 |
Failure Analysis
1. dsr1-fp8-mi325x-sglang-mtp (single-node) — ALL FAILED (8/8 jobs)
Root cause: ERROR: dsr1_fp8_mi325x_sglang.sh not found in benchmarks/multi_node or benchmarks/single_node
The single-node MTP config references a benchmark script dsr1_fp8_mi325x_sglang.sh that does not exist in the repo. This PR only added dsr1_fp8_mi325x_sglang-disagg.sh for the multi-node disagg path. A single-node benchmark script needs to be created for the MTP config.
2. dsr1-fp8-mi325x-sglang-disagg / disagg-mtp (multi-node) — Most failed (7/16 failed, 8/16 cancelled, 1/16 success)
Root cause: ERROR: Job <id> failed before creating log file
These are Slurm job failures — the jobs failed before even creating a log file, suggesting infrastructure/runner issues (e.g., nodes unavailable, resource contention, Slurm scheduling failures). Only the P(tp4/ep1) D(tp8/ep1) spec-mtp 8k1k config on mi325x-amd_3 runner succeeded. The cancelled jobs were likely timed out waiting for runners.
Recommendations

- Add the missing single-node benchmark script: the dsr1-fp8-mi325x-sglang-mtp config needs a corresponding dsr1_fp8_mi325x_sglang.sh benchmark script in benchmarks/single_node/ (or the config's script field needs to point to an existing script).
- Multi-node infra issues: the disagg job failures appear to be infrastructure-related (Slurm jobs failing before log creation). This may require investigating runner availability, or a retry when the MI325X cluster is available.
Branch: jordan/mi325x-disagg-bnxt
The launcher's script name pattern included _${FRAMEWORK} suffix, but
single-node scripts don't use framework suffixes (only multi-node disagg
scripts do). This broke all MI325X single-node configs. Fix by trying
framework-suffixed name for multi-node first, then falling back to the
base name for single-node.
Also add MTP speculative decoding support to the existing dsr1_fp8_mi325x.sh
script and update perf-changelog with the single-node config keys.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
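The fallback described above might look like the following (a sketch with illustrative paths and names, not the launcher's actual code): try the framework-suffixed multi-node name first, then fall back to the base single-node name.

```shell
# Sketch of the script-name fallback: multi-node disagg scripts carry a
# framework suffix, single-node scripts do not.
resolve_benchmark_script() {
    # $1 = base name (e.g. dsr1_fp8_mi325x), $2 = framework (e.g. sglang)
    for candidate in \
        "benchmarks/multi_node/$1_$2-disagg.sh" \
        "benchmarks/single_node/$1.sh"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "ERROR: no benchmark script found for $1" >&2
    return 1
}
```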
SGLang's DP attention mode overrides chunked_prefill_size to 1024, which must be <= SGLANG_MORI_NUM_MAX_DISPATCH_TOKENS_PER_RANK. The default MORI_MAX_DISPATCH_TOKENS_DECODE of 160 is too small, causing an assertion failure on all EP8/DP decode configs (both MI325X and MI355X). Bump to 1024 when DP attention is enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
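A minimal sketch of the bound this commit enforces (function and variable names illustrative): with DP attention on, the decode dispatch-token budget is raised to at least the forced chunked_prefill_size of 1024.

```shell
# Illustrative clamp: the MoRI decode dispatch budget must be >= the
# chunked_prefill_size (1024) that SGLang forces under DP attention.
dispatch_tokens_decode() {
    # $1 = "true"/"false" for DP attention, $2 = configured default
    if [ "$1" = "true" ] && [ "$2" -lt 1024 ]; then
        echo 1024
    else
        echo "$2"
    fi
}
```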
MTP_ARGS=""
CHAT_TEMPLATE_ARGS=""
if [[ "${SPEC_DECODING:-}" == "mtp" ]]; then
    MTP_ARGS="--speculative-algorithm NEXTN --speculative-eagle-topk 1 --speculative-num-steps 1 --speculative-num-draft-tokens 2"
probably 3 draft tokens would be better
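For reference, the flags above could be driven by the PR's DECODE_MTP_SIZE knob; the steps-plus-one relation below matches the values used elsewhere in this PR (steps=1/drafts=2, later steps=3 with MTP=3), but the exact wiring is an assumption:

```shell
# Sketch: derive the speculative-decoding flags from DECODE_MTP_SIZE,
# with draft tokens = speculative steps + 1 (assumed from the values
# quoted in this PR, not from the actual script).
DECODE_MTP_SIZE=2
MTP_ARGS="--speculative-algorithm NEXTN --speculative-eagle-topk 1 \
--speculative-num-steps ${DECODE_MTP_SIZE} \
--speculative-num-draft-tokens $((DECODE_MTP_SIZE + 1))"
echo "$MTP_ARGS"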
.github/configs/amd-master.yaml (outdated)

dp-attn: false
additional-settings:
  - "DECODE_NODES=1"
  - "DECODE_MTP_SIZE=2"
how much MTP to do is really a tradeoff between compute & memory BW: more MTP == more compute. i.e. for the top left of the curve (high concurrency) you definitely want less MTP; for low concurrency (bottom right of the curve), you definitely want more MTP.

for low concurrency, the SOL should be MTP=3, though it depends on how optimized the AMD kernels actually are lol

can you try DECODE_MTP_SIZE=3? there is a reasonable chance that it is better @JordanNanos
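A toy model of that tradeoff: with k draft tokens, each accepted independently with probability a, the expected tokens emitted per decode step is 1 + a + a^2 + ... + a^k, so extra drafts only pay off while acceptance stays high (typically at low concurrency). The 0.8 acceptance rate here is illustrative, not measured:

```shell
# Toy model of MTP speedup: expected tokens per decode step given a
# per-token acceptance rate and a number of draft tokens.
expected_tokens() {  # $1 = acceptance rate a, $2 = draft tokens k
    awk -v a="$1" -v k="$2" 'BEGIN {
        e = 0
        for (i = 0; i <= k; i++) e += a^i   # geometric series 1 + a + ... + a^k
        printf "%.3f\n", e
    }'
}

expected_tokens 0.8 1   # MTP=1 -> 1.800 tokens/step
expected_tokens 0.8 3   # MTP=3 -> 2.952 tokens/step
```

More drafts than the acceptance rate supports just burn decode compute, which is why the reviewer expects the optimum to shift with concurrency.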
EP8/DP configs fail at runtime on MI325X (MoRI/RDMA issue with Broadcom Thor 2 NICs): servers start but all requests fail. Comment out these search-space entries for now. Bump DECODE_MTP_SIZE from 2 to 3 and speculative-num-steps from 1 to 3 for better low-concurrency decode throughput on CDNA3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds P(tp4) → D(tp8/ep8/dp, 1 node) search-space entries for both non-MTP and MTP disagg configs. This isolates whether EP/DP itself is broken on MI325X, or whether only the multi-node distributed init hangs with Broadcom Thor 2 NICs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Description
Port the MI355X DeepSeek-R1 FP8 disaggregated inference recipe to MI325X (gfx942/CDNA3) on a Vultr Slurm cluster with Broadcom BCM5760X Thor 2 NICs using IBGDA for GPU-Direct RDMA via MoRI.
Container image
`ghcr.io/jordannanos/sgl-mi325x-mori:v0.5.9-bnxt-good`

Built from a patched `rocm.Dockerfile` based on https://github.com/sgl-project/sglang. The Dockerfile and build scripts with all patches applied are published at https://github.com/JordanNanos/sglang (see `docker/build-sglang-bnxt.sh` and `docker/build-sglang-bnxt.sbatch`).

Three patches were required to make the upstream Dockerfile build for MI325X + Broadcom IBGDA:

- `install_bcm_lib.sh`: the upstream script used `tar zxf` on a `.zip` archive; fixed to detect the archive format and use `unzip` for `.zip` files
- `smg-wasm` pinned to `=1.0.0`: upstream v0.5.9 ships without a `Cargo.lock`; `smg-wasm 1.0.1` (published 2026-02-23) changed the `WasmModuleManager` API, breaking the `sgl-model-gateway` Rust build
- MoRI pinned to HEAD (`c0eccaf2`): the previously pinned commit (`2f88d06`) requires system-installed `infiniband/bnxt_re_dv.h` / `bnxt_re_hsi.h` headers that the Broadcom BCM driver package does not ship; HEAD uses bundled headers + dlopen at runtime (commit `ead84d86`)

The Broadcom BCM5760X driver (`bcm5760x_231.2.63.0a.zip`) must be placed in the build context. Download it from https://www.broadcom.com/support/download-search (search "BCM5760X" or "Thor 2", and select the Linux OFED package matching your firmware version).

Build command:
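A sketch of the likely invocation, assembled from the build args quoted earlier in this thread (GPU_ARCH=gfx942, ENABLE_MORI=1, NIC_BACKEND=ibgda); the exact Dockerfile interface is an assumption, so treat `docker/build-sglang-bnxt.sh` in the fork as authoritative. The command is echoed rather than executed here, since the Broadcom zip only exists in the cluster build context:

```shell
# Hypothetical reconstruction of the build command; arg names come from
# the flags quoted in this PR, the Dockerfile interface is assumed.
IMAGE=ghcr.io/jordannanos/sgl-mi325x-mori:v0.5.9-bnxt-good
build_cmd="docker build -f docker/rocm.Dockerfile \
--build-arg GPU_ARCH=gfx942 \
--build-arg ENABLE_MORI=1 \
--build-arg NIC_BACKEND=ibgda \
-t ${IMAGE} ."
echo "$build_cmd"
```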
Cluster hardware

- Broadcom BCM5760X Thor 2 (`bnxt_re`), 400Gbps RoCEv2, FW 231.2.63.0
- Mellanox (`mlx5`), 100Gbps

Benchmark results: DeepSeek-R1-0528 FP8, ISL=1024 OSL=1024, 1P(TP4)+1D(TP8) = 12 GPUs
MI325X throughput is ~4x lower than MI355X at the same GPU count, with ~3x higher decode latency. This is expected given:

- `nicctl` is unavailable in the container, so `MORI_RDMA_TC` / `MORI_RDMA_SL` default to 0 (no PFC priority), which impacts throughput at higher concurrencies

Files changed
- `.github/configs/amd-master.yaml`: add `dsr1-fp8-mi325x-sglang-disagg` config (mirrors MI355X bottom-of-curve: TP4p/TP8d, conc 1-64)
- `.github/configs/runners.yaml`: add `mi325x-disagg` runner entry
- `benchmarks/multi_node/dsr1_fp8_mi325x_sglang-disagg.sh`: new benchmark script
- `benchmarks/multi_node/amd_utils/env.sh`: add `chi-mi325x*` hostname detection for Broadcom `bnxt_re` IB devices (skip `bnxt_re6`, which is DOWN)
- `benchmarks/multi_node/amd_utils/job.slurm`: minor fixes for MI325X Docker device passthrough
- `benchmarks/multi_node/amd_utils/server.sh`: add model config compat
- `runners/launch_mi325x-amd.sh`: multi-node disagg launch support via sbatch+Docker
- `scripts/manual-test-mi325x.sh`: manual test entry point

Related Issue
Fixes #981
Type of Change
Checklist
perf-changelog.yaml