
[WIP] Minimaxm2.5 nvfp4 b200 #996

Open
Ankur-singh wants to merge 2 commits into main from minimaxm2.5-nvfp4-b200

Conversation

Collaborator

@Ankur-singh Ankur-singh commented Apr 2, 2026

Summary

Add MiniMax-M2.5 NVFP4 benchmark configuration and script for B200 GPUs using vLLM.

Changes

  • Benchmark config (.github/configs/nvidia-master.yaml): Added minimaxm2.5-fp4-b200-vllm config using nvidia/MiniMax-M2.5-NVFP4 model with vLLM nightly image (vllm/vllm-openai:nightly-5b8c30d62b754b575e043ce2fc0dcbf8a64f6306). Supports TP=2 and TP=4 with concurrency range 4–256 at 1k/1k and 8k/1k sequence lengths.
  • Benchmark script (benchmarks/single_node/minimaxm2.5_fp4_b200.sh): New vLLM serve script with FP8 KV cache, expert parallelism support, and VLLM_FLASHINFER_ALLREDUCE_BACKEND=mnnvl.
  • Perf changelog (perf-changelog.yaml): Added entry for the new config.
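
The changes above can be sketched as a config entry. Beyond the `image:` line and the `{ tp, conc-start, conc-end }` row shape visible in the review excerpts further down, every key name here is an assumption about the repo's schema, not its actual format:

```yaml
minimaxm2.5-fp4-b200-vllm:
  image: vllm/vllm-openai:nightly-5b8c30d62b754b575e043ce2fc0dcbf8a64f6306
  model: nvidia/MiniMax-M2.5-NVFP4            # key name assumed
  seq-lens:                                   # 1k/1k and 8k/1k sweeps
    - { isl: 1024, osl: 1024 }
    - { isl: 8192, osl: 1024 }
  parallel:                                   # TP=2 and TP=4
    - { tp: 2, conc-start: 4, conc-end: 256 }
    - { tp: 4, conc-start: 4, conc-end: 256 }
```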

Test Plan

  • Run e2e benchmarks for minimaxm2.5-fp4-b200-vllm on B200 to validate config

Contributor

github-actions bot commented Apr 2, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

@Ankur-singh
Collaborator Author

@claude update PR description and fix PR place holder in perf-changelog.yaml file

Collaborator

Klaud-Cold commented Apr 2, 2026

Claude finished @Ankur-singh's task in 1m 7s


Task: Update PR description and fix perf-changelog.yaml

  • Gather context from changed files
  • Update PR description
  • Fix PR placeholder in perf-changelog.yaml (pull/xxx → pull/996)
  • Commit and push changes

Done

  • Updated PR description with a summary of changes (config, benchmark script, perf changelog) and a test plan.
  • Fixed perf-changelog.yaml — replaced placeholder pr-link: .../pull/xxx with pr-link: .../pull/996 and pushed commit 31510da.

Klaud-Cold pushed a commit that referenced this pull request Apr 2, 2026
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
--max-model-len $MAX_MODEL_LEN \
--block-size=32 \
--kv-cache-dtype fp8 \
--stream-interval 20 --no-enable-prefix-caching \
Contributor


🔴 The script passes --stream-interval 20 to vllm serve, which is a SGLang-specific argument not recognized by vLLM; this will cause the server to exit immediately with an 'unrecognized arguments: --stream-interval' error, preventing the benchmark from running at all. Remove --stream-interval 20 from the vllm serve invocation — the companion FP8 script (minimaxm2.5_fp8_b200.sh) omits this flag and serves as the correct reference.

Extended reasoning...

Bug: --stream-interval is a SGLang-only argument passed to vllm serve

What the bug is and how it manifests

Line 50 of benchmarks/single_node/minimaxm2.5_fp4_b200.sh passes --stream-interval 20 as part of the vllm serve command. This flag is a SGLang server parameter (it controls how often SGLang flushes streamed tokens) and is not part of the vLLM CLI argument set. When vLLM receives an unrecognized argument it calls argparse's standard error path, prints 'unrecognized arguments: --stream-interval', and exits with a non-zero status before the server ever starts.

The specific code path

The relevant section of the script is:

vllm serve $MODEL --port $PORT \
    --tensor-parallel-size=$TP \
    $EP \
    --gpu-memory-utilization 0.95 \
    --max-model-len $MAX_MODEL_LEN \
    --block-size=32 \
    --kv-cache-dtype fp8 \
    --stream-interval 20 --no-enable-prefix-caching \    <-- line 50: invalid for vLLM
    --trust-remote-code > $SERVER_LOG 2>&1 &

After vllm serve exits, wait_for_server_ready will poll until the timeout expires (or detect the dead PID), and the benchmark job fails without producing any results.
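
The `wait_for_server_ready` helper itself is not reproduced in this thread; a minimal sketch of such a readiness poll, assuming a `/health` endpoint and these argument conventions, could look like:

```shell
# Hypothetical readiness poll: succeed once the server answers /health,
# fail fast if the server process has already died, give up on timeout.
wait_for_server_ready() {
  local pid="$1" base_url="$2" timeout="${3:-300}"
  local start
  start=$(date +%s)
  while true; do
    # A dead server PID means vllm serve exited (e.g. on a bad CLI flag).
    if ! kill -0 "$pid" 2>/dev/null; then
      echo "server process $pid exited before becoming ready" >&2
      return 1
    fi
    # Health probe; -f makes curl fail on HTTP errors.
    if curl -sf "${base_url}/health" >/dev/null 2>&1; then
      return 0
    fi
    if [ $(( $(date +%s) - start )) -ge "$timeout" ]; then
      echo "timed out waiting for ${base_url}" >&2
      return 1
    fi
    sleep 2
  done
}
```

With this shape, a server that dies on an unrecognized flag is detected on the first loop iteration rather than after the full timeout.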

Why existing code does not prevent it

No argument validation is performed in benchmark_lib.sh or any wrapper script — arguments are passed verbatim to the underlying framework binary. The flag was almost certainly copy-pasted from a SGLang benchmark script. Every other occurrence of --stream-interval in the repo (dsr1_fp4_b200.sh, dsr1_fp8_b200.sh, glm5_nvfp4_b200.sh, qwen3.5_bf16_b200.sh, etc.) is in a script that launches python3 -m sglang.launch_server, not vllm serve. A targeted search for 'vllm serve.*stream-interval' returns zero results across the entire benchmarks/ directory.
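
The audit described above can be demonstrated on synthetic files (the real benchmarks/ layout and filenames here are assumptions of this sketch, not the repo's contents):

```shell
# Flag only scripts that BOTH invoke `vllm serve` AND pass --stream-interval;
# SGLang scripts using the flag legitimately are not reported.
dir=$(mktemp -d)
printf 'python3 -m sglang.launch_server --stream-interval 20\n' > "$dir/sglang_style.sh"
printf 'vllm serve some-model --stream-interval 20\n' > "$dir/vllm_style.sh"
grep -l 'vllm serve' "$dir"/*.sh | xargs grep -l 'stream-interval'
```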

Step-by-step proof

  1. A CI job picks up minimaxm2.5-fp4-b200-vllm from nvidia-master.yaml.
  2. The harness executes minimaxm2.5_fp4_b200.sh inside the vLLM nightly container.
  3. The shell reaches the vllm serve block and starts the server process.
  4. vLLM parses its CLI arguments; --stream-interval is not in vllm serve's argparse namespace.
  5. vLLM prints: 'error: unrecognized arguments: --stream-interval' and exits non-zero.
  6. wait_for_server_ready detects that SERVER_PID died (or times out polling the health endpoint) and terminates the script with an error — no benchmark results are produced.

How to fix it

Simply remove --stream-interval 20 from the vllm serve invocation on line 50. The direct reference implementation benchmarks/single_node/minimaxm2.5_fp8_b200.sh (the FP8 B200 companion) uses vllm serve without this flag and is the correct pattern to follow.
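
The removal is a one-line edit; a sketch of it, demonstrated on a stand-in file since minimaxm2.5_fp4_b200.sh itself is not reproduced in this thread:

```shell
# Strip the invalid flag in place with GNU sed, leaving the rest of the
# invocation (including --no-enable-prefix-caching) untouched.
f=$(mktemp)
printf '%s\n' '  --kv-cache-dtype fp8 \' \
              '  --stream-interval 20 --no-enable-prefix-caching \' \
              '  --trust-remote-code' > "$f"
sed -i 's/--stream-interval 20 //' "$f"
grep -c 'stream-interval' "$f" || true   # prints 0: no matches remain
```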

Comment on lines +1241 to +1245
description:
- "MiniMax-M2.5 NVFP4 benchmark config for B200"
- "Uses nvidia/MiniMax-M2.5-NVFP4 model checkpoint"
- "Image: vllm/vllm-openai:nightly-5b8c30d62b754b575e043ce2fc0dcbf8a64f6306"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/996
Contributor


🟡 The new perf-changelog.yaml entry for minimaxm2.5-fp4-b200-vllm has pr-link: .../pull/xxx — a lowercase placeholder that was never updated to the actual PR number (996). After merge, the changelog link will point to a non-existent URL.

Extended reasoning...

What the bug is: The changelog entry added by this PR ends with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx. The lowercase xxx is clearly a placeholder that was never replaced with the actual PR number before submission.

The specific code path: In perf-changelog.yaml at lines 1241-1245, the newly added entry for minimaxm2.5-fp4-b200-vllm reads:

- config-keys:
    - minimaxm2.5-fp4-b200-vllm
  description:
    - "MiniMax-M2.5 NVFP4 benchmark config for B200"
    - "Uses nvidia/MiniMax-M2.5-NVFP4 model checkpoint"
    - "Image: vllm/vllm-openai:nightly-5b8c30d62b754b575e043ce2fc0dcbf8a64f6306"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx

Why existing code doesn't prevent it: The changelog file is manually maintained YAML with no automated validation of pr-link values. There's no CI check that verifies placeholder values aren't committed.

Impact: Once merged, anyone referencing the changelog entry for minimaxm2.5-fp4-b200-vllm will follow a broken link. The correct URL should be https://github.com/SemiAnalysisAI/InferenceX/pull/996. It's worth noting that 7 other entries in the same file also use uppercase XXX placeholders — making this a broader documentation hygiene issue — but the present PR introduces a new instance with lowercase xxx that can be fixed now since the PR number is known.

How to fix: Replace pull/xxx with pull/996 in the new entry.

Step-by-step proof:

  1. This PR is numbered 996 (visible in PR metadata: <pr number="996">).
  2. The diff adds a new entry to perf-changelog.yaml with pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx.
  3. Navigating to https://github.com/SemiAnalysisAI/InferenceX/pull/xxx after merge would return a 404 (or redirect to a search page), since xxx is not a valid PR number.
  4. The fix is trivial: change xxx → 996 in that one line.
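
Since the file has no automated validation, a pre-merge guard could catch future placeholders. A hypothetical sketch, demonstrated on an inline sample because the full perf-changelog.yaml is not reproduced here:

```shell
# Fail if any changelog entry still carries an xxx/XXX placeholder pr-link.
f=$(mktemp)
cat > "$f" <<'EOF'
- config-keys:
    - minimaxm2.5-fp4-b200-vllm
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/xxx
EOF
if grep -Eiq 'pr-link:.*/pull/xxx' "$f"; then
  echo "placeholder pr-link found; replace xxx with the real PR number" >&2
fi
```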

start_gpu_monitor

set -x
vllm serve $MODEL --port $PORT \
Contributor


add nvfp4 minimax to vllm recipes plz

- { tp: 4, conc-start: 4, conc-end: 64 }

minimaxm2.5-fp4-b200-vllm:
image: vllm/vllm-openai:nightly-5b8c30d62b754b575e043ce2fc0dcbf8a64f6306
Contributor


0.19 comes out tmw btw

@Ankur-singh Ankur-singh force-pushed the minimaxm2.5-nvfp4-b200 branch from 31510da to 08bcde8 Compare April 2, 2026 21:50