Skip to content

[Misc] QD_DUMP_IR dumps to files with the task_id added to the filename#441

Draft
hughperkins wants to merge 7 commits intomainfrom
hp/dump-ir-multiple-tasks
Draft

[Misc] QD_DUMP_IR dumps to files with the task_id added to the filename#441
hughperkins wants to merge 7 commits intomainfrom
hp/dump-ir-multiple-tasks

Conversation

@hughperkins
Copy link
Copy Markdown
Collaborator

… tasks

When a kernel contains multiple top-level for loops, each loop is compiled as a separate offloaded task. Previously all tasks wrote their LLVM IR to the same file (kernel->name + "_llvm.ll"), so only the last task to finish was visible. Use the per-task name instead so each gets its own file.

Made-with: Cursor

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

… tasks

When a kernel contains multiple top-level for loops, each loop is compiled
as a separate offloaded task. Previously all tasks wrote their LLVM IR to
the same file (kernel->name + "_llvm.ll"), so only the last task to finish
was visible. Use the per-task name instead so each gets its own file.

Made-with: Cursor
…tiple tasks

Same issue as the _llvm.ll fix: moduleToDumpName returns only the first
function name, so multi-task kernels produce only one dump file. Use new
moduleDumpNames helper to iterate over all defined functions and dump
one file per task.
- Apply clang-format fix in codegen_llvm.cpp
- Avoid string-concatenation temporaries in jit_cuda.cpp PTX dump path
  to satisfy clang-tidy performance-inefficient-string-concatenation

Made-with: Cursor
…asks

Made-with: Cursor

# Conflicts:
#	quadrants/runtime/cuda/jit_cuda.cpp
@hughperkins
Copy link
Copy Markdown
Collaborator Author

@claude review

Comment thread quadrants/runtime/cuda/jit_cuda.cpp Outdated
Comment thread quadrants/codegen/llvm/codegen_llvm.cpp Outdated
Comment thread quadrants/runtime/cuda/jit_cuda.cpp Outdated
…tional log

- Capture dump name before compile_module_to_ptx so LOAD_PTX uses
  pre-rename names matching the dump path (fixes name divergence when
  convert() mangles symbols)
- Dump module/PTX once per add_module call instead of N identical copies
  per task, since the whole module is written each time anyway
- Move "PTX dumped to:" message inside the file-open check so it only
  prints on success
- Remove unused moduleDumpNames() helper
@hughperkins
Copy link
Copy Markdown
Collaborator Author

@claude review

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous review feedback has been addressed — rename-timing mismatch, duplicate per-task dumps, and unconditional success log are all fixed; the remaining nit about DUMP_IR/LOAD_IR empty-tasks asymmetry is minor and debug-only.

Extended reasoning...

Overview

This PR modifies debug-only IR dumping paths in two files: codegen_llvm.cpp (LLVM IR dump/load on the CPU/generic codegen side) and jit_cuda.cpp (PTX and pre-PTX LLVM IR dump/load on the CUDA side). The changes are gated on QUADRANTS_DUMP_IR and QUADRANTS_LOAD_PTX environment variables and have no impact on compiled kernel correctness.

Security risks

None. The changes only affect developer-facing debug output paths gated on environment variables. No auth, crypto, permissions, or user-input parsing is touched. Filenames are derived from internal kernel/task/function names, not from user-controlled input.

Level of scrutiny

Low — this is a debug-only feature. The original overwrite problem (all tasks writing to the same file and only the last one being visible) is resolved by using offloaded_tasks[0].name on the LLVM side and the pre-compile function name on the CUDA side. The LOAD side now uses the same name captured pre-compile, fixing the asymmetry I flagged previously.

Other factors

The remaining inline nit (DUMP_IR has a kernel->name fallback for empty offloaded_tasks while LOAD_IR hard-asserts) is cosmetic — in practice offloaded_tasks is populated before this block runs, and even if the divergence were hit, the impact is a debug crash instead of a silent load failure. Not worth blocking the PR; the author can decide whether to tighten the asymmetry in a follow-up. The previous high/medium-severity bugs have been resolved, so I'm comfortable approving.

Comment thread quadrants/codegen/llvm/codegen_llvm.cpp
Both paths now consistently assert rather than having a defensive
fallback in DUMP_IR that LOAD_IR can never consume.
@hughperkins
Copy link
Copy Markdown
Collaborator Author

@claude review

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all prior review feedback has been addressed: dump_name is captured before convert() runs, per-task filename duplication is gone, the PTX success message is gated on a successful open, and DUMP_IR/LOAD_IR now assert symmetrically.

Extended reasoning...

Overview

This is a small debug-only change to two files (codegen_llvm.cpp, jit_cuda.cpp) that renames the LLVM IR / PTX dump filename from a kernel-level name to a per-task name so multi-task kernels no longer clobber each other's dump files. The LOAD_IR / LOAD_PTX paths are updated in lockstep so that what is written is what is read back.

Security risks

None. The affected code is gated on QD_DUMP_IR, QD_LOAD_IR, and QUADRANTS_LOAD_PTX env vars intended for developer debugging. No auth, crypto, permission, or externally-reachable input path is touched. The filenames are derived from internal LLVM module/function names and written to a configured debug_dump_path.

Level of scrutiny

Low. Debug-only path, small diff (~20 changed lines), straightforward intent, no effect on compiled kernel correctness. The only user-visible failure mode is a developer getting fewer (or no) dump files in a misconfigured run, which falls back cleanly.

Other factors

Every prior bug I flagged on earlier revisions is now addressed in the current diff: the jit_cuda.cpp rename-timing asymmetry is fixed by capturing dump_name pre-compile_module_to_ptx, the per-task loop that previously produced N bit-identical files is gone in favor of a single offloaded_tasks[0].name file, the misleading unconditional "PTX dumped to:" std::cout is now inside the successful-open branch, and DUMP_IR/LOAD_IR both assert !offloaded_tasks.empty() so they cannot disagree on filename handling for a degenerate empty-tasks case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant