
add mark_not_offload() interface for cpu_offload_v1 #2770

Open
lhb8125 wants to merge 2 commits into NVIDIA:main from lhb8125:hongbinl/offload_activation_mxfp8_fix

Conversation

Contributor

@lhb8125 lhb8125 commented Mar 17, 2026

Description

This PR adds a mark_not_offload() interface for the cpu_offload v1 code path, so that MCore's offloading feature can reuse mark_not_offload() to prevent certain tensors from being offloaded.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
ksivaman requested a review from pggPL on March 17, 2026 05:59
@ksivaman
Member

@lhb8125 Could you please update the PR description? @pggPL for review

Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR wires up mark_not_offload() for the V1 CPU offload code path (NVTE_CPU_OFFLOAD_V1=1). Previously, the function returned early without doing anything when the V1 path was active. The change correctly adds a delegation call in cpu_offload.py and introduces a new mark_not_offload implementation in cpu_offload_v1.py.

However, the V1 implementation contains a logic bug that makes it a no-op:

  • cpu_offload.py — the delegation fix is correct and straightforward.
  • cpu_offload_v1.py — the new mark_not_offload sets _TE_do_not_offload = True on the underlying tensors. This attribute is checked exclusively in the non-V1 path (OffloadableLayerState._check_if_offload, cpu_offload.py:463). The V1 path decides whether to offload via tensor_need_offloading_checker_activations, which only checks for hasattr(tensor, "activation_offloading"). As a result, if a tensor was previously marked with mark_activation_offload, calling mark_not_offload will not prevent it from being offloaded in V1. The V1 implementation should instead remove the activation_offloading attribute (matching the opt-in semantics of the V1 offloading mechanism), similar to how mark_activation_offload sets it.
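The gap described above can be illustrated with a small, self-contained sketch. Note that FakeTensor and the three functions below are simplified stand-ins modelling the attribute-gating behaviour, not Transformer Engine's actual tensors or checker:

```python
class FakeTensor:
    """Stand-in for a torch.Tensor that can carry ad-hoc Python attributes."""


def mark_activation_offload(*tensors):
    # V1 opts tensors in to offloading by setting this attribute.
    for t in tensors:
        t.activation_offloading = True


def mark_not_offload_as_implemented(*tensors):
    # The PR's V1 implementation: sets an attribute the V1 checker never reads.
    for t in tensors:
        t._TE_do_not_offload = True


def tensor_need_offloading_checker_activations(t):
    # The V1 offloading decision is gated purely on the opt-in attribute.
    return hasattr(t, "activation_offloading")


t = FakeTensor()
mark_activation_offload(t)
mark_not_offload_as_implemented(t)
# Still True: the "do not offload" mark has no effect on the V1 decision.
print(tensor_need_offloading_checker_activations(t))  # True
```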

Confidence Score: 2/5

  • Not safe to merge — the core V1 implementation sets an attribute that is never checked, making mark_not_offload a silent no-op in the V1 path.
  • The delegation in cpu_offload.py is correct, but the actual V1 implementation in cpu_offload_v1.py uses the wrong attribute (_TE_do_not_offload instead of removing activation_offloading). The V1 offloading decision is entirely gated on activation_offloading, so the new function does not achieve its stated goal.
  • transformer_engine/pytorch/cpu_offload_v1.py — specifically the new mark_not_offload function (lines 52–60).

Important Files Changed

  • transformer_engine/pytorch/cpu_offload_v1.py: Adds mark_not_offload to the V1 code path and imports prepare_for_saving/restore_from_saved, but the implementation sets _TE_do_not_offload, which is never consulted by V1's tensor_need_offloading_checker — the function is effectively a no-op.
  • transformer_engine/pytorch/cpu_offload.py: Small, correct change: delegates to v1_code_path.mark_not_offload before returning early when NVTE_CPU_OFFLOAD_V1 is set, fixing the previously silent no-op.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["mark_not_offload(*tensors) called\n(cpu_offload.py)"] --> B{NVTE_CPU_OFFLOAD_V1?}
    B -- Yes --> C["v1_code_path.mark_not_offload(*tensors)\n(cpu_offload_v1.py)"]
    B -- No --> D["prepare_for_saving(*tensors)"]
    D --> E["setattr(tensor, '_TE_do_not_offload', True)"]
    E --> F["restore_from_saved(tensor_obj, tensors)"]
    F --> G["Non-V1 push_tensor checks\n_TE_do_not_offload ✅"]

    C --> H["prepare_for_saving(*tensors)"]
    H --> I["setattr(tensor, '_TE_do_not_offload', True)"]
    I --> J["restore_from_saved(tensor_obj, tensors)"]
    J --> K["V1 tensor_need_offloading_checker\nchecks activation_offloading only ❌\n_TE_do_not_offload is never read"]

    style K fill:#ffcccc,stroke:#cc0000
    style G fill:#ccffcc,stroke:#009900

Last reviewed commit: 730d7a1

Comment on lines +52 to +60
def mark_not_offload(*tensors: torch.Tensor):
    """Marks tensors to prevent them from being offloaded."""
    tensors, tensor_obj = prepare_for_saving(*tensors)

    for tensor in tensors:
        if tensor is not None:
            setattr(tensor, "_TE_do_not_offload", True)

    restore_from_saved(tensor_obj, tensors)

P1 _TE_do_not_offload is never checked in V1 path — function is a no-op

The V1 code path decides whether to offload a tensor using tensor_need_offloading_checker_activations, which checks only for hasattr(tensor, "activation_offloading") (see cpu_offload_v1.py lines 738–739). The attribute _TE_do_not_offload set here is never consulted in the V1 path, so calling this function after mark_activation_offload will not prevent offloading.

To confirm: the only place _TE_do_not_offload is read in the whole codebase is cpu_offload.py line 463 (OffloadableLayerState._check_if_offload), which belongs to the non-V1 code path.

For mark_not_offload to actually work in V1, the implementation should mirror how mark_activation_offload operates — i.e., remove (or otherwise gate on) the activation_offloading attribute:

def mark_not_offload(*tensors: torch.Tensor):
    """Marks tensors to prevent them from being offloaded."""
    for tensor in tensors:
        if tensor is None:
            continue
        if type(tensor) in [torch.Tensor, torch.nn.Parameter]:
            if hasattr(tensor, "activation_offloading"):
                del tensor.activation_offloading
        else:
            data_tensors = tensor.get_data_tensors()
            for t in data_tensors:
                if t is not None and hasattr(t, "activation_offloading"):
                    del t.activation_offloading

Alternatively, tensor_need_offloading_checker_activations could be updated to also gate on _TE_do_not_offload, but that would be a more invasive change.
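A minimal sketch of that alternative, using a plain-Python stand-in for the tensor rather than Transformer Engine's actual types, where the checker itself gates on the new attribute:

```python
class FakeTensor:
    """Stand-in for a torch.Tensor that can carry ad-hoc Python attributes."""


def tensor_need_offloading_checker_activations(t):
    # Offload only tensors that opted in via mark_activation_offload
    # AND were not subsequently marked with mark_not_offload.
    return hasattr(t, "activation_offloading") and not getattr(
        t, "_TE_do_not_offload", False
    )


t = FakeTensor()
t.activation_offloading = True   # opted in to offloading
t._TE_do_not_offload = True      # later excluded by mark_not_offload
print(tensor_need_offloading_checker_activations(t))  # False
```

This keeps mark_not_offload's attribute-setting implementation unchanged, at the cost of touching the checker used throughout the V1 path.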

Contributor Author

lhb8125 commented Mar 17, 2026

/te-ci pytorch L1
