Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18314) #18314

Open
3l1 wants to merge 1 commit into main from export-D97266678

Conversation

@3l1 (Contributor) commented Mar 19, 2026

Summary:

Two optimizations in ToTosaMemoryFormatPass to reduce TOSA TRANSPOSE nodes:

  1. NHWC-safe reshape detection: When a 4D→4D view_copy has monotonic
    shape_indices on the raw shapes and preserves both the batch dim (index 0)
    and the last dimension (NHWC channel) alone in their output groups, skip
    inserting input/output transposes. The view_copy can operate directly on
    NHWC data.

  2. Redundant permute_copy elimination: Model-level permute_copy ops whose
    permutation matches channels_last_order (NCHW→NHWC) or its inverse
    (NHWC→NCHW) AND whose input already has NHWC tosa_dim_order are redundant
    with the tosa_dim_order annotation. Replace them with view_copy (identity
    reshape) to avoid generating TOSA TRANSPOSE nodes. Standalone permute
    models (NCHW input from placeholder) are not affected.
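
As a rough illustration of the check described in item 2, the sketch below decides whether a 4D permutation is redundant given the dim order already recorded on its input. The function name `is_redundant_permute`, the tuple representation of `tosa_dim_order`, and the constant values are assumptions made for illustration; they are not code from this PR.

```python
from typing import Sequence, Tuple

# Assumed 4D orders; the pass's own channels_last_order constant is not shown here.
CHANNELS_LAST_ORDER: Tuple[int, ...] = (0, 2, 3, 1)  # NCHW -> NHWC


def invert_permutation(perm: Sequence[int]) -> Tuple[int, ...]:
    """Return the permutation that undoes `perm`."""
    inverse = [0] * len(perm)
    for dst, src in enumerate(perm):
        inverse[src] = dst
    return tuple(inverse)


def is_redundant_permute(perm: Sequence[int], input_dim_order: Sequence[int]) -> bool:
    """Hypothetical check: True when the permute only restates the NHWC annotation.

    The permute counts as redundant only if its input is already annotated
    as NHWC and the permutation is NCHW->NHWC or its inverse.
    """
    if tuple(input_dim_order) != CHANNELS_LAST_ORDER:
        # NCHW input (for example straight from a placeholder): keep the permute.
        return False
    return tuple(perm) in (
        CHANNELS_LAST_ORDER,
        invert_permutation(CHANNELS_LAST_ORDER),
    )
```

Under these assumptions, a permute with `(0, 2, 3, 1)` or `(0, 3, 1, 2)` on an NHWC-annotated input would be a candidate for replacement by an identity view_copy, while the same permute on an NCHW input would be left alone.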

Differential Revision: D97266678
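
To illustrate why a reshape of the kind described in item 1 can skip its transposes, here is a small pure-Python sketch under one reading of the condition: the batch and channel sizes are unchanged and only the spatial dims are regrouped. The helpers `permute` and `reshape_via_nchw` are hypothetical and operate on flat row-major lists; they are not part of ToTosaMemoryFormatPass.

```python
from itertools import product
from math import prod

NHWC_TO_NCHW = (0, 3, 1, 2)
NCHW_TO_NHWC = (0, 2, 3, 1)


def permute(flat, shape, perm):
    """Permute the axes of a row-major tensor stored as a flat list.

    Output axis k takes its data from input axis perm[k].
    """
    new_shape = tuple(shape[p] for p in perm)
    strides = [prod(shape[i + 1:]) for i in range(len(shape))]
    out = []
    for idx in product(*(range(n) for n in new_shape)):
        offset = sum(idx[k] * strides[perm[k]] for k in range(len(perm)))
        out.append(flat[offset])
    return out, new_shape


def reshape_via_nchw(nhwc_flat, nhwc_shape, new_nhwc_shape):
    """Reshape NHWC data the long way: transpose, reshape in NCHW, transpose back."""
    if prod(nhwc_shape) != prod(new_nhwc_shape):
        raise ValueError("reshape must preserve the number of elements")
    nchw_flat, _ = permute(nhwc_flat, nhwc_shape, NHWC_TO_NCHW)
    n, h, w, c = new_nhwc_shape
    # A row-major reshape leaves the flat buffer untouched; only the shape changes.
    new_nchw_shape = (n, c, h, w)
    return permute(nchw_flat, new_nchw_shape, NCHW_TO_NHWC)
```

With a row-major (2, 4, 6, 3) NHWC tensor reshaped to (2, 8, 3, 3), `reshape_via_nchw` returns the input buffer unchanged, so the two transposes add nothing and the reshape could run directly on the NHWC data. A reshape that changes the channel size, such as (2, 4, 3, 6), does not have this property.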

@3l1 3l1 requested a review from digantdesai as a code owner March 19, 2026 08:46
pytorch-bot bot commented Mar 19, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18314

Note: Links to docs will display an error until the docs builds have been completed.

❌ 91 Cancelled Jobs, 2 Unrelated Failures

As of commit 4f89292 with merge base baa9888 (image):

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 19, 2026
meta-codesync bot (Contributor) commented Mar 19, 2026

@3l1 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D97266678.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync meta-codesync bot changed the title from "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18167)" to "Eliminate redundant NCHW↔NHWC permute_copy and NHWC-safe view_copy transposes in ToTosaMemoryFormatPass (#18314)" Mar 19, 2026
meta-codesync bot pushed a commit that referenced this pull request Mar 19, 2026
…ansposes in ToTosaMemoryFormatPass (#18314)

@AdrianLundell (Collaborator)

Hi, thanks for the PR! This is a complex topic to get right in all cases, and FYI we are also planning on improving this internally, so it is very nice to get some help with that. I see there are some errors in our unit tests, so it looks like there are a few edge cases to iron out before a proper review. Let us know if you have any questions about the current logic to help with this.

In the meantime, I have two comments:

  1. The first optimization seems like a slightly narrower condition of the "is_channel_reshape" function. Do you think it could fit in there rather than in a separate function?
  2. For the second optimization, have you observed performance regressions due to no-op transposes? Our working assumption has been that these are optimized away by the compiler.

@3l1 (Contributor, Author) commented Mar 19, 2026

hey @AdrianLundell thanks for the feedback

  1. yes that is something I can look into wrapping up into the existing function

  2. yes, on U55 (not U85) we see cases where Regor was not optimizing away all of them, causing perf issues on firmware

(also fixing failing unit tests)

meta-codesync bot pushed a commit that referenced this pull request Mar 19, 2026
…ansposes in ToTosaMemoryFormatPass (#18314)

meta-codesync bot pushed a commit that referenced this pull request Mar 19, 2026
…ansposes in ToTosaMemoryFormatPass (#18314)

3l1 added a commit that referenced this pull request Mar 19, 2026
…ansposes in ToTosaMemoryFormatPass (#18314)

Summary:
Pull Request resolved: #18314

@3l1 3l1 force-pushed the export-D97266678 branch from 92fa406 to 3be2196 Compare March 19, 2026 19:58
meta-codesync bot pushed a commit that referenced this pull request Mar 19, 2026
…ansposes in ToTosaMemoryFormatPass (#18314)

3l1 added a commit that referenced this pull request Mar 19, 2026
…ansposes in ToTosaMemoryFormatPass (#18314)

Summary:
Pull Request resolved: #18314

@3l1 3l1 force-pushed the export-D97266678 branch from 8b742ba to 4f89292 Compare March 19, 2026 20:12

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

2 participants