Use fp32 FMA in channelwise conv #4809

Draft
klin2024 wants to merge 1 commit into develop from use_fp32_for_fma

@klin2024
Contributor

Motivation

Unsure whether we want to adopt a more accurate approach here; this is just a thought.

Technical Details

Store weights in fp32 registers and accumulate in fp32 to improve accuracy for fp16 input. Cast back to output type at the end.
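
To illustrate the accuracy argument, here is a minimal sketch in plain Python that emulates both accumulation strategies on the host. It is not the kernel code from this PR: the function names (`tap_sum_fp16`, `tap_sum_fp32`), the rounding helpers, and the 3x3 example values are all hypothetical, and fp16/fp32 rounding is emulated with the standard `struct` module rather than real hardware FMA instructions.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tap_sum_fp16(inputs, weights):
    """Emulated fp16 path: every multiply-add result is rounded to fp16."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        # One rounding per step, mimicking an fp16 FMA.
        acc = to_fp16(acc + x * w)
    return acc


def tap_sum_fp32(inputs, weights):
    """Emulated fp32 path: widen weights once, accumulate in fp32,
    and cast back to the fp16 output type only at the end."""
    weights32 = [to_fp32(w) for w in weights]  # "weights in fp32 registers"
    acc = 0.0
    for x, w in zip(inputs, weights32):
        # One rounding per step, mimicking an fp32 FMA.
        acc = to_fp32(acc + to_fp32(x) * w)
    return to_fp16(acc)  # single final cast to the output type


# Hypothetical 3x3 window for one channel (9 taps), already fp16-representable.
inputs = [2048.0] + [1.0] * 8
weights = [1.0] * 9

fp16_result = tap_sum_fp16(inputs, weights)
fp32_result = tap_sum_fp32(inputs, weights)
print(fp16_result, fp32_result)
```

In this example the fp16 accumulator stops growing once it reaches 2048, because fp16 values are spaced 2 apart in that range and each `+ 1` rounds back down. The fp32 accumulator keeps every increment, and the exact sum of 2056 survives the final cast to fp16.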

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4809   +/-   ##
========================================
  Coverage    92.49%   92.49%           
========================================
  Files          583      583           
  Lines        29562    29562           
========================================
  Hits         27343    27343           
  Misses        2219     2219           

see 20 files with indirect coverage changes
