
HuBERT support rollout #1111

Open
david-wei-01001 wants to merge 121 commits into TransformerLensOrg:dev from david-wei-01001:hubert
Conversation

@david-wei-01001 commented Nov 7, 2025

Description

This adds support for the HuBERT model, extending TransformerLens to speech/audio models. The code has been thoroughly tested; the test code can be found in the demos/HuBERT_test folder. No additional dependencies are needed.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@david-wei-01001
Author

All requested changes completed

@jlarson4
Collaborator

Thank you @david-wei-01001! Just need to get the CI passing now.

It looks like you need to run the formatter, make sure your models have been documented in the Colab Compatibility notebook, and make sure your tests pass.

@david-wei-01001
Author

All done!
The reported failures in Compatibility Checks (3.9) come from my tests. I have run them in Google Colab, and my code passes those tests. The errors are due to non-identical import and installation logs, a device mismatch (CPU vs CUDA), and cosine-similarity calculations where mismatches several digits past the decimal point are perfectly acceptable.

I have also tried my best to reorder the imports in my HookedAudioEncoder code, but since the check did not say what the problem is (it only reports that the imports are not ordered correctly), I really don't know what a better ordering would be.

Thank you very much!

@jlarson4
Collaborator

Thank you for getting to it so quickly! I am aiming to get the next release out very soon, so I appreciate the swift response. I know you've been waiting on this PR since November.

I apologize, I misunderstood the intention of your original HuBERT demo files. I thought they were meant to be demos for how your HuBERT implementation works. They are actually tests for HuBERT and you moved them to the correct location in tests/unit/, but also followed my note about making them notebooks. The notebook change is now causing the Compatibility Checks errors. In the context of tests/unit/, those files should stay .py files as you originally had them, which should help with them passing the CI. They'll just need assertions to confirm proper values are created, rather than doing prints.
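
As a rough illustration of the print-to-assertion change, here is a minimal sketch of a tolerance-based check. It is not taken from the PR: the helper `cosine_similarity`, the test name, the sample vectors, and the `1e-5` tolerance are all hypothetical stand-ins, and a real test would compare actual model outputs (likely as tensors) rather than plain lists. The idea is that small floating-point differences between CPU and CUDA runs should not fail the test.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def test_outputs_match_reference():
    # Hypothetical stand-ins for a reference activation and the
    # activation produced by the implementation under test.
    reference = [0.12, -0.40, 0.33, 0.95]
    hooked = [0.1200001, -0.4000002, 0.3299999, 0.9500001]

    sim = cosine_similarity(reference, hooked)

    # Assert with a tolerance instead of printing the value, so that
    # differences several digits past the decimal point still pass.
    assert math.isclose(sim, 1.0, abs_tol=1e-5)
```

Written this way, the file can stay a plain .py test that a runner such as pytest collects by its `test_` prefix, with no printed output to compare.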

For the sorting, the dependencies include isort, which can be run with poetry run isort . to auto sort any imports according to project specifications. I ran this already and pushed it to your branch so that CI could rerun and hopefully give you some better output. Make sure you pull that down before making the changes to the testing above.

@david-wei-01001
Author

Done!
