Add merge() method to LoRALinear for zero-cost inference deployment #18321
derekxu wants to merge 1 commit into `pytorch:main`
Conversation
Summary: Implements `W_merged = W + (alpha/rank) * B @ A`, allowing the LoRA weights to be folded into the base linear layer at deployment time and eliminating the additional inference latency of the adapter path, per the LoRA paper (arXiv:2106.09685). Differential Revision: D97174451
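The merge described above can be sketched as follows. This is a minimal illustration of the technique, not the PR's actual code: the class layout and attribute names (`lora_a`, `lora_b`, `alpha`, `rank`) are assumptions, and the standard LoRA factorization `A: (rank, in)`, `B: (out, rank)` is used so that `B @ A` has the same shape as the base weight.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA-augmented linear layer (names are illustrative)."""

    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.rank = rank
        self.alpha = alpha
        # A: (rank, in_features), B: (out_features, rank).
        # B is zero-initialized so the adapter starts as a no-op.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        # Unmerged path: base output plus scaled low-rank update.
        scale = self.alpha / self.rank
        return self.linear(x) + scale * (x @ self.lora_a.T @ self.lora_b.T)

    @torch.no_grad()
    def merge(self):
        # Fold the adapter into the base weight:
        #   W_merged = W + (alpha/rank) * B @ A
        # After this, self.linear alone reproduces the adapted output;
        # a real implementation would also zero or drop the adapter so
        # forward() does not double-count it.
        scale = self.alpha / self.rank
        self.linear.weight += scale * (self.lora_b @ self.lora_a)
```

After `merge()`, inference can run through the plain `nn.Linear`, so the deployed model pays no extra latency for the adapter.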
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18321
Note: Links to docs will display an error until the docs builds have completed.
❌ 1 New Failure, 2 Unrelated Failures as of commit 388cd45 with merge base b2f0a5a.
NEW FAILURE: the following job has failed. BROKEN TRUNK: the following jobs failed but were also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.