fix: scope get_full_cu_seqlens cache key by device and inference mode#2728
DmCarpe93 wants to merge 5 commits into NVIDIA:main
Conversation
Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>
for more information, see https://pre-commit.ci
Greptile Summary

This PR fixes a bug in `get_full_cu_seqlens` where the cache key was not scoped by device and inference mode. Key changes:
Confidence Score: 5/5
Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["get_full_cu_seqlens called"] --> B{ONNX export mode?}
B -- Yes --> C["Skip cache, create tensor directly"]
B -- No --> D["Read torch.is_inference_mode_enabled()"]
D --> E["Form tuple: batch_size + max_seqlen + device + inference_flag"]
E --> F{Tuple in cache dict?}
F -- No --> G["Allocate cu_seqlens via torch.arange"]
G --> H["Write to global cache"]
H --> I["Return tensor"]
F -- Yes --> I
@cyanguwa When you have a moment, could you please take a look at this PR? Thanks :)
@cyanguwa This PR is pretty straightforward. Would you mind taking a quick look? Thank you :)
Description
Fixed an issue where the `cu_seqlens` tensor was incorrectly retrieved from the cache.

Previously, `(batch_size, max_seqlen)` was used as the cache key when retrieving `cu_seqlens`, so an entry cached for one device or inference mode could be returned in another context. Now `(batch_size, max_seqlen, device, inference_mode)` is used.

Type of change
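To make the pre-fix failure mode concrete, here is a hypothetical pure-Python illustration of keying only on `(batch_size, max_seqlen)`: devices are modeled as strings, and the cached value records which device it was created for. All names here are invented for the example.

```python
# Illustration of the pre-fix bug: the device is left out of the key,
# so a second device with the same shape gets a stale cache hit.

buggy_cache = {}

def buggy_get(batch_size, max_seqlen, device):
    key = (batch_size, max_seqlen)  # device not in the key -- the bug
    if key not in buggy_cache:
        values = list(range(0, (batch_size + 1) * max_seqlen, max_seqlen))
        buggy_cache[key] = (device, values)  # remember the creating device
    return buggy_cache[key]

dev0 = buggy_get(2, 8, "cuda:0")  # miss: entry created for cuda:0
dev1 = buggy_get(2, 8, "cuda:1")  # hit: silently returns the cuda:0 entry
```

With real torch tensors this mismatch surfaces as a cross-device tensor error (or, for the inference-mode variant, an autograd error when an inference tensor is reused in training), which is exactly what scoping the key prevents.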
Changes
Checklist: