Improvements to inference using int8 compressed kv's #871

Open
copybara-service[bot] wants to merge 1 commit into dev from test_875150774

Conversation

@copybara-service

Improvements to inference using int8 compressed kv's
Multiplication is done using int16*int16 multiplication instructions to avoid the expensive conversion to f32/bf16.
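
To illustrate the idea, here is a minimal scalar sketch of an integer dot product between a query and an int8-compressed key row. It is not the code from this change: the helper names `QuantizeToInt16` and `DotInt16Int8`, the single per-vector scale, and the symmetric quantization scheme are all assumptions made for the example. The point it shows is that the int8 values are only sign-extended to int16 and multiplied as integers, so no per-element conversion to f32/bf16 is needed.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): symmetric quantization of a float
// query to int16 with a single scale, so that q[i] ~= out[i] * scale.
float QuantizeToInt16(const std::vector<float>& q, std::vector<int16_t>& out) {
  float max_abs = 0.0f;
  for (float v : q) max_abs = std::fmax(max_abs, std::fabs(v));
  const float scale = max_abs > 0.0f ? max_abs / 32767.0f : 1.0f;
  out.resize(q.size());
  for (size_t i = 0; i < q.size(); ++i) {
    out[i] = static_cast<int16_t>(std::lround(q[i] / scale));
  }
  return scale;
}

// Hypothetical helper (not from the PR): dot product of an int16 query with
// an int8 key row. Each int8 value is sign-extended to int16, the product is
// an int16*int16 multiply, and the sum is kept in int32. This is the scalar
// shape of what a SIMD multiply-add on int16 lanes would compute.
// Assumes n <= 512 so the int32 accumulator cannot overflow at full range.
int32_t DotInt16Int8(const int16_t* q, const int8_t* k, size_t n) {
  int32_t acc = 0;
  for (size_t i = 0; i < n; ++i) {
    const int16_t k_wide = static_cast<int16_t>(k[i]);  // cheap sign extension
    acc += static_cast<int32_t>(q[i]) * static_cast<int32_t>(k_wide);
  }
  return acc;
}
```

Under these assumptions, the floating-point score would be recovered with a single multiply per dot product, `acc * q_scale * k_scale`, where `k_scale` stands for whatever scale the compressed KV format stores.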

PiperOrigin-RevId: 875150774