Skip to content

feat: Add SVE kernels for TopKV#1256

Open
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels
Open

feat: Add SVE kernels for TopKV#1256
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels

Conversation

@morgolock
Copy link
Contributor

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799

filelist.json Outdated
"src/cpu/kernels/topkv/generic/neon/qasymm8_signed.cpp"
]
}
"files": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the indentation change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in next patch

@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 1deb127 to 7944860 Compare February 6, 2026 16:12
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 7944860 to 21ba8e9 Compare March 2, 2026 12:56
inline uint32_t count_gt_block<float16_t>(const float16_t *ptr, float16_t thr, uint32_t block_elems)
{
const svbool_t pg = svwhilelt_b16(static_cast<uint64_t>(0), static_cast<uint64_t>(block_elems));
const svfloat16_t v = svld1_f16(pg, ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have two questions to ask:

  • Why do we convert to Fp32 in the Neon(TM) implementation?
  • We should incorporate epsilon in both implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in next patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we convert to Fp32 in the Neon(TM) implementation

We could do the computation in f16 but the epsilon must be f32 to align with ref. I can address this in a different patch. It's not a serious problem, if anything there is room to optimize the neon kernel even further.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, t's a minor accuracy discrepancy. Currently, what I'm not too in favour of is the difference between the implementations:

  • We compare in Fp16 in CPPTopKV
  • We convert everything to Fp32 and do the comparison in Fp32 in Neon implementation
  • We convert the Fp32 thr+eps to Fp16 here, and do the conversion in Fp16 in SVE implementation.

I think the ideal solution should be doing the same thing for all. The problem in ref. implementation is also something to consider.

By the way, eps in Fp16 is different and is not equal to the epsilon in Fp32 when converted to Fp16. It adds additional numerical complexity.

What do you suggest we do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd merge these SVE kernels because they bring a considerable uplift (1.6x) and there are no failures.

If fp16 is a problem I can remove the SVE kernel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SVE kernel is a very good add, so I wasn't even considering removing it. I was more of asking the plan for handling this numerical inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two options:

a) Rework all that in this patch
b) Rework it on a next patch aligning FP16 comparison in neon/sve with ref

@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 21ba8e9 to 36f1ac1 Compare March 9, 2026 14:28
@morgolock morgolock requested a review from gunes-arm March 9, 2026 14:29
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 36f1ac1 to c6b4071 Compare March 10, 2026 11:49
Resolves MLCE-1719

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from c6b4071 to 98ab9ec Compare March 10, 2026 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants