Conversation
filelist.json
Outdated
| "src/cpu/kernels/topkv/generic/neon/qasymm8_signed.cpp" | ||
| ] | ||
| } | ||
| "files": { |
There was a problem hiding this comment.
why the indentation change?
There was a problem hiding this comment.
fixed in next patch
1deb127 to
7944860
Compare
7944860 to
21ba8e9
Compare
| inline uint32_t count_gt_block<float16_t>(const float16_t *ptr, float16_t thr, uint32_t block_elems) | ||
| { | ||
| const svbool_t pg = svwhilelt_b16(static_cast<uint64_t>(0), static_cast<uint64_t>(block_elems)); | ||
| const svfloat16_t v = svld1_f16(pg, ptr); |
There was a problem hiding this comment.
I think I have two questions to ask:
- Why do we convert to Fp32 in the Neon(TM) implementation?
- We should incorporate epsilon in both implementations.
There was a problem hiding this comment.
Addressed in next patch.
There was a problem hiding this comment.
Why do we convert to Fp32 in the Neon(TM) implementation
We could do the computation in f16 but the epsilon must be f32 to align with ref. I can address this in a different patch. It's not a serious problem, if anything there is room to optimize the neon kernel even further.
There was a problem hiding this comment.
Yes, t's a minor accuracy discrepancy. Currently, what I'm not too in favour of is the difference between the implementations:
- We compare in Fp16 in CPPTopKV
- We convert everything to Fp32 and do the comparison in Fp32 in Neon implementation
- We convert the Fp32 thr+eps to Fp16 here, and do the conversion in Fp16 in SVE implementation.
I think the ideal solution should be doing the same thing for all. The problem in ref. implementation is also something to consider.
By the way, eps in Fp16 is different and is not equal to the epsilon in Fp32 when converted to Fp16. It adds additional numerical complexity.
What do you suggest we do?
There was a problem hiding this comment.
I'd merge these SVE kernels because they bring a considerable uplift (1.6x) and there are no failures.
If fp16 is a problem I can remove the SVE kernel.
There was a problem hiding this comment.
I think SVE kernel is a very good add, so I wasn't even considering removing it. I was more of asking the plan for handling this numerical inconsistency.
There was a problem hiding this comment.
Two options:
a) Rework all that in this patch
b) Rework it on a next patch aligning FP16 comparison in neon/sve with ref
21ba8e9 to
36f1ac1
Compare
36f1ac1 to
c6b4071
Compare
Resolves MLCE-1719 Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799 Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
c6b4071 to
98ab9ec
Compare
Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799