ENH: Use AVX512-FP16 SVML content for float16 umath functions #23351
|
In general this seems fine to me, although @seiko2plus it would be nice to have your opinion. Is there any reason to worry that 3 ULP in half precision is a much bigger relative error than 3 ULP in float32 precision? Right now, we effectively get 0.5 ULP reliably (since we use float32), so bumping the ULPs on these so much seems bound to be noticed. I do not disagree that whoever does notice it probably shouldn't be using float16 to begin with (doing math in float16 seems very specialized for certain applications). |
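To put the ULP question above in concrete terms, here is a minimal sketch using only the Python standard library. The helper names (`to_half`, `ulp`, `half_ulp`, `single_ulp`, `half_ulp_error`) are made up for illustration and are not part of NumPy or this patch. It compares the relative size of 3 ULP near 1.0 in float16 and float32, and measures the ULP error of a value that is computed in higher precision and then rounded to float16, which is roughly what the existing float32-based path does.

```python
import math
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def ulp(x, mant_bits, min_exp):
    """Spacing between adjacent values at x for a binary format with
    `mant_bits` explicit mantissa bits and minimum normal exponent `min_exp`."""
    if x == 0.0:
        return 2.0 ** (min_exp - mant_bits)
    _, e = math.frexp(abs(x))      # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, min_exp)      # clamp to cover the subnormal range
    return 2.0 ** (exp - mant_bits)


def half_ulp(x):
    return ulp(x, 10, -14)         # float16: 10 mantissa bits


def single_ulp(x):
    return ulp(x, 23, -126)        # float32: 23 mantissa bits


def half_ulp_error(approx, exact):
    """Error of `approx` relative to `exact`, in float16 ULPs at `exact`."""
    return abs(approx - exact) / half_ulp(exact)


x = 1.0
print("3 ULP relative error, float16:", 3 * half_ulp(x) / x)
print("3 ULP relative error, float32:", 3 * single_ulp(x) / x)

exact = math.sin(1.0)
print("ULP error after rounding to float16:", half_ulp_error(to_half(exact), exact))
```

Near 1.0 a float16 ULP is 2**13 times larger than a float32 ULP, so the same ULP bound corresponds to a much larger relative error in half precision.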
|
2020b31 is a hack, I am open to better alternatives. SVML FP16 requires the latest assembler and we shouldn't add them to |
|
Looks like |
|
Added benchmark results above. |
So this would be unused in our wheels, which use a manylinux2014 docker container to build. If I recall correctly, that uses gcc 9.3. |
Yeah, is there a path to using gcc >= 12.x? |
Also true for the other PR #23435 |
Maybe, but it would involve:
Then we would have to add at least one CI job with an older compiler. |
|
rebased. |
|
numpy/SVML#3 takes care of raising invalid for sin/cos. This patch, when built with gcc-12, passes all the umath tests locally on my SKX:
It does, however, fail two linalg tests which I don't think are related to this patch (they fail with the main branch too) and I think those are SDE bugs as well: |
|
ping. |
intel_spr_sde_test (pull_request) passed, but the log shows only the build; it seems SDE didn't enable SPR. @r-devulap, would it be better to use: python runtests.py -n -- -k 'test_umath|linalg|test_ufunc' instead of calling the pytest module directly, so we can get the runtime features log during testing?
Neither SVML nor our infrastructure supports AVX512-FP16 on MSVC. |
|
The SDE has a bug which corrupts the x87 stack and leads to a lot of failures in the test suite. This is why we only have a build test. It should be fixed in the next release of SDE, and we can then enable the tests again. |
|
ping :) |
|
@r-devulap needs a rebase |
|
rebased with main, the CI run on SPR should test this patch. |
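For anyone who wants to confirm that a native Linux machine reports AVX512-FP16 before expecting the new code path to run, a minimal sketch follows. It assumes the kernel exposes the feature under the flag name `avx512_fp16` in `/proc/cpuinfo`; the function names are hypothetical, and this is not how NumPy itself detects CPU features.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512_fp16(cpuinfo_text):
    # Assumption: the Linux kernel spells the flag "avx512_fp16".
    return "avx512_fp16" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX512-FP16 reported:", has_avx512_fp16(path.read_text()))
    else:
        print("/proc/cpuinfo not available on this platform")
```

This reflects only what the host kernel reports. A process running under an emulator such as SDE may see different CPUID results than the host's `/proc/cpuinfo`, so the check is meaningful mainly on real hardware.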
|
hah, we have moved to meson since then, and that needs updating. |
|
Thanks @r-devulap |
This is a manual revert of numpygh-23351 since things were moved around quite a lot since then.
Leverage AVX512 FP16 SVML content. These are up to 4-5x faster than using the FP32 SVML functions, which were already added in #21955. Max ULP errors are listed below; still working on getting exact benchmark numbers.
Requires gcc >= 12.x for build.
Benchmark results on Intel Sapphire Rapids show up to a 6x speedup for float16 functions. I am not sure why they show a regression on some of the other ufuncs, which are unrelated to this patch.