Add SVML AVX512 FP16 content by r-devulap · Pull Request #2 · numpy/SVML

r-devulap · 2023-03-08T20:06:18Z

Open sourcing new content for FP16 umath functions based on the AVX-512 FP16 ISA on Intel Sapphire Rapids (SPR). These were measured to be about 140x faster on average than their scalar counterparts in NumPy (which require conversion to FP32 and back). They have a maximum ULP error of 3.05 (detailed ULP errors are listed here: numpy/numpy#23351 (comment)).
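To make the "conversion to FP32 and back" and the ULP figure concrete, here is a minimal sketch using only the Python standard library (`struct`'s `e` format is IEEE 754 binary16). The helper names `to_half`, `to_single`, `scalar_fp16`, `half_ulp` and `ulp_error` are hypothetical and are not part of NumPy or SVML. The widen/compute/narrow sequence only emulates the idea of the scalar path, not NumPy's actual loop, and the ULP definition (absolute error divided by the binary16 spacing at the reference value) is one common convention assumed here for illustration.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def scalar_fp16(func, x: float) -> float:
    """Emulate a scalar FP16 call: widen to FP32, compute, narrow to FP16.

    Assumption: func is evaluated in double precision and rounded to FP32,
    which stands in for a genuine FP32 math routine.
    """
    widened = to_single(to_half(x))    # FP16 -> FP32 is exact
    result = to_single(func(widened))  # result as an FP32 value
    return to_half(result)             # FP32 -> FP16 rounds again


def half_ulp(x: float) -> float:
    """Spacing between adjacent binary16 values in the binade containing x."""
    if x == 0.0:
        return 2.0 ** -24  # spacing of binary16 subnormals
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** max(exponent - 11, -24)


def ulp_error(computed: float, reference: float) -> float:
    """Error of computed, in units of binary16 ULPs at the reference value."""
    return abs(computed - reference) / half_ulp(reference)


x = to_half(0.5)
approx = scalar_fp16(math.tanh, x)
print(approx, ulp_error(approx, math.tanh(x)))
```

Under this convention, a correctly rounded result has an error of at most about 0.5 ULP, so a maximum of 3.05 ULP means the worst-case result is a few representable FP16 values away from the exact answer.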

| FP16 ufunc | Scalar FP16 (ms) | AVX512 FP16 SVML (μs) | Speedup on AVX-512 FP16 SPR (x) |
|---|---|---|---|
| arctanh | 8.58 | 31.5 | 272.4 |
| tanh | 7.38 | 27.1 | 272.3 |
| cbrt | 6.76 | 30.5 | 221.6 |
| arcsinh | 8.05 | 43.7 | 184.2 |
| arctan | 5.18 | 29.2 | 177.4 |
| arccosh | 7.41 | 48.2 | 153.7 |
| arccos | 4.94 | 33.3 | 148.3 |
| cosh | 5.02 | 34.2 | 146.8 |
| arcsin | 4.39 | 31.6 | 138.9 |
| sinh | 7.22 | 54.8 | 131.8 |
| log10 | 4.11 | 31.7 | 129.7 |
| sin | 3.43 | 29.1 | 117.9 |
| log1p | 5.1 | 43.3 | 117.8 |
| cos | 3.38 | 32.2 | 105.0 |
| log | 2.63 | 29.8 | 88.3 |
| tan | 5.78 | 68.9 | 83.9 |
| log2 | 2.31 | 30.3 | 76.2 |
| expm1 | 4.89 | 65.0 | 75.2 |
| exp | 2.14 | 28.9 | 74.0 |
| exp2 | 2.08 | 28.8 | 72.2 |
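Note that the two timing columns use different units (milliseconds for the scalar path, microseconds for SVML), so the speedup is `scalar_ms * 1000 / svml_us`. The short sketch below recomputes the last column from the first two and takes the mean. The names `TIMINGS` and `speedup` are illustrative only; the numbers are copied from the table above.

```python
# (scalar FP16 time in ms, AVX512 FP16 SVML time in μs), copied from the table
TIMINGS = {
    "arctanh": (8.58, 31.5),
    "tanh": (7.38, 27.1),
    "cbrt": (6.76, 30.5),
    "arcsinh": (8.05, 43.7),
    "arctan": (5.18, 29.2),
    "arccosh": (7.41, 48.2),
    "arccos": (4.94, 33.3),
    "cosh": (5.02, 34.2),
    "arcsin": (4.39, 31.6),
    "sinh": (7.22, 54.8),
    "log10": (4.11, 31.7),
    "sin": (3.43, 29.1),
    "log1p": (5.1, 43.3),
    "cos": (3.38, 32.2),
    "log": (2.63, 29.8),
    "tan": (5.78, 68.9),
    "log2": (2.31, 30.3),
    "expm1": (4.89, 65.0),
    "exp": (2.14, 28.9),
    "exp2": (2.08, 28.8),
}


def speedup(scalar_ms: float, svml_us: float) -> float:
    """Ratio of scalar time to SVML time, after converting ms to μs."""
    return scalar_ms * 1000.0 / svml_us


speedups = {name: speedup(ms, us) for name, (ms, us) in TIMINGS.items()}
mean_speedup = sum(speedups.values()) / len(speedups)

for name, value in speedups.items():
    print(f"{name:8s} {value:6.1f}x")
print(f"{'mean':8s} {mean_speedup:6.1f}x")
```

The arithmetic mean over the 20 functions comes out just under 140x, which is consistent with the figure quoted in the description; individual functions range from about 72x (exp2) to about 272x (arctanh).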

r-devulap · 2023-03-08T20:09:33Z

@seberg Any objections to adding this new content?

seberg

Looks good to add (please go ahead if it helps with the other PR); it seems "small" compared to the rest anyway. The NumPy PR should be the place to discuss this.

I am slightly hesitant about the order of things, but it probably doesn't matter. I.e., it would be nice to have basic support for faster half before adding AVX512.

seberg · 2023-03-09T08:34:57Z

Ah, nvm. I didn't realize we already did other speedups here anyway.

r-devulap · 2023-03-09T17:08:01Z

Sounds good, thanks.

Raghuveer Devulapalli added 4 commits August 2, 2022 12:36

Adding FP16 source files

3e6058b

ENH: Change function signature to process arrays

b4607aa

Use 0 as src for masked load

7f2807c

MAINT: Mark data section non-executable in SVML FP16 content

e2c1105

r-devulap requested a review from seberg March 8, 2023 20:08

seberg approved these changes Mar 9, 2023

r-devulap merged commit 86f9647 into numpy:main Mar 9, 2023

r-devulap merged 4 commits into numpy:main from r-devulap:fp16
