[clamp] Fix float16 scalar overflow check inconsistency between CPU and GPU by brijrajk · Pull Request #185756 · pytorch/pytorch

brijrajk · 2026-05-31T12:04:31Z

Summary

torch.clamp / torch.clip on float16 tensors validated scalar bounds
inconsistently between CPU and GPU. When a scalar bound exceeds the float16
range (maximum finite value 65504), CPU correctly raises RuntimeError, but GPU
silently succeeds and returns incorrect results: the out-of-range bound
saturates to inf when stored back as float16, giving wrong clamp behavior with
no warning.

import torch

x = torch.zeros(1, dtype=torch.float16)
torch.clip(x, max=65507.0)          # CPU: RuntimeError ✓
torch.clip(x.cuda(), max=65507.0)   # GPU: silent wrong result ✗

Root Cause

CPU kernels (cpu/TensorCompareKernel.cpp) convert the scalar bound via
.to<scalar_t>() where scalar_t = at::Half, which calls c10::check_overflow
and raises. CUDA kernels (cuda/TensorCompare.cu) first promote the scalar to
opmath_t (float for float16) before using it, so a value like 65507.0
fits in float without triggering any overflow check, and the validation that
CPU performs is silently bypassed.
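
The difference between the two paths can be modelled in a few lines of plain Python. This is a minimal sketch, not PyTorch's code: `to_half_checked` and `to_float32_unchecked` are hypothetical names, the error text is illustrative, and the range check is simplified (it ignores how inf and NaN are treated). It only shows why a bound of 65507.0 is rejected by a checked conversion to float16 but passes unnoticed when it is first promoted to float32.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite float16 value


def to_half_checked(value: float) -> float:
    """Model of a checked conversion: reject values outside the float16 range."""
    if abs(value) > FLOAT16_MAX:
        # Illustrative message only; not the text PyTorch emits.
        raise RuntimeError(f"{value} cannot be converted to float16 without overflow")
    return value


def to_float32_unchecked(value: float) -> float:
    """Model of promotion to a wider compute type: round-trip through float32."""
    return struct.unpack("f", struct.pack("f", value))[0]


def bound_is_rejected(convert, value: float) -> bool:
    """Return True if the given conversion raises for this bound."""
    try:
        convert(value)
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print("checked float16 path rejects 65507.0:", bound_is_rejected(to_half_checked, 65507.0))
    print("promoted float32 path rejects 65507.0:", bound_is_rejected(to_float32_unchecked, 65507.0))
```

In this model the float32 path never has a reason to complain, because 65507.0 is exactly representable in float32; any range check has to happen before the promotion.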

Fix

Add the overflow check in TORCH_META_FUNC(clamp) in TensorCompare.cpp.
The meta function runs before kernel dispatch on all devices, making the
check device-agnostic in a single change. The existing c10::Scalar::to<T>()
mechanism is reused so the error message is identical to what CPU already
produced — no new error strings introduced.

isReducedFloatingType + AT_DISPATCH_REDUCED_FLOATING_TYPES ensure the
check covers float16, bfloat16, and all float8 variants. bfloat16 with
max=65507 correctly does not raise, since 65507 is well within the
bfloat16 range (bfloat16 has the same exponent range as float32).
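
The idea of validating once, before any device-specific kernel is chosen, can be sketched as follows. This is a hedged illustration in plain Python, not the actual meta function: `REDUCED_FLOAT_MAX`, `check_clamp_bounds`, `clamp`, and the `KERNELS` table are all hypothetical names, only float16 and bfloat16 are modelled, and the error text is made up for the example.

```python
# Largest finite value per dtype: (2 - 2**-mantissa_bits) * 2**max_exponent.
REDUCED_FLOAT_MAX = {
    "float16": (2 - 2**-10) * 2.0**15,    # 65504.0
    "bfloat16": (2 - 2**-7) * 2.0**127,   # roughly 3.39e38
}


def check_clamp_bounds(dtype, lo=None, hi=None):
    """Shared pre-dispatch check: reject bounds outside the dtype's finite range."""
    limit = REDUCED_FLOAT_MAX.get(dtype)
    if limit is None:
        return  # dtype not modelled here, nothing to validate
    for name, bound in (("min", lo), ("max", hi)):
        if bound is not None and abs(bound) > limit:
            # Illustrative message only.
            raise RuntimeError(f"{name}={bound} overflows {dtype}")


def _clamp_kernel(values, lo, hi):
    """Stand-in for a device kernel; it performs no validation of its own."""
    out = []
    for v in values:
        if lo is not None and v < lo:
            v = lo
        if hi is not None and v > hi:
            v = hi
        out.append(v)
    return out


# Hypothetical dispatch table: every device shares the same unchecked kernel.
KERNELS = {"cpu": _clamp_kernel, "cuda": _clamp_kernel}


def clamp(values, dtype, device, lo=None, hi=None):
    check_clamp_bounds(dtype, lo, hi)  # runs before the device kernel is selected
    return KERNELS[device](values, lo, hi)


if __name__ == "__main__":
    for device in ("cpu", "cuda"):
        try:
            clamp([0.0], "float16", device, hi=65507.0)
        except RuntimeError as err:
            print(device, "float16:", err)
        print(device, "bfloat16:", clamp([0.0], "bfloat16", device, hi=65507.0))
```

Because the check sits in front of the dispatch table, adding another entry to `KERNELS` would inherit the same validation without any per-backend change, which is the property the PR relies on.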

Prior Attempts and Related Issues

"[CUDA] Add float16 overflow check in clamp_scalar to match CPU behavior" #173776 (closed without merge): Fixed only the CUDA kernel path using a
custom c10::overflows check. Left XPU, ROCm, and any future backends
unpatched, and produced mangled C++ type names in the error message.

"[ai_generated] torch.clip with float16 overflow scalar does not raise RuntimeError on XPU" intel/torch-xpu-ops#3425: The Intel XPU team filed the same bug for their
backend, confirming that a CUDA-only fix is insufficient. The meta-function
approach in this PR fixes XPU automatically.

Fixes #171356

Checklist

Passes lint (lintrunner aten/src/ATen/native/TensorCompare.cpp test/test_shape_ops.py)
Added/updated tests
Updated documentation (if applicable)

BC-breaking?

No — this turns a silent wrong result into a RuntimeError consistent with
what CPU already raised. No previously correct code is broken.

Developed with AI assistance (Claude). Reviewed and verified by the author.

pytorch-bot · 2026-05-31T12:04:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/185756

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

[ROCm] MI350 CI jobs will have longer queue times due to CI migration

This comment was automatically generated by Dr. CI and updates every 15 minutes.

brijrajk · 2026-05-31T12:05:36Z

@pytorchbot label "ciflow/trunk"

brijrajk · 2026-05-31T12:05:37Z

@pytorchbot label "topic: not user facing"

pytorch-bot · 2026-05-31T12:05:52Z

The ciflow label(s) ciflow/trunk will be added, but CI won't be triggered until the workflows are approved (scroll to the bottom of this page).

Please ping one of the reviewers if you do not have access to approve and run workflows.

pytorch-bot · 2026-05-31T12:05:57Z

The following ciflow label(s) have been added but CI has not been triggered yet because the workflows are awaiting approval:

ciflow/trunk

Once a maintainer approves the workflows (scroll to the bottom of the PR page), the corresponding CI jobs will be triggered automatically. Please ping one of the reviewers if you do not have access to approve and run workflows.

…in meta function

torch.clamp/torch.clip on float16 tensors was inconsistent between CPU and GPU. CPU kernels convert the scalar bound via .to<scalar_t>() which calls c10::check_overflow and raises RuntimeError for out-of-range values. CUDA kernels first promote the scalar to opmath_t (float for float16), so a value like 65507 fits in float without overflow, silently bypassing the check and producing incorrect results (the bound saturates to inf when stored back as float16).

The fix adds an explicit overflow check in TORCH_META_FUNC(clamp) for isReducedFloatingType dtypes (float16, bfloat16, float8 variants). The meta function runs before kernel dispatch on all devices, so the check fires consistently whether the tensor is on CPU, CUDA, or ROCm. The existing c10::Scalar::to<T>() mechanism is reused so the error message matches the one already produced by CPU kernels.

Fixes pytorch#171356

Test Plan:

```
cd /tmp
source /path/to/.venv-src/bin/activate
python3 test/test_shape_ops.py \
    TestShapeOpsCPU.test_clamp_float16_scalar_overflow_cpu \
    TestShapeOpsCUDA.test_clamp_float16_scalar_overflow_cuda \
    -v
```

All tests pass. Verified on AMD Radeon AI PRO R9700 (gfx1201, ROCm 7.0). bfloat16 with max=65507 correctly does not raise (65507 is within the bf16 range).

Authored by Claude.

brijrajk · 2026-06-22T18:04:36Z

Rebased on latest main. @jbschlosser, you've recently touched TensorCompare.cpp, so I'm tagging you as the likely reviewer. This fixes an inconsistency where torch.clamp with an overflowing float16 scalar raises on CPU but silently succeeds with a wrong result on CUDA/XPU. The fix lives in the meta function, so it's backend-agnostic. Would you be able to take a look and approve the CI workflows?

pytorch-bot Bot added the topic: not user facing topic category label May 31, 2026

pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 31, 2026

pytorchbot added the open source label May 31, 2026

jbschlosser requested review from malfet and zeshengzong and removed request for zeshengzong June 1, 2026 21:08

jbschlosser added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 1, 2026

brijrajk force-pushed the fix/clip-float16-overflow-171356 branch from 8d40819 to 15d8663 Compare June 22, 2026 18:16

brijrajk wants to merge 1 commit into pytorch:main from brijrajk:fix/clip-float16-overflow-171356
