
[clamp] Fix float16 scalar overflow check inconsistency between CPU and GPU #185756

Open
brijrajk wants to merge 1 commit into pytorch:main from brijrajk:fix/clip-float16-overflow-171356


@brijrajk

@brijrajk brijrajk commented May 31, 2026

Contributor

Summary

torch.clamp / torch.clip on float16 tensors had inconsistent validation
between CPU and GPU. When a scalar bound exceeded the float16 maximum (65504),
CPU correctly raised RuntimeError, but GPU silently succeeded and could return
incorrect results: the out-of-range bound is rounded to the nearest float16
value (or saturates to inf if it is large enough) when stored back as float16,
changing the clamp behavior with no warning.

import torch

# Behavior before this fix:
x = torch.zeros(1, dtype=torch.float16)
torch.clip(x, max=65507.0)          # CPU: RuntimeError ✓
torch.clip(x.cuda(), max=65507.0)   # GPU: silently succeeds ✗
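
To make the float16 boundary concrete without a GPU, here is a minimal standard-library sketch. It assumes Python's `struct` half-precision format (`"e"`) as a stand-in for float16 storage; `FLOAT16_MAX` and `to_float16` are illustrative names, not PyTorch APIs.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def to_float16(value: float) -> float:
    """Round value to the nearest float16 and return it as a Python float.

    struct raises OverflowError when the value is too large for float16.
    """
    return struct.unpack("<e", struct.pack("<e", value))[0]


bound = 65507.0
exceeds_range = bound > FLOAT16_MAX  # a strict range check rejects the bound
stored = to_float16(bound)           # plain rounding maps it to a float16 value

print(exceeds_range, stored)
```

In this model the bound 65507 is above the finite maximum, yet plain rounding quietly stores it as 65504. A strict range check is what turns that quiet change into an error.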

Root Cause

CPU kernels (cpu/TensorCompareKernel.cpp) convert the scalar bound via
.to<scalar_t>() where scalar_t = at::Half, which calls c10::check_overflow
and raises. CUDA kernels (cuda/TensorCompare.cu) first promote the scalar to
opmath_t (float for float16) before using it, so a value like 65507.0
fits in float without triggering any overflow check — the error is silently
bypassed.
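
The difference between the two paths can be modelled in a few lines of plain Python. This is only a sketch of the behavior described above, not the ATen code: `convert_checked_half` and `convert_via_float32` are hypothetical names, and the error text is made up for illustration.

```python
import math
import struct

FLOAT16_MAX = 65504.0  # largest finite float16 value


def convert_checked_half(value: float) -> float:
    """Model of the CPU path: convert straight to float16, rejecting overflow."""
    # Infinities pass through, since float16 can represent them.
    if math.isfinite(value) and abs(value) > FLOAT16_MAX:
        raise RuntimeError(f"value {value} does not fit in float16")
    return struct.unpack("<e", struct.pack("<e", value))[0]


def convert_via_float32(value: float) -> float:
    """Model of the CUDA path: promote to float32 first, where 65507 fits."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


bound = 65507.0

try:
    convert_checked_half(bound)
    cpu_raised = False
except RuntimeError:
    cpu_raised = True

gpu_bound = convert_via_float32(bound)  # no error: float32 holds 65507 exactly

print(cpu_raised, gpu_bound)
```

The same scalar is rejected on one path and accepted on the other purely because of the intermediate type, which is the inconsistency this PR targets.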

Fix

Add the overflow check in TORCH_META_FUNC(clamp) in TensorCompare.cpp.
The meta function runs before kernel dispatch on all devices, making the
check device-agnostic in a single change. The existing c10::Scalar::to<T>()
mechanism is reused so the error message is identical to what CPU already
produced — no new error strings introduced.
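
The idea of validating once before dispatch can be sketched with a toy dispatcher. Everything here is an assumption made for illustration (the `FINITE_MAX` table, `check_bound`, `clamp_kernel`, `KERNELS`, and the error text); the actual change lives in C++ inside TORCH_META_FUNC(clamp).

```python
import math

# Largest finite value per dtype (a hand-written table for this sketch).
FINITE_MAX = {
    "float16": 65504.0,
    "bfloat16": (2.0 - 2.0 ** -7) * 2.0 ** 127,
}


def check_bound(value, dtype):
    """Raise if a finite scalar bound lies outside the dtype's finite range."""
    if value is None or dtype not in FINITE_MAX:
        return
    if math.isfinite(value) and abs(value) > FINITE_MAX[dtype]:
        raise RuntimeError(f"bound {value} overflows {dtype}")


def clamp_kernel(values, lo, hi):
    """Stand-in for a device kernel; it performs no validation of its own."""
    out = []
    for v in values:
        if lo is not None:
            v = max(v, lo)
        if hi is not None:
            v = min(v, hi)
        out.append(v)
    return out


KERNELS = {"cpu": clamp_kernel, "cuda": clamp_kernel}


def clamp(values, dtype, device, lo=None, hi=None):
    """Validate the bounds once, then dispatch to the per-device kernel."""
    check_bound(lo, dtype)
    check_bound(hi, dtype)
    return KERNELS[device](values, lo, hi)


def raises_on(device):
    try:
        clamp([0.0], "float16", device, hi=65507.0)
    except RuntimeError:
        return True
    return False


results = {device: raises_on(device) for device in ("cpu", "cuda")}
print(results)
```

Because the check sits in front of the dispatch table, every device sees the same error for the same input, and no kernel needs its own copy of the validation.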

isReducedFloatingType + AT_DISPATCH_REDUCED_FLOATING_TYPES ensure the
check covers float16, bfloat16, and all float8 variants. bfloat16 with
max=65507 correctly does not raise since 65507 lies within the bfloat16
range (bfloat16 has the same exponent range as float32).
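
The bfloat16 case can be checked by hand with a small standard-library sketch. It assumes bfloat16 is the top 16 bits of a float32 with round-to-nearest-even, and it ignores NaN handling; `to_bfloat16` and `BFLOAT16_MAX` are illustrative names, not PyTorch APIs.

```python
import struct


def to_bfloat16(value: float) -> float:
    """Round a finite value to bfloat16 via its float32 bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Add the round-to-nearest-even bias, then keep only the top 16 bits.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Largest finite bfloat16: bit pattern 0x7F7F, widened to float32.
BFLOAT16_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7F0000))[0]

bound = 65507.0
in_range = abs(bound) <= BFLOAT16_MAX  # far below the bfloat16 maximum
rounded = to_bfloat16(bound)           # stored as the nearest bfloat16 value

print(in_range, rounded)
```

So 65507 does not overflow bfloat16, although with only 8 significand bits it is stored as a nearby value rather than exactly. An overflow check should therefore let it through.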

Prior Attempts and Related Issues

Fixes #171356

Checklist

  • Passes lint (lintrunner aten/src/ATen/native/TensorCompare.cpp test/test_shape_ops.py)
  • Added/updated tests
  • Updated documentation (if applicable)

BC-breaking?

No — this turns a silent wrong result into a RuntimeError consistent with
what CPU already raised. No previously correct code is broken.

Developed with AI assistance (Claude). Reviewed and verified by the author.

@pytorch-bot

pytorch-bot Bot commented May 31, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/185756

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@brijrajk

Contributor Author

@pytorchbot label "ciflow/trunk"

@brijrajk

Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot Bot added the topic: not user facing topic category label May 31, 2026
@pytorch-bot

pytorch-bot Bot commented May 31, 2026


The ciflow label(s) ciflow/trunk will be added, but CI won't be triggered until the workflows are approved (scroll to the bottom of this page).

Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 31, 2026
@pytorch-bot

pytorch-bot Bot commented May 31, 2026


The following ciflow label(s) have been added but CI has not been triggered yet because the workflows are awaiting approval:

  • ciflow/trunk

Once a maintainer approves the workflows (scroll to the bottom of the PR page), the corresponding CI jobs will be triggered automatically. Please ping one of the reviewers if you do not have access to approve and run workflows.

@jbschlosser jbschlosser requested review from malfet and zeshengzong and removed request for zeshengzong June 1, 2026 21:08
@jbschlosser jbschlosser added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 1, 2026
…in meta function

torch.clamp/torch.clip on float16 tensors was inconsistent between CPU and GPU.
CPU kernels convert the scalar bound via .to<scalar_t>() which calls
c10::check_overflow and raises RuntimeError for out-of-range values. CUDA
kernels first promote the scalar to opmath_t (float for float16), so a value
like 65507 fits in float without overflow, silently bypassing the check and
potentially producing incorrect results (the bound is rounded to the nearest
float16 value, or saturates to inf if it is large enough, when stored back as
float16).

The fix adds an explicit overflow check in TORCH_META_FUNC(clamp) for
isReducedFloatingType dtypes (float16, bfloat16, float8 variants). The meta
function runs before kernel dispatch on all devices, so the check fires
consistently whether the tensor is on CPU, CUDA, or ROCm. The existing
c10::Scalar::to<T>() mechanism is reused so the error message matches the
one already produced by CPU kernels.

Fixes pytorch#171356

Test Plan:
```
cd /tmp
source /path/to/.venv-src/bin/activate
python3 test/test_shape_ops.py \
    TestShapeOpsCPU.test_clamp_float16_scalar_overflow_cpu \
    TestShapeOpsCUDA.test_clamp_float16_scalar_overflow_cuda \
    -v
```
All tests pass. Verified on AMD Radeon AI PRO R9700 (gfx1201, ROCm 7.0).
bfloat16 with max=65507 correctly does not raise (65507 is within the bf16 range).

Authored by Claude.
@brijrajk

Contributor Author

Rebased on latest main. @jbschlosser, you've recently touched TensorCompare.cpp, so I'm tagging you as the likely reviewer. This fixes an inconsistency where torch.clamp with an overflowing float16 scalar raises on CPU but silently succeeds on CUDA/XPU. The fix lives in the meta function, so it's backend-agnostic. Would you be able to take a look and approve CI workflows?

@brijrajk brijrajk force-pushed the fix/clip-float16-overflow-171356 branch from 8d40819 to 15d8663 Compare June 22, 2026 18:16

Labels

  • ciflow/trunk (Trigger trunk jobs on your pull request)
  • open source
  • topic: not user facing (topic category)
  • triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

torch.clip has checks for float16 scalar overflow on CPU but not on GPU

3 participants