feat: migrate backend from per-device kernels to InfiniOps operator library #304

Draft: bitzyz wants to merge 11 commits into master from dev-adapt-infiniops

bitzyz (Collaborator) commented Apr 29, 2026

This commit replaces InfiniTensor's per-device kernel implementations with InfiniOps, a unified operator library shared across the InfiniTensor ecosystem. This is the first step in a multi-part migration.

  • Add InfiniOps as a git submodule at 3rd-party/infiniops, version-pinned via .gitmodules. The submodule provides kernel implementations for CPU, CUDA, and other accelerators through a single libinfiniops.so.

  • Introduce the InfiniOps bridge layer (include/core/infiniops_bridge/):

      • adapter_kernel.h: base class for adapter kernels that delegate to InfiniOps C++ APIs.
      • tensor_convert.h: converts an InfiniTensor TensorObj (shape, dtype, strides, device) to an InfiniOps Tensor as a non-owning view.
      • infiniops_runtime.h / infiniops_runtime.cc: a unified InfiniOpsRuntimeObj that replaces all per-device runtime classes (CudaRuntimeObj, BangRuntimeObj, KunlunRuntimeObj, etc.) and handles device-specific alloc/dealloc dispatching and workspace management.
      • cpu_fallback.h: placeholder for a future GPU→CPU fallback path for operators not yet supported by InfiniOps on GPU.

  • Implement 6 InfiniOps adapter kernels (src/kernels/infiniops/): Add, Mul, MatMul (with bias fusion via Gemm beta=1), Cast, Concat, and RMSNorm. Each adapter converts InfiniTensor tensors to InfiniOps tensors and calls the corresponding infini::ops::*::Call().

  • Remove legacy per-device kernel implementations for CUDA, BANG, Kunlun, Ascend, and IntelCPU backends (~24,700 lines deleted). This includes all kernel source files, runtime classes, operator timers, and device-specific headers. The remaining CPU kernels (src/kernels/cpu/) serve as fallback for operators not yet covered by InfiniOps adapters.

  • Simplify the build system: remove USE_CUDA, USE_BANG, USE_KUNLUN, USE_ASCEND, USE_INTELCPU CMake options. InfiniOps device backends are now controlled through InfiniOps's own WITH_CPU/WITH_NVIDIA etc. flags. GPU support is re-enabled via -DWITH_NVIDIA=ON passed through to InfiniOps's add_subdirectory.

  • Update the Python FFI (src/ffi/ffi_infinitensor.cc): replace all device-specific runtime factory functions (cuda_runtime(), bang_runtime(), etc.) with a unified cpu_runtime() / cuda_runtime() that returns InfiniOpsRuntimeObj. Fix copyout_numpy to use numpy.empty() instead of py::array(dtype, shape, nullptr) to avoid NumPy 2.x stride issues.
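As a rough illustration of the adapter pattern described in the bullets above (converting tensors to non-owning views, then fusing the MatMul bias through a Gemm call with beta=1), here is a minimal, self-contained sketch. It does not use the real InfiniOps or InfiniTensor APIs: `TensorView`, `OwnedTensor`, `toView`, `gemm`, and `matmulWithBias` are hypothetical names invented for this example, and the reference `gemm` is a plain CPU loop standing in for whatever `infini::ops::*::Call()` actually dispatches to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library-side tensor: a non-owning view.
// It borrows the data pointer and never allocates or frees it.
struct TensorView {
    float *data;                          // borrowed; owned by the caller
    std::vector<std::size_t> shape;       // extent of each dimension
    std::vector<std::ptrdiff_t> strides;  // element stride of each dimension
};

// Hypothetical framework-side tensor that owns its storage (row-major, 2-D).
struct OwnedTensor {
    std::vector<float> storage;
    std::size_t rows, cols;
};

// Build a view over the owner's buffer without copying any data.
TensorView toView(OwnedTensor &t) {
    return TensorView{t.storage.data(),
                      {t.rows, t.cols},
                      {static_cast<std::ptrdiff_t>(t.cols), 1}};
}

// Address element (i, j) of a 2-D view through its strides.
float &at(const TensorView &v, std::size_t i, std::size_t j) {
    return v.data[static_cast<std::ptrdiff_t>(i) * v.strides[0] +
                  static_cast<std::ptrdiff_t>(j) * v.strides[1]];
}

// Reference GEMM: C = alpha * A * B + beta * C.
void gemm(const TensorView &a, const TensorView &b, const TensorView &c,
          float alpha, float beta) {
    const std::size_t m = a.shape[0], k = a.shape[1], n = b.shape[1];
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += at(a, i, p) * at(b, p, j);
            }
            at(c, i, j) = alpha * acc + beta * at(c, i, j);
        }
    }
}

// MatMul with bias: broadcast the bias into the output first, then run
// GEMM with beta = 1 so the product is accumulated on top of the bias.
void matmulWithBias(OwnedTensor &a, OwnedTensor &b,
                    const std::vector<float> &bias, OwnedTensor &out) {
    for (std::size_t i = 0; i < out.rows; ++i) {
        for (std::size_t j = 0; j < out.cols; ++j) {
            out.storage[i * out.cols + j] = bias[j];
        }
    }
    gemm(toView(a), toView(b), toView(out), 1.0f, 1.0f);
}
```

The point of the sketch is the ordering: because the bias already sits in the output buffer, a single Gemm call with beta=1 yields A·B + bias without a separate Add kernel, and the views only borrow the framework's buffers, so no copies are made on the way in or out. How the actual adapters populate the output before the call is not stated in this PR description.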

bitzyz self-assigned this Apr 29, 2026
bitzyz marked this pull request as draft April 29, 2026 01:58
bitzyz force-pushed the dev-adapt-infiniops branch from 70e886d to 15cbfab, April 30, 2026 02:25
bitzyz force-pushed the dev-adapt-infiniops branch from 15cbfab to a2575d1, April 30, 2026 02:53
bitzyz force-pushed the dev-adapt-infiniops branch 4 times, most recently from 952e424 to 889b1ad, May 7, 2026 03:39
bitzyz force-pushed the dev-adapt-infiniops branch 2 times, most recently from 9ff0681 to 49a1769, May 8, 2026 01:59
bitzyz force-pushed the dev-adapt-infiniops branch from 49a1769 to c11b79b, May 8, 2026 06:18
bitzyz force-pushed the dev-adapt-infiniops branch from 0b0c969 to 0d67dc3, May 18, 2026 02:52
bitzyz force-pushed the dev-adapt-infiniops branch from 0d67dc3 to 70882d2, May 20, 2026 06:31