FractalSortCPU: 6x faster bandwidth-efficient radix sort — potential for MergeTree and ORDER BY #104899

mikdangana · 2026-05-13T23:32:20Z

mikdangana
May 13, 2026

Hi ClickHouse community,

I'm the author of FractalSortCPU (https://doi.org/10.48550/arXiv.2605.10390), a bandwidth-efficient compressed radix sort that achieves up to 6x improvement over state-of-the-art CPU sorting, 3x over GPU, and 2.5x over FPGA implementations on datasets from 512MB to 1TB+.

Since ClickHouse's MergeTree engine relies heavily on sorted data — during inserts, background merges, and ORDER BY execution — I'm exploring whether FractalSortCPU could improve performance in sort-bound workloads.

Key properties:

CPU-adapted histogram compression for arbitrary-precision keys
Fully parallel key-based histogram updates (no input bucketing needed)
SIMD-accelerated, designed for modern CPU architectures
1.3x-2.2x faster throughput than standard radix sort across all dataset sizes tested
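For readers unfamiliar with the "standard radix sort" baseline mentioned above, here is a minimal sketch of a textbook LSD radix sort driven by per-digit histograms. It is not FractalSortCPU and includes none of its compression, parallelism, or SIMD; the function name `lsd_radix_sort_u32` and the 8-bit digit width are assumptions chosen for illustration only.

```python
import numpy as np


def lsd_radix_sort_u32(keys, bits=8):
    """Textbook LSD radix sort for uint32 keys (baseline sketch, not FractalSortCPU)."""
    src = np.array(keys, dtype=np.uint32)  # work on a copy
    n_buckets = 1 << bits
    for shift in range(0, 32, bits):
        # Extract the current digit of every key.
        digits = (src >> np.uint32(shift)) & np.uint32(n_buckets - 1)
        digits = digits.astype(np.intp)
        # Histogram of digit values; an exclusive prefix sum then gives
        # the first output slot of each bucket.
        hist = np.bincount(digits, minlength=n_buckets)
        next_slot = np.cumsum(hist) - hist
        # Stable scatter: keys with equal digits keep their relative order.
        dst = np.empty_like(src)
        for key, d in zip(src, digits):
            dst[next_slot[d]] = key
            next_slot[d] += 1
        src = dst
    return src
```

The pure-Python scatter loop is slow and only meant to make the histogram and prefix-sum steps visible; each pass reads and writes the full key array, which is the memory traffic a bandwidth-efficient variant would aim to reduce.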

Paper: https://doi.org/10.48550/arXiv.2605.10390
Code: https://github.com/mikdangana/fractalsort_cpu

Questions for the community:

Where are the main sort bottlenecks in ClickHouse today — merge operations, ORDER BY, or elsewhere?
Would there be interest in benchmarking FractalSortCPU against ClickHouse's current sort paths?
What would be the best way to prototype this — a standalone benchmark against ClickHouse's sort, or a deeper integration?

Happy to run comparative benchmarks if there's interest.

Michael Dang'ana
Eonforge Labs
michael@eonforge.ca

l1t1 · 2026-05-19T12:17:33Z

l1t1
May 19, 2026

You could benchmark sorting 1, 10, and 20 columns with chdb vs. DuckDB vs. FractalSortCPU in Python.

1 reply

mikdangana May 22, 2026
Author

Hi,

Here is a multi-column benchmark of FractalSort (FSF) against DuckDB (over 10x speed-up) and state-of-the-art Polars:

==========================================================================================
Multi-column sort benchmark: sort by first column (uint32)
==========================================================================================
NOTE: chdb not available on this platform — skipped.

      rows cols |       DuckDB       Polars         FSC8    FSC8(key)          FSF     FSF(key) | best

======================================================================
Speedup vs DuckDB (higher = faster than DuckDB)
======================================================================
      rows cols |       Polars         FSC8    FSC8(key)          FSF     FSF(key)
----------------------------------------------------------------------------------
   100,000    1 |      10.25x        8.84x        7.62x        7.31x        7.52x
   100,000   10 |       8.71x        2.00x       13.26x        2.96x       14.91x
   100,000   20 |       9.99x        2.44x       18.45x        6.80x       20.71x

 1,000,000    1 |       7.15x        4.74x        4.73x        8.80x        8.76x
 1,000,000   10 |       3.36x        1.08x        8.36x        4.74x       15.83x
 1,000,000   20 |       2.82x        1.25x       12.06x        4.61x       23.10x

10,000,000    1 |       9.11x        6.55x        6.71x       10.52x       10.39x
10,000,000   10 |       1.22x        0.34x        5.33x        2.89x       19.54x
10,000,000   20 |       0.99x        0.41x       19.52x        6.33x       30.55x


Environment:
  Platform:    win32
  Python:      3.14.0
  DuckDB:      1.5.3
  Polars:      1.40.1
  chdb:        not available
  Numba:       0.65.1
  NumPy:       2.3.5
  n_runs:      5

NB: Claude couldn't get chdb to work on my Windows laptop since there is no available binary, so it excluded it from the benchmark.

l1t1 · 2026-05-23T01:09:56Z

l1t1
May 23, 2026

Nice job. Could you also attach the test scripts? BTW, you can use WSL to run chdb on Windows.

1 reply

mikdangana May 26, 2026
Author

Thanks @l1t1 — I've pushed a benchmark script to the repo that tests DuckDB and Polars (chdb auto-skips if unavailable). FractalSortCPU results are included as reference values since the sort implementation isn't part of this script.

You can run it directly:
pip install duckdb polars pyarrow
python bench_multicolumn_compare.py
https://github.com/mikdangana/fractalsort_cpu/blob/master/bench_multicolumn_compare.py

I'll look into getting chdb running under WSL as well.

On the integration side — if there's interest in evaluating FractalSortCPU against ClickHouse's internal sort paths (MergeTree inserts, merges, ORDER BY), I'd be happy to discuss a structured proof-of-concept. We're set up for that kind of engagement through https://eonforge.ca. Feel free to reach out directly if that's of interest.

l1t1 · 2026-05-27T11:08:49Z

l1t1
May 27, 2026

I modified the Python code slightly so that DuckDB runs the SQL against a table in its native format and Polars uses its SQL interface.

import duckdb
import polars as pl
import pyarrow as pa


def sort_duckdb(table, n_cols):
    con = duckdb.connect()
    col_exprs = ", ".join(f"c{i}" for i in range(n_cols))
    # Copy the input into a native DuckDB table once, outside the timed step.
    arrow_table = pa.table({k: pa.array(v) for k, v in table.items()})
    con.register("t", arrow_table)
    con.sql(f"CREATE TABLE duck AS SELECT {col_exprs} FROM t")

    def run():
        # Timed step: SQL sort over the native table, result fetched as Arrow.
        con.sql(f"SELECT {col_exprs} FROM duck ORDER BY c0").arrow()

    return run


def sort_polars(table, n_cols):
    df = pl.DataFrame({k: pl.Series(k, v) for k, v in table.items()})
    # eager=True makes execute() return a materialized DataFrame.
    ctx = pl.SQLContext(t=df, eager=True)
    col_exprs = ", ".join(f"c{i}" for i in range(n_cols))

    def run():
        # Timed step: sort through the SQL interface instead of df.sort("c0").
        ctx.execute(f"SELECT {col_exprs} FROM t ORDER BY c0")

    return run

The result is:

C:\d>python bench.py
==========================================================================================
Multi-column sort benchmark: sort by first column (uint32)
==========================================================================================
NOTE: chdb not available on this platform — skipped.

      rows cols |       DuckDB       Polars    FSC8(ref) | best
---------------------------------------------------------------
   100,000    1 |      31.3M/s      134.7M/s*      54.8M/s  | Polars
   100,000   10 |       9.5M/s       65.1M/s*      56.2M/s  | Polars
   100,000   20 |       5.2M/s       49.3M/s       54.1M/s* | FSC8(ref)

 1,000,000    1 |      53.0M/s      252.7M/s*      41.9M/s  | Polars
 1,000,000   10 |      18.0M/s       62.2M/s       77.4M/s* | FSC8(ref)
 1,000,000   20 |      10.4M/s       32.5M/s       53.8M/s* | FSC8(ref)

10,000,000    1 |      59.0M/s      323.2M/s*      61.3M/s  | Polars
10,000,000   10 |      19.8M/s       37.8M/s       66.9M/s* | FSC8(ref)
10,000,000   20 |       9.4M/s       22.8M/s       64.7M/s* | FSC8(ref)


======================================================================
Speedup vs DuckDB (higher = faster than DuckDB)
======================================================================
      rows cols |       Polars    FSC8(ref)
-------------------------------------------
   100,000    1 |       4.31x        1.75x
   100,000   10 |       6.83x        5.90x
   100,000   20 |       9.51x       10.44x

 1,000,000    1 |       4.77x        0.79x
 1,000,000   10 |       3.45x        4.29x
 1,000,000   20 |       3.14x        5.18x

10,000,000    1 |       5.48x        1.04x
10,000,000   10 |       1.91x        3.39x
10,000,000   20 |       2.42x        6.87x


Environment:
  Platform:    win32
  Python:      3.13.2
  DuckDB:      1.3.2
  Polars:      1.31.0
  chdb:        not available
  NumPy:       2.2.4
  n_runs:      3

FSC8(ref) = FractalSortCPU key-only sort (pre-measured reference values)

How can I run FractalSortCPU's sort method on a pa.table?

1 reply

mikdangana May 29, 2026
Author

Thanks for running the benchmark! A couple of notes:

Running FractalSortCPU on a table:

FractalSortCPU sorts the key column natively, then uses np.argsort on the sorted output to build a permutation index for reordering payload columns:

  from frmw_io_fast import fractalsort_fast

  def sort_fractalsort(table, n_cols):
      keys = table["c0"]
      # Warmup JIT
      fractalsort_fast(np.random.randint(0, 2**32, size=4096, dtype=np.uint32))

      if n_cols == 1:
          def run():
              fractalsort_fast(keys)
      else:
          col_arrays = [table[f"c{i}"] for i in range(n_cols)]
          def run():
              sorted_keys = fractalsort_fast(keys)
              order = np.argsort(sorted_keys, kind='mergesort')
              for arr in col_arrays:
                  _ = arr[order]

      return run

For PyArrow tables specifically, you'd extract the column as numpy first:
keys = arrow_table.column("c0").to_numpy().astype(np.uint32)
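One detail in the snippet above is worth double-checking: np.argsort applied to keys that are already sorted returns the identity permutation, so arr[order] would leave the payload columns in their original order. Below is a minimal NumPy-only sketch that derives the permutation from the unsorted key column instead. The helper name `sort_table_by_key` is hypothetical, and it uses NumPy's stable argsort rather than FractalSortCPU.

```python
import numpy as np


def sort_table_by_key(table, key="c0"):
    """Reorder every column of a dict-of-arrays table by one key column.

    Hypothetical helper for illustration; it relies on NumPy's stable
    argsort, not on FractalSortCPU.
    """
    keys = np.asarray(table[key])
    # The permutation must come from the *unsorted* keys.
    order = np.argsort(keys, kind="stable")
    return {name: np.asarray(col)[order] for name, col in table.items()}


table = {
    "c0": np.array([30, 10, 20], dtype=np.uint32),
    "c1": np.array([3.0, 1.0, 2.0]),
}
result = sort_table_by_key(table)

# For comparison: argsort over already-sorted keys is just 0..n-1.
identity = np.argsort(np.sort(table["c0"]), kind="stable")
```

A sort routine that returns the permutation alongside the sorted keys would avoid the separate argsort pass; whether FractalSortCPU exposes one is not stated in this thread.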

Re: your results: the FSC8(ref) numbers in your output are pre-measured reference values, not live measurements. To run live, clone the repo and use bench_multicolumn_compare.py, which runs all engines head-to-head, including FSC8, FSF (our faster 16-bit pipeline), and the adaptive dispatcher.

Our latest numbers on 10M rows with the FSF pipeline:

Engine	1 col	10 cols	20 cols
DuckDB	9.8 M/s	3.4 M/s	2.5 M/s
Polars	95.1 M/s	9.4 M/s	4.4 M/s
FSF(key)	128.0 M/s	123.1 M/s	126.0 M/s

The key advantage is that FractalSort throughput stays flat as column count increases, while SQL engines degrade because they physically move all columns during sorting.

l1t1 · 2026-05-30T11:31:25Z

l1t1
May 30, 2026

I saved your repo to C:\d\fractalsort_cpu-master2 and added your code

  from frmw_io_fast import fractalsort_fast

  def sort_fractalsort(table, n_cols):
      keys = table["c0"]
      # Warmup JIT
      fractalsort_fast(np.random.randint(0, 2**32, size=4096, dtype=np.uint32))

      if n_cols == 1:
          def run():
              fractalsort_fast(keys)
      else:
          col_arrays = [table[f"c{i}"] for i in range(n_cols)]
          def run():
              sorted_keys = fractalsort_fast(keys)
              order = np.argsort(sorted_keys, kind='mergesort')
              for arr in col_arrays:
                  _ = arr[order]

      return run

to bench_multicolumn_compare.py, and it reports:

    from frmw_io_fast import fractalsort_fast
ImportError: cannot import name 'fractalsort_fast' from 'frmw_io_fast' (C:\d\fractalsort_cpu-master2\frmw_io_fast.py)

0 replies

l1t1 · 2026-05-30T12:30:04Z

l1t1
May 30, 2026

I modified your code to

from fractalsort_cpu import fractalsort
def sort_fractalsort(table, n_cols):
  keys = table["c0"]
  # Warmup JIT
  fractalsort(np.random.randint(0, 2**32, size=4096, dtype=np.uint32))
  import pyarrow as pa
  arrow_table = pa.table({k: pa.array(v) for k, v in table.items()})
  keys = arrow_table.column("c0").to_numpy().astype(np.uint32)
  if n_cols == 1:
      def run():
          fractalsort(keys)
  else:
      col_arrays = [table[f"c{i}"] for i in range(n_cols)]
      def run():
          sorted_keys = fractalsort(keys)
          order = np.argsort(sorted_keys, kind='mergesort')
          for arr in col_arrays:
              _ = arr[order]

  return run

and added engines.append(("FractalSort", sort_fractalsort)) and fsc8_ref = None to main().
It works, but it is very slow. What's wrong with my code?


      rows cols |       DuckDB       Polars  FractalSort    FSC8(ref) | best
----------------------------------------------------------------------------
   100,000    1 |      14.5M/s      101.4M/s      169.7M/s*          N/A | FractalSort
   100,000   10 |       9.3M/s       58.1M/s*       0.6M/s           N/A | Polars
   100,000   20 |       8.5M/s       45.2M/s*       0.6M/s           N/A | Polars

 1,000,000    1 |      72.2M/s      261.5M/s*     170.0M/s           N/A | Polars
 1,000,000   10 |      42.8M/s       88.6M/s*       0.6M/s           N/A | Polars
 1,000,000   20 |      33.6M/s*      32.0M/s        0.6M/s           N/A | DuckDB

10,000,000    1 |     130.2M/s      341.8M/s*     265.6M/s           N/A | Polars
10,000,000   10 |      59.9M/s*      39.4M/s        0.5M/s           N/A | DuckDB
10,000,000   20 |      30.7M/s*      19.2M/s        0.5M/s           N/A | DuckDB


======================================================================
Speedup vs DuckDB (higher = faster than DuckDB)
======================================================================
      rows cols |       Polars  FractalSort    FSC8(ref)
--------------------------------------------------------
   100,000    1 |       6.97x       11.67x           N/A
   100,000   10 |       6.22x        0.06x           N/A
   100,000   20 |       5.32x        0.07x           N/A

 1,000,000    1 |       3.62x        2.35x           N/A
 1,000,000   10 |       2.07x        0.01x           N/A
 1,000,000   20 |       0.95x        0.02x           N/A

10,000,000    1 |       2.62x        2.04x           N/A
10,000,000   10 |       0.66x        0.01x           N/A
10,000,000   20 |       0.63x        0.02x           N/A


Environment:
  Platform:    win32
  Python:      3.13.2
  DuckDB:      1.5.2
  Polars:      1.40.0
  chdb:        not available
  NumPy:       2.4.5
  n_runs:      3

1 reply

mikdangana May 30, 2026
Author

Hi,

Thanks for looking into it. I've reverted to a more stable previous version and will save the fastest version for a future commit. Make sure you have the latest source code and benchmark, then run it with python bench_multicolumn.py:

python bench_multicolumn.py
==========================================================================================
Multi-column sort benchmark: sort by first column (uint32)
==========================================================================================
NOTE: chdb not available on this platform — skipped.

      rows cols |       DuckDB       Polars  Polars(key)         FSC8    FSC8(key)          FSF     FSF(key) | best
-------------------------------------------------------------------------------------------------------------------
   100,000    1 |       6.1M/s       58.9M/s*      52.2M/s       51.6M/s       44.9M/s       48.9M/s       49.7M/s  | Polars
   100,000   10 |       2.9M/s       26.5M/s       44.4M/s        5.9M/s       41.7M/s        9.2M/s       51.4M/s* | FSF(key)
   100,000   20 |       2.1M/s       21.5M/s       49.7M/s        4.9M/s       41.2M/s        7.2M/s       52.1M/s* | FSF(key)

 1,000,000    1 |      10.0M/s      106.9M/s       65.4M/s       65.9M/s       45.8M/s      110.5M/s*     108.0M/s  | FSF
 1,000,000   10 |       5.3M/s       18.9M/s       57.7M/s        5.7M/s       45.4M/s       27.1M/s      108.2M/s* | FSF(key)
 1,000,000   20 |       3.5M/s       11.8M/s       61.9M/s        4.5M/s       46.0M/s       18.0M/s      107.0M/s* | FSF(key)

10,000,000    1 |       9.7M/s       93.7M/s       59.5M/s       67.0M/s       65.9M/s      131.4M/s      136.8M/s* | FSF(key)
10,000,000   10 |       4.1M/s       10.1M/s       61.4M/s        1.6M/s       66.7M/s       35.5M/s      125.8M/s* | FSF(key)
10,000,000   20 |       3.1M/s        6.4M/s       58.4M/s        1.3M/s       66.6M/s       21.7M/s      128.7M/s* | FSF(key)


======================================================================
Speedup vs DuckDB (higher = faster than DuckDB)
======================================================================
      rows cols |       Polars  Polars(key)         FSC8    FSC8(key)          FSF     FSF(key)
-----------------------------------------------------------------------------------------------
   100,000    1 |       9.73x        8.63x        8.53x        7.42x        8.09x        8.22x
   100,000   10 |       9.16x       15.34x        2.02x       14.40x        3.16x       17.76x
   100,000   20 |      10.03x       23.21x        2.27x       19.25x        3.37x       24.33x

 1,000,000    1 |      10.67x        6.52x        6.57x        4.57x       11.02x       10.78x
 1,000,000   10 |       3.58x       10.93x        1.08x        8.61x        5.14x       20.49x
 1,000,000   20 |       3.33x       17.51x        1.28x       13.03x        5.10x       30.29x

10,000,000    1 |       9.61x        6.11x        6.87x        6.76x       13.48x       14.03x
10,000,000   10 |       2.44x       14.89x        0.40x       16.16x        8.60x       30.49x
10,000,000   20 |       2.07x       18.92x        0.41x       21.59x        7.03x       41.73x


Environment:
  Platform:    win32
  Python:      3.14.0
  DuckDB:      1.5.3
  Polars:      1.40.1
  chdb:        not available
  Numba:       0.65.1
  NumPy:       2.3.5
  n_runs:      5

l1t1 · 2026-05-31T04:16:15Z

l1t1
May 31, 2026

Thank you. All engines work except chdb.
I tried chdb on WSL, and it reports:

  WARN: chdb failed for 100000x1: Code: 801. DB::Exception: Python object not found in the Python environment
Ensure that the object is type of PyReader, pandas DataFrame, or PyArrow Table and is in the global or local scope. (PY_OBJECT_NOT_FOUND)

my env

python --version
Python 3.14.5
pip list
Package           Version
----------------- -----------
chdb              4.1.8
chdb-core         26.3.0
duckdb            1.5.3
llvmlite          0.47.0
numba             0.65.1
numpy             2.4.6
pandas            3.0.3
pip               26.1
polars            1.41.2
polars-runtime-32 1.41.2
pyarrow           24.0.0
python-dateutil   2.9.0.post0
six               1.17.0

0 replies

l1t1 · 2026-05-31T04:56:53Z

l1t1
May 31, 2026

Windows

python bench_multicolumn.py
==========================================================================================
Multi-column sort benchmark: sort by first column (uint32)
==========================================================================================
NOTE: chdb not available on this platform — skipped.

      rows cols |       DuckDB       Polars  Polars(key)         FSC8    FSC8(key)          FSF     FSF(key) | best
-------------------------------------------------------------------------------------------------------------------
   100,000    1 |      12.7M/s      149.1M/s*      93.4M/s      110.1M/s      109.1M/s      138.1M/s      131.8M/s  | Polars
   100,000   10 |       6.8M/s       75.3M/s       98.0M/s       13.2M/s      109.6M/s*      18.4M/s       84.5M/s  | FSC8(key)
   100,000   20 |       4.4M/s       56.8M/s       97.3M/s       11.4M/s      109.7M/s       17.3M/s      144.2M/s* | FSF(key)

 1,000,000    1 |      15.8M/s      264.2M/s*     243.6M/s      121.5M/s      118.1M/s      220.4M/s      190.6M/s  | Polars
 1,000,000   10 |       9.2M/s       93.1M/s      233.9M/s        9.5M/s      118.9M/s       64.7M/s      249.5M/s* | FSF(key)
 1,000,000   20 |       6.5M/s       33.4M/s      239.2M/s*       7.2M/s      120.2M/s       39.0M/s      233.2M/s  | Polars(key)

10,000,000    1 |      13.8M/s      341.5M/s*     129.1M/s       96.1M/s       96.1M/s       64.7M/s       81.9M/s  | Polars
10,000,000   10 |       8.1M/s       42.9M/s      131.7M/s        5.8M/s      100.5M/s       55.5M/s      182.1M/s* | FSF(key)
10,000,000   20 |       4.4M/s       20.3M/s      130.2M/s        4.1M/s      100.7M/s       35.6M/s      182.9M/s* | FSF(key)


======================================================================
Speedup vs DuckDB (higher = faster than DuckDB)
======================================================================
      rows cols |       Polars  Polars(key)         FSC8    FSC8(key)          FSF     FSF(key)
-----------------------------------------------------------------------------------------------
   100,000    1 |      11.75x        7.36x        8.67x        8.59x       10.88x       10.38x
   100,000   10 |      11.04x       14.35x        1.94x       16.06x        2.69x       12.37x
   100,000   20 |      13.06x       22.35x        2.63x       25.22x        3.98x       33.14x

 1,000,000    1 |      16.74x       15.44x        7.70x        7.48x       13.97x       12.08x
 1,000,000   10 |      10.14x       25.46x        1.04x       12.94x        7.04x       27.15x
 1,000,000   20 |       5.14x       36.86x        1.11x       18.53x        6.02x       35.93x

10,000,000    1 |      24.79x        9.37x        6.97x        6.97x        4.70x        5.94x
10,000,000   10 |       5.32x       16.33x        0.73x       12.45x        6.87x       22.58x
10,000,000   20 |       4.60x       29.44x        0.93x       22.78x        8.05x       41.38x


Environment:
  Platform:    win32
  Python:      3.13.2
  DuckDB:      1.5.2
  Polars:      1.40.0
  chdb:        not available
  Numba:       0.65.1
  NumPy:       2.4.5
  n_runs:      5

WSL

python bench_multicolumn.py
==========================================================================================
Multi-column sort benchmark: sort by first column (uint32)
==========================================================================================
NOTE: chdb not available on this platform — skipped.

      rows cols |       DuckDB       Polars  Polars(key)         FSC8    FSC8(key)          FSF     FSF(key) | best
-------------------------------------------------------------------------------------------------------------------
   100,000    1 |      16.2M/s       54.2M/s       34.2M/s      116.4M/s      113.6M/s      235.6M/s      239.7M/s* | FSF(key)
   100,000   10 |       6.9M/s       40.7M/s       38.0M/s       14.2M/s      107.7M/s       15.5M/s      223.8M/s* | FSF(key)
   100,000   20 |       4.3M/s       26.1M/s       45.0M/s       11.7M/s      116.1M/s       16.4M/s      233.1M/s* | FSF(key)

 1,000,000    1 |      33.7M/s      227.9M/s      205.9M/s      142.5M/s      140.9M/s      380.4M/s*     355.2M/s  | FSF
 1,000,000   10 |      11.5M/s       73.4M/s      191.1M/s       11.2M/s      140.3M/s       16.4M/s      359.3M/s* | FSF(key)
 1,000,000   20 |       7.8M/s       32.8M/s      202.5M/s        9.2M/s      150.0M/s       12.9M/s      373.3M/s* | FSF(key)

10,000,000    1 |      28.5M/s      327.2M/s*     129.5M/s      116.2M/s      116.6M/s      226.7M/s      228.4M/s  | Polars
10,000,000   10 |       8.0M/s       43.5M/s      119.1M/s        5.6M/s      107.2M/s       61.3M/s      200.3M/s* | FSF(key)
10,000,000   20 |       4.9M/s       23.6M/s      110.3M/s        4.1M/s      109.8M/s       39.2M/s      205.5M/s* | FSF(key)


======================================================================
Speedup vs DuckDB (higher = faster than DuckDB)
======================================================================
      rows cols |       Polars  Polars(key)         FSC8    FSC8(key)          FSF     FSF(key)
-----------------------------------------------------------------------------------------------
   100,000    1 |       3.35x        2.11x        7.18x        7.01x       14.54x       14.79x
   100,000   10 |       5.89x        5.51x        2.06x       15.59x        2.24x       32.41x
   100,000   20 |       6.08x       10.50x        2.72x       27.08x        3.82x       54.35x

 1,000,000    1 |       6.77x        6.11x        4.23x        4.18x       11.30x       10.55x
 1,000,000   10 |       6.41x       16.68x        0.98x       12.25x        1.43x       31.36x
 1,000,000   20 |       4.20x       25.92x        1.18x       19.21x        1.65x       47.79x

10,000,000    1 |      11.47x        4.54x        4.07x        4.09x        7.94x        8.00x
10,000,000   10 |       5.46x       14.93x        0.70x       13.44x        7.69x       25.11x
10,000,000   20 |       4.79x       22.38x        0.84x       22.28x        7.96x       41.70x


Environment:
  Platform:    linux
  Python:      3.14.5rc1
  DuckDB:      1.5.3
  Polars:      1.41.2
  chdb:        not available
  Numba:       0.65.1
  NumPy:       2.4.6
  n_runs:      5

I wonder why many of the engines perform better under WSL than on Windows.

1 reply

mikdangana Jun 1, 2026
Author

There are many variables, so it would be good to run the benchmark multiple times to ensure the impact of OS transient state on results can be mitigated. Numba JIT warmup could also affect the results and would be mitigated in the same way, or by using a dummy run.
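As a rough illustration of that advice, here is a minimal timing harness that performs untimed warm-up calls (the "dummy run") before collecting several timed samples and reporting the median. The function name `bench` and its defaults are assumptions for this sketch, not code from the FractalSortCPU repository.

```python
import statistics
import time


def bench(fn, n_warmup=1, n_runs=5):
    """Time fn() after untimed warm-up calls; return (median, min, max) in seconds."""
    for _ in range(n_warmup):
        fn()  # untimed: absorbs JIT compilation and cold caches
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples), max(samples)
```

Reporting the spread (min and max) next to the median makes it easier to tell a real difference between Windows and WSL from run-to-run noise.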
