FractalSortCPU: 6x faster bandwidth-efficient radix sort — potential for MergeTree and ORDER BY #104899
Replies: 7 comments 5 replies
-
|
you can do a benchmark of sort 1,10,20 columns chdb vs duckdb vs FractalSortCPU in python |
Beta Was this translation helpful? Give feedback.
-
|
nice job.
Could you also attach the test scripts? BTW, you can use WSL to run chdb on windows.
|
Beta Was this translation helpful? Give feedback.
-
|
I modified the python code slightly to let duckdb run sql on with native format, and let polars use sql interface. def sort_duckdb(table, n_cols):
con = duckdb.connect()
col_names = [f"c{i}" for i in range(n_cols)]
col_exprs = ", ".join(col_names)
import pyarrow as pa
arrow_table = pa.table({k: pa.array(v) for k, v in table.items()})
con.register("t", arrow_table)
con.sql(f"create table duck as SELECT {col_exprs} FROM t")
def run():
con.sql(f"SELECT {col_exprs} FROM duck ORDER BY c0").arrow() #fetchnumpy()
return run
def sort_polars(table, n_cols):
df = pl.DataFrame({k: pl.Series(k, v) for k, v in table.items()})
ctx = pl.SQLContext(t=df, eager=True)
col_names = [f"c{i}" for i in range(n_cols)]
col_exprs = ", ".join(col_names)
def run():
#df.sort("c0")
ctx.execute(f"SELECT {col_exprs} FROM t ORDER BY c0")
return runthe result is how to run FractalSortCPU's sort method on pa.table? |
Beta Was this translation helpful? Give feedback.
-
|
I save your repo at C:\d\fractalsort_cpu-master2, and add your code from frmw_io_fast import fractalsort_fast
def sort_fractalsort(table, n_cols):
keys = table["c0"]
# Warmup JIT
fractalsort_fast(np.random.randint(0, 2**32, size=4096, dtype=np.uint32))
if n_cols == 1:
def run():
fractalsort_fast(keys)
else:
col_arrays = [table[f"c{i}"] for i in range(n_cols)]
def run():
sorted_keys = fractalsort_fast(keys)
order = np.argsort(sorted_keys, kind='mergesort')
for arr in col_arrays:
_ = arr[order]
return runto bench_multicolumn_compare.py and it reports |
Beta Was this translation helpful? Give feedback.
-
|
I modified your code to from fractalsort_cpu import fractalsort
def sort_fractalsort(table, n_cols):
keys = table["c0"]
# Warmup JIT
fractalsort(np.random.randint(0, 2**32, size=4096, dtype=np.uint32))
import pyarrow as pa
arrow_table = pa.table({k: pa.array(v) for k, v in table.items()})
keys = arrow_table.column("c0").to_numpy().astype(np.uint32)
if n_cols == 1:
def run():
fractalsort(keys)
else:
col_arrays = [table[f"c{i}"] for i in range(n_cols)]
def run():
sorted_keys = fractalsort(keys)
order = np.argsort(sorted_keys, kind='mergesort')
for arr in col_arrays:
_ = arr[order]
return runand add |
Beta Was this translation helpful? Give feedback.
-
|
thank you. all engine works except chdb my env |
Beta Was this translation helpful? Give feedback.
-
|
windows WSL I wonder why the WSL performance of many engines are better than those of windows. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi ClickHouse community,
I'm the author of FractalSortCPU (https://doi.org/10.48550/arXiv.2605.10390), a bandwidth-efficient compressed radix sort that achieves up to 6x improvement over state-of-the-art CPU sorting, 3x over GPU, and 2.5x over FPGA implementations on datasets from 512MB to 1TB+.
Since ClickHouse's MergeTree engine relies heavily on sorted data — during inserts, background merges, and ORDER BY execution — I'm exploring whether FractalSortCPU could improve performance in sort-bound workloads.
Key properties:
Paper: https://doi.org/10.48550/arXiv.2605.10390
Code: https://github.com/mikdangana/fractalsort_cpu
Questions for the community:
integration?
Happy to run comparative benchmarks if there's interest.
Michael Dang'ana
Eonforge Labs
michael@eonforge.ca
Beta Was this translation helpful? Give feedback.
All reactions