Apache Iceberg version
None
Query engine
None
Please describe the bug 🐞
The iceberg-rest-fixture REST catalog server uses JdbcCatalog and specifically sqlite, jdbc:sqlite::memory: as the underlying database.
Under high concurrent load, the catalog server errors with SQLiteException such as,
Caused by: org.sqlite.SQLiteException: [SQLITE_ERROR] SQL error or missing database (no such table: iceberg_tables)
This issue arises because jdbc:sqlite::memory: creates a distinct, ephemeral in-memory database for each new JDBC connection. The JdbcClientPool defaults to client_pool size of 2.
When managing more than 2 connections, the JdbcClientPool can create new connections and thus new in-memory new database, without running the initialization logic. Subsequent db operations will fail due to missing tables in these newly created database.
Reproduce
Start a IRC server using iceberg-rest-fixture. I'm using pyiceberg's integration docker compose file by running
make test-integration-setup
in the pyiceberg repo.
Create concurrent requests to the IRC. In poetry run python, run this script
from pyiceberg.catalog import load_catalog
from concurrent.futures import ThreadPoolExecutor
def run_rest_catalog():
catalog = load_catalog(
"rest",
**{
"type": "rest",
"uri": "http://localhost:8181",
"s3.endpoint": "http://localhost:9000",
"s3.access-key-id": "admin",
"s3.secret-access-key": "password",
},
)
try:
ns = catalog.list_namespaces()
print(ns)
except Exception as e:
print(e)
def run_catalog_in_parallel(workers, runs):
# Run in parallel using threads
with ThreadPoolExecutor(max_workers=workers) as executor:
futures = [executor.submit(run_rest_catalog) for _ in range(runs)]
results = {}
for future in futures:
_ = future.result()
run_catalog_in_parallel(20, 200)
Solution
In iceberg-rest-fixture's Dockerfile, set
CATALOG_URI=jdbc:sqlite:file::memory:?cache=shared
- Use
file::memory: to explicitly indicate an in-memory database that supports URI parameters.
- Add the
?cache=shared parameter to allow SQLite to use a shared in-memory cache so that all connections will access the same underlying in-memory database.
EDIT: Fokko suggested a better solution by limiting sqlite to just 1 client connection
Additionally, we should replace all subproject using the original jdbc:sqlite::memory: connection string.
Willingness to contribute
Apache Iceberg version
None
Query engine
None
Please describe the bug 🐞
The
iceberg-rest-fixtureREST catalog server usesJdbcCatalogand specifically sqlite,jdbc:sqlite::memory:as the underlying database.Under high concurrent load, the catalog server errors with
SQLiteExceptionsuch as,This issue arises because
jdbc:sqlite::memory:creates a distinct, ephemeral in-memory database for each new JDBC connection. TheJdbcClientPooldefaults toclient_poolsize of 2.When managing more than 2 connections, the
JdbcClientPoolcan create new connections and thus new in-memory new database, without running the initialization logic. Subsequent db operations will fail due to missing tables in these newly created database.Reproduce
Start a IRC server using
iceberg-rest-fixture. I'm using pyiceberg's integration docker compose file by runningin the pyiceberg repo.
Create concurrent requests to the IRC. In
poetry run python, run this scriptSolution
In iceberg-rest-fixture's Dockerfile, set
CATALOG_URI=jdbc:sqlite:file::memory:?cache=sharedfile::memory:to explicitly indicate an in-memory database that supports URI parameters.?cache=sharedparameter to allow SQLite to use a shared in-memory cache so that all connections will access the same underlying in-memory database.EDIT: Fokko suggested a better solution by limiting sqlite to just 1 client connection
Additionally, we should replace all subproject using the original
jdbc:sqlite::memory:connection string.Willingness to contribute