Honest numbers · HNSW · dedicated NVMe · reproducible with pgvector-bench

Managed pgvector benchmarks.
QPS with recall. Not marketing.

Real HNSW search numbers across all three dedicated tiers on local NVMe — measured per tier, per dataset size, per recall target, and per concurrency level. Every QPS figure on this page states its recall@10 and its concurrency, because without those two numbers a throughput claim means nothing. All results are reproducible with the open-source pgvector-bench CLI.

Key numbers

One representative operating point per tier. Each card shows the recommended recall-0.93 setting where available, or the largest-dataset row for tiers where a bigger index was measured. Every card states its recall@10 and the latency percentiles (p50, p95, p99) that were captured, because they are what the benchmark actually measures.

Solo · 2 vCPU / 4 GB
$15/mo · 250k × 1536 · recall 0.93
~999 QPS · 4 clients · ef_search=80
p50 3.7 ms · p95 5.9 ms · p99 7.1 ms

Growth · 4 vCPU / 8 GB
$59/mo · 500k × 1536 · recall 0.88
~1,797 QPS · 16 clients · ef_search=120
p50 8.7 ms · p95 and p99 not captured

Scale · 8 vCPU / 16 GB
$99/mo · 1M × 1536 · recall 0.75
~2,504 QPS · 16 clients · ef_search=80
p50 5.9 ms · p95 and p99 not captured

Full results table

All measured (tier, dataset, ef_search, recall@10, concurrency) combinations. Cells shown as "—" were not captured for that configuration. The Solo row at ef_search=200 / recall 0.99 is the known collapse point where the working set spills past the 4 GB cache and latency surges to 43 ms p50.

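| Tier | Vectors | ef_search | recall@10 | Clients | QPS | p50 | p95 | p99 |
|---|---|---|---|---|---|---|---|---|
| Solo | 250k × 1536 | 10 | 0.75 | 4 | 1,300 | 2.9 ms | 4.3 ms | 5.2 ms |
| Solo | 250k × 1536 | 10 | 0.75 | 16 | 1,980 | 8.0 ms | 11.2 ms | — |
| Solo | 250k × 1536 | 80 | 0.93 | 4 | 999 | 3.7 ms | 5.9 ms | 7.1 ms |
| Solo | 250k × 1536 | 80 | 0.93 | 16 | 1,568 | 10.0 ms | — | — |
| Solo | 250k × 1536 | 200 | 0.99 | 16 | 299 (cache spill, disk-bound collapse) | 43 ms | — | — |
| Growth | 250k × 1536 | 120 | 0.93 | 8 | 1,280 | 5.6 ms | — | — |
| Growth | 250k × 1536 | 120 | 0.93 | 16 | 1,716 | 8.5 ms | — | — |
| Growth | 500k × 1536 | 120 | 0.88 | 8 | 1,380 | 5.6 ms | — | — |
| Growth | 500k × 1536 | 120 | 0.88 | 16 | 1,797 | 8.7 ms | — | — |
| Scale | 250k × 1536 | 120 | 0.93 | 8 | 1,458 | 5.4 ms | — | — |
| Scale | 250k × 1536 | 120 | 0.93 | 16 | 2,468 | 6.3 ms | — | — |
| Scale | 1M × 1536 | 80 | 0.75 | 16 | 2,504 | 5.9 ms | — | — |
| Scale | 1M × 1536 | 200 | 0.85 | 16 | 1,953 | 8.0 ms | — | — |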
HNSW m=16 · ef_construction=64 · cosine distance · same-region client via TLS · Hetzner EU Falkenstein · PostgreSQL 17.10 · pgvector 0.8.2 · synthetic clustered 1536-dim vectors (Gaussian mixture, cosine-normalized) · recall@10 vs brute-force KNN ground truth

Build ceiling

HNSW index builds are memory-bound. Each tier has a hard size limit beyond which the build takes hours or fails entirely. The throughput numbers above are irrelevant if the index will not build. Size the node by index build memory first, QPS second.

| Tier (RAM) | Builds fine | Builds, but slow / unusable | Will not build |
|---|---|---|---|
| Solo (4 GB) | 250k (~11 min) | 500k (~4.2 h, memory-starved) | 1M (OOM) |
| Growth (8 GB) | 250k (~90 s), 500k (~40 min) | — | 1M (OOM) |
| Scale (16 GB) | 250k (~60 s), 1M (~30 min) | — | 2M (OOM) |
HNSW m=16 · ef_construction=64 · 1536-dim vectors · a 1M × 1536 index requires approximately 6 GB of build memory
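
As a rough cross-check on the 6 GB figure, the raw vector payload alone can be estimated from pgvector's documented storage cost of 4 × dimensions + 8 bytes per vector. The sketch below ignores HNSW graph links and PostgreSQL overhead, so it is a lower bound rather than a sizing formula. The function name is illustrative.

```python
def raw_vector_bytes(num_vectors: int, dims: int) -> int:
    """Lower bound: vector data only (4 bytes per float plus an 8-byte header)."""
    return num_vectors * (4 * dims + 8)


GB = 10**9  # decimal gigabytes

for n in (250_000, 500_000, 1_000_000, 2_000_000):
    print(f"{n:>9,} x 1536: {raw_vector_bytes(n, 1536) / GB:.2f} GB")
```

For 1M × 1536 this gives about 6.15 GB of vector data, in line with the roughly 6 GB quoted above.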

Methodology

Exactly what was measured and how — for anyone who wants to understand or reproduce it.

Client and network path

The benchmark client ran on a separate VM in the same region (Hetzner EU, Falkenstein), connecting over the public endpoint with TLS (sslmode=require). That is exactly the path a same-region application takes. On Solo the client connects through PgBouncer (port 6432, transaction pooling); on Growth and Scale through the load balancer (port 5432). Bare SELECT 1 round-trip overhead from the same region is about 0.4 ms on Solo and 0.9 ms on Growth/Scale. Cross-region latency would add roughly 90 ms per query and would dominate all these numbers.
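
To see why a cross-region hop would dominate, a closed-loop estimate helps: when each client waits for a reply before sending its next query, throughput is roughly the number of clients divided by per-query latency. The sketch below uses the Scale 1M p50 (5.9 ms at 16 clients) as a stand-in for mean latency. That substitution is an assumption, so treat the output as an order-of-magnitude illustration, not a measurement.

```python
def closed_loop_qps(clients: int, latency_ms: float) -> float:
    """Approximate QPS when every client issues one query at a time."""
    return clients / (latency_ms / 1000.0)


same_region = closed_loop_qps(16, 5.9)          # p50 from the Scale 1M row
cross_region = closed_loop_qps(16, 5.9 + 90.0)  # plus ~90 ms cross-region cost

print(f"same region:  ~{same_region:,.0f} QPS")
print(f"cross region: ~{cross_region:,.0f} QPS")
```

The same-region estimate lands near the measured 2,504 QPS, while the cross-region case drops below 200 QPS at the same concurrency.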

Dataset and index

Vectors are synthetic but clustered: a 1536-dimensional Gaussian mixture, normalized for cosine distance. A clustered mixture is used rather than uniform random noise so that recall is representative of real embedding structure; real embeddings may show slightly higher recall. The index is HNSW with m=16 and ef_construction=64 (the pgvector defaults). Recall@10 is measured against exact brute-force KNN ground truth computed with index scans disabled.
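
Recall@10 as described here reduces to a set overlap between the HNSW result and the exact result for the same query. A minimal sketch follows. The id lists are made-up illustration data, and the function name is not taken from pgvector-bench.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    exact_top = set(exact_ids[:k])
    hits = sum(1 for i in approx_ids[:k] if i in exact_top)
    return hits / len(exact_top)


# Hypothetical row ids for one query: HNSW result vs brute-force ground truth.
hnsw_ids = [4, 8, 15, 16, 23, 42, 7, 99, 3, 61]
exact_ids = [4, 8, 15, 16, 23, 42, 7, 99, 12, 77]

print(recall_at_k(hnsw_ids, exact_ids))
```

A reported recall figure would normally be the mean of this value over many queries.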

Why recall and concurrency are non-negotiable

Approximate nearest-neighbor search trades recall for speed. The hnsw.ef_search parameter controls the tradeoff. A QPS number without a recall number and a concurrency count is an advertisement, not a measurement. Every result on this page is a (tier, dataset, ef_search, recall@10, concurrency, QPS, latency) tuple. A single headline number without those qualifiers tells you nothing about what the system actually delivers for your workload.
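
One way to read such tuples is to fix a recall target first and only then compare throughput. The sketch below does that for the Solo rows from the results table. The `BenchResult` class and `best_at_recall` helper are illustrative names, not part of pgvector-bench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    tier: str
    vectors: str
    ef_search: int
    recall_at_10: float
    clients: int
    qps: int


# Solo rows copied from the results table on this page.
SOLO = [
    BenchResult("Solo", "250k x 1536", 10, 0.75, 4, 1300),
    BenchResult("Solo", "250k x 1536", 10, 0.75, 16, 1980),
    BenchResult("Solo", "250k x 1536", 80, 0.93, 4, 999),
    BenchResult("Solo", "250k x 1536", 80, 0.93, 16, 1568),
    BenchResult("Solo", "250k x 1536", 200, 0.99, 16, 299),
]


def best_at_recall(results, min_recall):
    """Highest-QPS row that still meets the recall target, or None if none does."""
    ok = [r for r in results if r.recall_at_10 >= min_recall]
    return max(ok, key=lambda r: r.qps, default=None)


for target in (0.75, 0.90, 0.99):
    print(target, best_at_recall(SOLO, target))
```

With a 0.90 recall target the pick is the ef_search=80, 16-client row at 1,568 QPS, not the 1,980 QPS row, which only reaches recall 0.75.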

Reproducibility

All numbers were produced with the open-source pgvector-bench CLI, released under the MIT license and available at github.com/Rivestack/pgvector-bench. It ships as a single Go binary with an interactive wizard and produces a self-contained HTML report. Run it against any PostgreSQL instance, including a Rivestack free tier, to measure your own p50/p95/p99, QPS curve, and recall@k against brute-force ground truth on your actual data.

Related: NVMe vs cloud SSD benchmarks (full blog post) — the source of every number on this page, including the latency curves, the ef_search=200 collapse, and per-tier analysis. Run the benchmarks yourself with the pgvector-bench CLI. Try managed pgvector against a free tier to measure your own index.
