Real HNSW search numbers across all three dedicated tiers on local NVMe — measured per tier, per dataset size, per recall target, and per concurrency level. Every QPS figure on this page states its recall@10 and its concurrency, because without those two numbers a throughput claim means nothing. All results are reproducible with the open-source pgvector-bench CLI.
One representative operating point per tier. Each card shows the recommended recall-0.93 setting where available, or the largest-dataset row for tiers where a bigger index was measured. The p50/p95/p99 triple and the recall@k are always included, because they are what the benchmark actually measures.
Solo · 2 vCPU / 4 GB · $15/mo · 250k × 1536: ~999 QPS at 4 clients, ef_search=80; p50 3.7 ms, p95 5.9 ms, p99 7.1 ms
Growth · 4 vCPU / 8 GB · $59/mo · 500k × 1536: ~1,797 QPS at 16 clients, ef_search=120; p50 8.7 ms, p95 —, p99 —
Scale · 8 vCPU / 16 GB · $99/mo · 1M × 1536: ~2,504 QPS at 16 clients, ef_search=80; p50 5.9 ms, p95 —, p99 —
All measured (tier, dataset, ef_search, recall@10, concurrency) combinations. Cells shown as "—" were not captured for that configuration. The Solo row at ef_search=200 / recall 0.99 is the known collapse point where the working set spills past the 4 GB cache and latency surges to 43 ms p50.
| Tier | Vectors | ef_search | recall@10 | clients | QPS | p50 | p95 | p99 |
|---|---|---|---|---|---|---|---|---|
| Solo | 250k × 1536 | 10 | 0.75 | 4 | 1,300 | 2.9 ms | 4.3 ms | 5.2 ms |
| Solo | 250k × 1536 | 10 | 0.75 | 16 | 1,980 | 8.0 ms | 11.2 ms | — |
| Solo | 250k × 1536 | 80 | 0.93 | 4 | 999 | 3.7 ms | 5.9 ms | 7.1 ms |
| Solo | 250k × 1536 | 80 | 0.93 | 16 | 1,568 | 10.0 ms | — | — |
| Solo | 250k × 1536 | 200 | 0.99 | 16 | 299 (cache spill, disk-bound collapse) | 43 ms | — | — |
| Growth | 250k × 1536 | 120 | 0.93 | 8 | 1,280 | 5.6 ms | — | — |
| Growth | 250k × 1536 | 120 | 0.93 | 16 | 1,716 | 8.5 ms | — | — |
| Growth | 500k × 1536 | 120 | 0.88 | 8 | 1,380 | 5.6 ms | — | — |
| Growth | 500k × 1536 | 120 | 0.88 | 16 | 1,797 | 8.7 ms | — | — |
| Scale | 250k × 1536 | 120 | 0.93 | 8 | 1,458 | 5.4 ms | — | — |
| Scale | 250k × 1536 | 120 | 0.93 | 16 | 2,468 | 6.3 ms | — | — |
| Scale | 1M × 1536 | 80 | 0.75 | 16 | 2,504 | 5.9 ms | — | — |
| Scale | 1M × 1536 | 200 | 0.85 | 16 | 1,953 | 8.0 ms | — | — |
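A quick way to sanity-check rows in the table above is Little's law: in a closed loop, throughput is roughly concurrency divided by mean latency. The sketch below assumes each client issues queries back to back and uses p50 as a stand-in for the mean; the page states neither, so treat the result as a rough upper bound rather than a prediction.

```python
def closed_loop_qps(clients: int, latency_ms: float) -> float:
    """Little's law for a closed loop: throughput = concurrency / latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return clients / (latency_ms / 1000.0)

# (label, clients, p50 in ms, measured QPS), copied from the table above
rows = [
    ("Solo 250k ef_search=80", 4, 3.7, 999),
    ("Solo 250k ef_search=80", 16, 10.0, 1568),
    ("Growth 500k ef_search=120", 16, 8.7, 1797),
    ("Scale 1M ef_search=80", 16, 5.9, 2504),
]

for label, clients, p50_ms, measured in rows:
    implied = closed_loop_qps(clients, p50_ms)
    print(f"{label}, {clients} clients: p50-implied {implied:.0f} QPS, measured {measured}")
```

For these four rows the measured QPS sits slightly below the p50-implied value, which is what you would expect when the mean latency is higher than the median.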
HNSW index builds are memory-bound. Each tier has a hard size limit beyond which the build takes hours or fails entirely. The throughput numbers above are irrelevant if the index will not build. Size the node by index build memory first, QPS second.
| Tier (RAM) | Builds fine | Builds, but slow / unusable | Will not build |
|---|---|---|---|
| Solo — 4 GB | 250k (~11 min) | 500k (~4.2 h, memory-starved) | 1M (OOM) |
| Growth — 8 GB | 250k (~90 s), 500k (~40 min) | — | 1M (OOM) |
| Scale — 16 GB | 250k (~60 s), 1M (~30 min) | — | 2M (OOM) |
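To get a feel for why the limits fall where they do, it helps to estimate the raw vector payload alone. pgvector documents 4 bytes per dimension plus 8 bytes per vector; the sketch below uses only that figure. It deliberately ignores HNSW graph links, page overhead, and everything else PostgreSQL needs in memory, so it is a lower bound and not a model of the build.

```python
BYTES_PER_DIM = 4   # pgvector stores each dimension as a 4-byte float
VECTOR_HEADER = 8   # per-vector header bytes in pgvector

def raw_vector_gib(n_vectors: int, dims: int = 1536) -> float:
    """Raw vector payload in GiB, excluding HNSW links and page overhead."""
    return n_vectors * (dims * BYTES_PER_DIM + VECTOR_HEADER) / 2**30

for n in (250_000, 500_000, 1_000_000, 2_000_000):
    print(f"{n:>9,} vectors: {raw_vector_gib(n):.2f} GiB raw")
```

At 1536 dimensions this comes to roughly 1.4 GiB for 250k vectors and about 5.7 GiB for 1M, before any graph structure is counted, which is consistent with 500k being memory-starved on a 4 GB node.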
Exactly what was measured and how — for anyone who wants to understand or reproduce it.
The benchmark client ran on a separate VM in the same region (Hetzner EU, Falkenstein), connecting over the public endpoint with TLS (sslmode=require). That is exactly the path a same-region application takes. On Solo the client connects through PgBouncer (port 6432, transaction pooling); on Growth and Scale through the load balancer (port 5432). Bare SELECT 1 round-trip overhead from the same region is about 0.4 ms on Solo and 0.9 ms on Growth/Scale. Cross-region latency would add roughly 90 ms per query and would dominate all these numbers.
Vectors are synthetic but clustered: a 1536-dimensional Gaussian mixture, normalized for cosine distance. A clustered mixture is used rather than uniform random noise so that recall is representative of real embedding structure; real embeddings may recall slightly higher. The index is HNSW with m=16 and ef_construction=64 (the pgvector defaults, which do not depend on dimensionality). Recall@10 is measured against exact brute-force KNN ground truth computed with index scans disabled.
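The recall@10 figure can be illustrated with a few lines of Python. This is a minimal sketch of the standard definition (the share of the exact top-k that the approximate search also returned), not the pgvector-bench implementation; the function names and the sample ids are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k that the approximate top-k also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must not be empty")
    hits = len(truth.intersection(approx_ids[:k]))
    return hits / len(truth)

def mean_recall(pairs, k=10):
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per query."""
    scores = [recall_at_k(a, e, k) for a, e in pairs]
    return sum(scores) / len(scores)

exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # brute-force ground truth
approx = [1, 2, 3, 4, 5, 6, 7, 8, 11, 12]   # index result, misses ids 9 and 10
print(recall_at_k(approx, exact))            # 0.8
```

A reported recall@10 is this value averaged over many query vectors, which is what `mean_recall` does.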
Approximate nearest-neighbor search trades recall for speed. The hnsw.ef_search parameter controls the tradeoff. A QPS number without a recall number and a concurrency count is an advertisement, not a measurement. Every result on this page is a (tier, dataset, ef_search, recall@10, concurrency, QPS, latency) tuple. A single headline number without those qualifiers tells you nothing about what the system actually delivers for your workload.
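As a rough sketch of what one measurement involves on the SQL side, the helper below builds the statements for an approximate query at a given ef_search and for the exact ground-truth query. The table name `items`, the columns `id` and `embedding`, and the helper itself are hypothetical; `hnsw.ef_search`, the `<=>` cosine-distance operator, and `enable_indexscan` are the real pgvector and PostgreSQL names.

```python
def measurement_sql(table: str, column: str, ef_search: int, k: int = 10):
    """Return (approximate, exact) statement lists for one query vector.

    The query vector is bound to the %s placeholder by the database driver.
    """
    knn = f"SELECT id FROM {table} ORDER BY {column} <=> %s LIMIT {k};"
    # Higher ef_search widens the HNSW candidate list: better recall, slower queries.
    approximate = [f"SET hnsw.ef_search = {int(ef_search)};", knn]
    # Disabling index scans makes the planner fall back to an exact scan,
    # which yields the brute-force ground truth for recall@k.
    exact = ["SET enable_indexscan = off;", knn]
    return approximate, exact

approx_stmts, exact_stmts = measurement_sql("items", "embedding", ef_search=80)
for stmt in approx_stmts + exact_stmts:
    print(stmt)
```

Running the approximate and exact statements in separate sessions keeps the `SET` of one from leaking into the other.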
All numbers were produced with the open-source pgvector-bench CLI, released under the MIT license and available at github.com/Rivestack/pgvector-bench. It ships as a single Go binary with an interactive wizard and produces a self-contained HTML report. Run it against any PostgreSQL instance, including a Rivestack free tier, to measure your own p50/p95/p99, QPS curve, and recall@k against brute-force ground truth on your actual data.
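If you collect per-query latencies yourself, the p50/p95/p99 triple can be computed as below. This sketch uses the nearest-rank definition of a percentile; pgvector-bench may use a different interpolation, so small differences from its report are possible. The function names and sample data are illustrative only.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def latency_summary(samples_ms):
    """Return the p50/p95/p99 triple for a list of latencies in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}

samples = [float(i) for i in range(1, 101)]  # 1.0 .. 100.0 ms, synthetic
print(latency_summary(samples))  # {'p50': 50.0, 'p95': 95.0, 'p99': 99.0}
```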
Related: NVMe vs cloud SSD benchmarks (full blog post) — the source of every number on this page, including the latency curves, the ef_search=200 collapse, and per-tier analysis. Run the benchmarks yourself with the pgvector-bench CLI. Try managed pgvector against a free tier to measure your own index.