Question 1

What QPS does managed pgvector deliver?

Accepted Answer

On a Solo node (2 vCPU / 4 GB, $15/mo), pgvector HNSW search delivers approximately 999 QPS at recall@10 0.93 with 4 concurrent clients (p50 3.7 ms, p95 5.9 ms, p99 7.1 ms), scaling to approximately 1,568 QPS at 16 clients (p50 10 ms). On a Scale node (8 vCPU / 16 GB, $99/mo) at the same recall, throughput reaches approximately 2,468 QPS at 16 clients (p50 6.3 ms). Every figure is reproducible with the open-source pgvector-bench CLI.
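A quick way to sanity-check these figures is Little's law: with a fixed number of clients that each wait for a reply before sending the next query, throughput is roughly clients divided by latency. The sketch below assumes the benchmark clients behave that way and uses p50 as a stand-in for mean latency, so it gives only a rough estimate; `qps_estimate` is a name introduced here for illustration.

```python
def qps_estimate(clients: int, latency_ms: float) -> float:
    """Little's law estimate of QPS for synchronous clients at the given latency."""
    return clients / (latency_ms / 1000.0)


# (tier, clients, p50 in ms, measured QPS) taken from the answer above.
# Assumption: p50 approximates mean latency, which it usually understates.
MEASURED = [
    ("Solo", 4, 3.7, 999),
    ("Solo", 16, 10.0, 1568),
    ("Scale", 16, 6.3, 2468),
]

for tier, clients, p50_ms, measured in MEASURED:
    estimate = qps_estimate(clients, p50_ms)
    print(f"{tier:5s} {clients:2d} clients: estimate ~{estimate:,.0f} QPS, measured {measured:,} QPS")
```

The measured values sit slightly below each estimate, which is what you would expect when mean latency is somewhat higher than p50.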

Question 2

How does recall@k affect pgvector throughput?

Accepted Answer

Recall and throughput trade off directly, and the hnsw.ef_search setting controls the balance. On a Solo node with 250k × 1536-dimension vectors, recall@10 0.75 (ef_search=10) delivers about 1,300 QPS at 4 clients (p50 2.9 ms), while recall@10 0.93 (ef_search=80) delivers about 999 QPS (p50 3.7 ms). At recall@10 0.99 (ef_search=200), throughput collapses to 299 QPS at 16 clients and p50 climbs to 43 ms because the working set no longer fits in memory on the 4 GB node. Set ef_search to the lowest value that meets your recall target, not the highest possible value.
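One way to apply that advice is to sweep ef_search once, record recall at each value, and then pick the smallest value that meets the target. The sketch below uses the three Solo measurements quoted above; the helper names are hypothetical, and only `hnsw.ef_search` itself is a pgvector setting.

```python
# Solo-node measurements quoted above: ef_search -> recall@10
# (250k x 1536 vectors). Other datasets need their own sweep.
SOLO_RECALL_BY_EF_SEARCH = {10: 0.75, 80: 0.93, 200: 0.99}


def pick_ef_search(target_recall: float, measured: dict[int, float]) -> int:
    """Return the smallest measured ef_search whose recall meets the target."""
    for ef_search in sorted(measured):
        if measured[ef_search] >= target_recall:
            return ef_search
    raise ValueError(f"no measured ef_search reaches recall {target_recall}")


def set_ef_search_sql(ef_search: int) -> str:
    """Build the session-level SET statement for pgvector's HNSW search width."""
    return f"SET hnsw.ef_search = {int(ef_search)};"


# A 0.90 recall target lands on the ef_search=80 measurement.
print(set_ef_search_sql(pick_ef_search(0.90, SOLO_RECALL_BY_EF_SEARCH)))
```

Run the resulting statement in the same session as the similarity query, since SET applies per session.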

Question 3

How many vectors can each pricing tier serve hot?

Accepted Answer

HNSW index builds are memory-bound, which sets a hard ceiling per tier. A Solo node (4 GB) builds a 250k × 1536 index in about 11 minutes and serves it hot. A Growth node (8 GB) extends that to 500k vectors (about 40 minutes to build). Only Scale (16 GB) can build and serve a 1M × 1536 index: it builds in about 30 minutes and holds the full index in its 12 GB cache. On Growth and Solo, a 1M × 1536 build fails with an out-of-memory error. Size your tier by index build memory first and QPS target second.
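To see why memory is the limit, it helps to estimate the raw vector data alone. pgvector stores each `vector` value as 4 bytes per dimension plus an 8-byte header. The sketch below computes only that floor; it deliberately ignores the HNSW graph, build working memory, and everything else PostgreSQL needs, so the measured ceilings above remain the real guide. The function name and the use of GiB for tier RAM are assumptions made for illustration.

```python
GIB = 1024 ** 3

# RAM per tier as stated above, treated here as GiB (an assumption).
TIER_RAM_GIB = {"Solo": 4, "Growth": 8, "Scale": 16}


def raw_vector_bytes(rows: int, dims: int) -> int:
    """Bytes for `rows` pgvector values: 4 bytes per dimension plus an 8-byte header each."""
    return rows * (4 * dims + 8)


for rows in (250_000, 500_000, 1_000_000):
    size_gib = raw_vector_bytes(rows, 1536) / GIB
    shares = ", ".join(
        f"{tier} {size_gib / ram:.0%}" for tier, ram in TIER_RAM_GIB.items()
    )
    print(f"{rows:>9,} x 1536: {size_gib:.2f} GiB raw ({shares} of RAM)")
```

Raw data for 1M vectors already approaches 6 GiB before any index structure exists, which is consistent with the 1M build running out of memory on the 8 GB tier.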

Question 4

Can I reproduce these benchmarks myself?

Accepted Answer

Yes. All results were produced with the open-source pgvector-bench CLI, available at github.com/Rivestack/pgvector-bench under the MIT license. Run it against any PostgreSQL instance to measure p50/p95/p99 latency, QPS at each concurrency level, and recall@k against brute-force ground truth. The binary is a single Go executable with no external dependencies. Every number on this page was measured from a same-region client over TLS.
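For readers who want to check the metrics independently of the CLI, the two core calculations are short. The sketch below is not taken from pgvector-bench: it shows one common definition of recall@k against an exact (brute-force) result list and a nearest-rank percentile over latency samples. The sample data is made up for illustration.

```python
import math


def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbours that the approximate search returned."""
    exact_top = set(exact_ids[:k])
    return len(exact_top.intersection(approx_ids[:k])) / k


def percentile(samples_ms, pct):
    """Nearest-rank percentile of latency samples (pct between 0 and 100)."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical data: the approximate search missed one true neighbour (id 10).
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
latencies_ms = [3.1, 3.4, 3.7, 3.9, 4.2, 4.8, 5.9, 7.1]

print("recall@10:", recall_at_k(approx, exact))
print("p50:", percentile(latencies_ms, 50), "ms")
print("p95:", percentile(latencies_ms, 95), "ms")
```

In practice recall is averaged over many query vectors, and percentiles are taken over thousands of samples per concurrency level rather than a handful.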

Tier	Vectors	ef_search	recall@10	clients	QPS	p50	p95	p99
Solo	250k × 1536	10	0.75	4	1,300	2.9 ms	4.3 ms	5.2 ms
Solo	250k × 1536	10	0.75	16	1,980	8.0 ms	11.2 ms	—
Solo	250k × 1536	80	0.93	4	999	3.7 ms	5.9 ms	7.1 ms
Solo	250k × 1536	80	0.93	16	1,568	10.0 ms	—	—
Solo	250k × 1536	200	0.99	16	299 (cache spill, disk-bound collapse)	43 ms	—	—
Growth	250k × 1536	120	0.93	8	1,280	5.6 ms	—	—
Growth	250k × 1536	120	0.93	16	1,716	8.5 ms	—	—
Growth	500k × 1536	120	0.88	8	1,380	5.6 ms	—	—
Growth	500k × 1536	120	0.88	16	1,797	8.7 ms	—	—
Scale	250k × 1536	120	0.93	8	1,458	5.4 ms	—	—
Scale	250k × 1536	120	0.93	16	2,468	6.3 ms	—	—
Scale	1M × 1536	80	0.75	16	2,504	5.9 ms	—	—
Scale	1M × 1536	200	0.85	16	1,953	8.0 ms	—	—

Tier (RAM)	Builds fine	Builds, but slow / unusable	Will not build
Solo — 4 GB	250k (~11 min)	500k (~4.2 h, memory-starved)	1M (OOM)
Growth — 8 GB	250k (~90 s), 500k (~40 min)	—	1M (OOM)
Scale — 16 GB	250k (~60 s), 1M (~30 min)	—	2M (OOM)

Managed pgvector benchmarks.
QPS with recall. Not marketing.

