Vector database economics in 2026
Three years into the RAG era, vector database pricing has converged into three archetypes with meaningfully different economics: serverless managed (Pinecone Serverless, Turbopuffer), provisioned managed (Weaviate Cloud, Qdrant Cloud), and roll-your-own on Postgres + pgvector or Elastic. Picking the wrong one at your scale is an order-of-magnitude cost mistake.
Why most teams default to the wrong tier
The most common mistake we see is picking a managed vector DB at the prototype stage, setting up the account and paying $70/month for an index with 40k vectors that could live on pgvector for free. The second most common is the opposite: running pgvector past the point of sanity with 50M vectors, fighting reindex-time bloat, and burning engineering hours that would have paid for Pinecone twice over. Both failure modes come from picking once and never revisiting. Vector DB choice deserves a quarterly review for the first year of production life.
| Option | Sweet spot | Indicative cost (10M vectors, 1M queries/mo) | Operational burden |
|---|---|---|---|
| Pinecone Serverless | 0–50M vectors, spiky traffic | ~$80/mo | Lowest |
| Turbopuffer | Very cheap cold storage + OK latency | ~$30/mo | Low |
| Weaviate Cloud (Standard) | 50M–500M vectors, steady traffic | ~$250/mo | Low |
| Qdrant Cloud | Same band, good filtering | ~$200/mo | Low-medium |
| pgvector on RDS | Already on Postgres, <10M vectors | ~$120/mo (db.r6g.large) | Medium — you are the DBA |
| Self-hosted (Qdrant/Milvus on K8s) | >100M vectors, steady traffic, team with ops | ~$600/mo infra, many eng hours | High |
Why the line item grows faster than you think
Vector DB cost scales on three axes simultaneously: vector count (corpus growth), query volume (user traffic), and metadata complexity (feature additions like filtering by user, date, tenant, or label). Each axis alone is manageable; stacked, they compound. A deployment that was $200/month at launch routinely crosses $1,500/month in the first 12 months without a change in the architecture — just from content and traffic growth. Budgeting vector DB cost linearly against corpus size will underestimate by 2–3× inside a year.
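A back-of-envelope model makes the compounding visible. All unit prices below are made-up placeholders, not any vendor's real rates; the point is the shape of the curve, not the numbers:

```python
# Toy cost model: vector DB spend compounds across three axes.
# Unit prices are illustrative placeholders, not any vendor's real rates.

def monthly_cost(vectors_m: float, queries_m: float, filters_per_query: int) -> float:
    """Rough monthly cost in USD for a hypothetical managed vector DB."""
    storage = vectors_m * 10.0                          # $10 per 1M vectors stored
    reads = queries_m * 8.0                             # $8 per 1M queries
    filter_multiplier = 1.0 + 0.5 * filters_per_query   # filters inflate read cost
    return storage + reads * filter_multiplier

# Launch: 5M vectors, 1M queries/mo, 1 metadata filter per query.
launch = monthly_cost(5, 1, 1)
# Month 12: corpus 3x, traffic 4x, one more filter from a feature launch.
year_one = monthly_cost(15, 4, 2)

print(f"launch: ${launch:.0f}/mo, month 12: ${year_one:.0f}/mo "
      f"({year_one / launch:.1f}x)")
```

Note that linear budgeting against corpus size (3x growth, so 3x the launch bill) already undershoots the modeled month-12 number, because the read and filter axes grew at the same time.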
The four cost drivers nobody budgets for
- Dimensionality. Memory and storage scale linearly with vector dimensions. Using text-embedding-3-large (3072 dim) instead of text-embedding-3-small (1536 dim) doubles your vector DB bill for ~3pp recall improvement. Almost never worth it at scale.
- Metadata index complexity. Filtered queries (tenant_id = X AND created_at > Y) can be 5–20× more expensive than pure nearest-neighbor. If every query has 3+ metadata filters, budget 3× the headline cost.
- Reindex cycles. Pinecone Serverless charges for write units. Re-embedding a 10M-doc corpus is routinely >$100/run. Incremental updates only, or you will pay.
- Multi-tenancy. One index per customer (SaaS pattern) vs. one shared index with metadata filtering has ~10× different cost curves. Plan the architecture up front.
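The dimensionality point is easy to sanity-check: raw float32 storage is count × dims × 4 bytes, before the HNSW graph overhead, replicas, and metadata that real bills add on top:

```python
def raw_vector_bytes(count: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw float32 storage for the vectors alone (index overhead is extra)."""
    return count * dims * bytes_per_float

GIB = 1024 ** 3
small = raw_vector_bytes(10_000_000, 1536) / GIB   # text-embedding-3-small
large = raw_vector_bytes(10_000_000, 3072) / GIB   # text-embedding-3-large

print(f"10M x 1536d: {small:.1f} GiB, 10M x 3072d: {large:.1f} GiB")
```

Doubling the dimensions doubles this number exactly, which is where the "twice the bill" figure comes from on memory-priced tiers.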
Benchmarking against the actual workload
Public benchmarks do not reflect real production workloads well: ANN-Benchmarks measures raw index performance on synthetic distributions, while BEIR and MTEB measure embedding quality, not database behavior. The gap between synthetic benchmarks and live traffic is filter complexity, metadata cardinality, and tail-latency behavior under mixed read-write load. Before committing to a specific provider, stress-test each candidate with a representative query mix from your actual application — ideally replaying production queries from a prior system. An afternoon of load testing saves a migration six months later.
Hidden tax: egress
If your vector DB is in AWS us-east and your app is in GCP europe-west, every query pays cross-cloud egress. A 1KB payload × 200k queries/day is < $10/month, but many setups stream much more (full chunks back to the LLM), and it can reach $500/month invisibly. Check the networking diagram before you pick a region.
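The arithmetic is worth doing before picking a region. A sketch assuming a placeholder cross-cloud rate of $0.09/GB (check your provider's actual egress pricing):

```python
def monthly_egress_usd(bytes_per_query: int, queries_per_day: int,
                       usd_per_gb: float = 0.09) -> float:
    # $0.09/GB is a placeholder for cross-cloud egress; rates vary by provider.
    gb_per_month = bytes_per_query * queries_per_day * 30 / 1e9
    return gb_per_month * usd_per_gb

ids_only = monthly_egress_usd(1_000, 200_000)        # 1KB of IDs and scores
full_chunks = monthly_egress_usd(100_000, 200_000)   # 100KB of retrieved chunks

print(f"IDs only: ${ids_only:.2f}/mo, full chunks: ${full_chunks:.2f}/mo")
```

The cost is linear in payload size, so streaming full chunks instead of IDs moves the line item by exactly the payload ratio.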
Read-write ratio matters for tier selection
Workloads that are mostly read (support chatbot retrieval, internal search) pay mostly for query throughput and latency. Workloads that are write-heavy (active corpus with daily churn) pay for index maintenance and reindex cycles. Most managed vector DBs price asymmetrically on reads vs writes, and the ratio can swing provider choice significantly. Pinecone Serverless is priced for read-dominant workloads; Qdrant and self-hosted options are more forgiving on high-write scenarios.
The often-correct answer: start on pgvector
For anything under 5M vectors and < 50 queries/sec, Postgres + pgvector + HNSW index is good enough, costs whatever your existing Postgres costs (often near zero on top), and saves you onboarding a new vendor. Migrate to a dedicated vector DB when you hit one of: vector count > 10M, P95 latency requirement < 50ms, or metadata filtering performance degrades.
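The migration triggers above can be encoded as a checklist. Thresholds are the heuristics from this section, not hard limits:

```python
def pgvector_migration_triggers(vector_count: int, p95_req_ms: float,
                                filters_degraded: bool) -> list[str]:
    """Return the reasons to leave pgvector; empty list means stay put."""
    triggers = []
    if vector_count > 10_000_000:
        triggers.append("vector count > 10M")
    if p95_req_ms < 50:
        triggers.append("P95 requirement < 50ms")
    if filters_degraded:
        triggers.append("metadata filtering degraded")
    return triggers

print(pgvector_migration_triggers(800_000, 200, False))    # early stage: stay
print(pgvector_migration_triggers(40_000_000, 40, False))  # time to migrate
```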
Migration planning is its own budget line
Every vector DB migration we have watched has taken longer and cost more than planned. The typical path: six months on pgvector, migration to Pinecone when query volume spikes, two to four weeks of dual-write and comparison, and a re-embed step in the middle because the new store wants a specific format. Budget 3–6 engineer-weeks for any serious migration plus the recurring cost of the new infrastructure during overlap. This is why picking the right tier early, even at a slight overspend, often wins versus migrating later.
Three real workload sizings
Abstracted pricing obscures how different these options are at actual volume. Three real deployments with specific vector counts, QPS, and per-provider cost:
- Early-stage B2B SaaS, 800k vectors, 200k queries/month: pgvector on an existing db.r6g.large at $0 marginal (Postgres was already running). Pinecone Serverless would be ~$40/mo. Weaviate Standard would be ~$180/mo. Pgvector wins by default until you hit operational pain.
- Mid-stage consumer AI, 40M vectors, 8M queries/month: Pinecone Serverless at ~$210/mo (reads dominate). Turbopuffer at ~$120/mo with slightly higher P95 latency. Qdrant Cloud at ~$290/mo with better filtering. Pgvector starts to feel painful here on reindex cycles.
- Enterprise RAG, 300M vectors, 30M queries/month: self-hosted Qdrant cluster on K8s = ~$1,800/mo infra + significant eng time, or Weaviate Cloud Enterprise at ~$4,500/mo. At this scale, the "build vs buy" answer depends entirely on whether you already have a platform team.
Latency vs recall: HNSW tuning decisions
Nearly every production vector DB defaults to HNSW (or a variant) for approximate nearest neighbor search. HNSW has two knobs that matter: M (graph connectivity) and ef_search (query-time search depth). Raising ef_search from 32 → 128 typically increases recall@10 by 2–4pp at the cost of 2–3× query latency. For an agentic backend where you can tolerate 100ms per call, crank it up. For a synchronous chat UX where every ms matters, tune down.
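Recall@k measurement itself is simple to demonstrate. The toy below uses a random candidate sample as a crude stand-in for search depth (real HNSW walks graph neighborhoods and gets far better recall per candidate examined), so only the methodology transfers, not the numbers:

```python
import random

random.seed(0)
DIM, N, K = 32, 2_000, 10
corpus = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
query = [random.gauss(0, 1) for _ in range(DIM)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(candidate_ids, k=K):
    # Exact nearest-neighbor over whatever candidate set we were given.
    return set(sorted(candidate_ids, key=lambda i: sq_dist(query, corpus[i]))[:k])

truth = top_k(range(N))  # ground truth: exact top-10 from a full scan

# "ef" here is just the number of candidates examined: a crude stand-in
# for HNSW's query-time search depth.
recalls = {}
for ef in (64, 256, N):
    candidates = random.sample(range(N), ef)
    recalls[ef] = len(top_k(candidates) & truth) / K
    print(f"ef={ef}: recall@10 = {recalls[ef]:.2f}")
```

Scanning every candidate always recovers recall 1.0; the tuning question is how little work still hits your recall target.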
Filtering is the hidden cost
Pure vector search is fast. Add WHERE tenant_id = X AND created_at > Y and the performance characteristics change dramatically. Pinecone and Weaviate use pre-filtering and post-filtering depending on selectivity; bad filter configs can make queries 20× slower. Specific patterns to avoid: high-cardinality filters without index, range filters on float fields, OR-filters across 3+ conditions. Test your filter patterns at real scale before committing.
Multi-tenant architecture decisions
The three common patterns for a multi-tenant SaaS:
- Shared index, metadata filter per tenant. Cheapest to run, hardest to isolate, risk of cross-tenant data leaks via bugs. Good at small scale.
- Namespace per tenant in the same index. Pinecone and some others support this natively. Middle ground on cost and isolation.
- Dedicated index per tenant. Strongest isolation, 10× the cost at scale, operational headache of managing hundreds of indexes. Required for some compliance contexts (healthcare, defense).
Start shared, migrate to namespace when tenant count exceeds 100 or when any enterprise customer asks. Dedicated only when compelled.
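A sketch of why the cost curves diverge, with placeholder prices; a flat per-index minimum is what makes dedicated indexes expensive at high tenant counts:

```python
def shared_index_cost(tenants: int, vectors_per_tenant_m: float) -> float:
    # One index: a flat base plus storage for the combined corpus.
    return 50 + tenants * vectors_per_tenant_m * 10

def dedicated_index_cost(tenants: int, vectors_per_tenant_m: float,
                         min_per_index: float = 30) -> float:
    # Per-index minimums dominate when each tenant's corpus is small.
    return tenants * max(min_per_index, vectors_per_tenant_m * 10)

for tenants in (10, 100, 1000):
    s = shared_index_cost(tenants, 0.1)      # 100k vectors per tenant
    d = dedicated_index_cost(tenants, 0.1)
    print(f"{tenants} tenants: shared ${s:.0f}/mo vs dedicated ${d:.0f}/mo "
          f"({d / s:.1f}x)")
```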
Production patterns worth copying
- Dual-write during migration. When moving between vector DBs, write to both for 2–4 weeks before cutover. Compare recall between stores on a gold set daily.
- Shadow query on a fraction of traffic. Route 5% of live queries to the candidate vector DB and log results without serving. Compare quality before flipping traffic.
- Rebuild index offline. Construction-time HNSW parameters (like M) can't change after index creation; if you need to retune them, build a fresh index in parallel and swap.
- Cache query embeddings. Identical user queries should hit a Redis cache, not re-embed. Typical cache hit rate for consumer products: 20–40%.
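The embedding-cache pattern is a few lines. An in-memory sketch standing in for Redis, with a hypothetical embed_fn placeholder for the real embedding API call:

```python
import hashlib

class EmbeddingCache:
    """In-memory stand-in for the Redis cache described above."""

    def __init__(self):
        self._store, self.hits, self.misses = {}, 0, 0

    def _key(self, query: str) -> str:
        # Normalize so trivially different strings share a cache entry.
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get_or_embed(self, query, embed_fn):
        k = self._key(query)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = embed_fn(query)  # only pay the API on a miss
        return self._store[k]

cache = EmbeddingCache()
fake_embed = lambda q: [0.0] * 8  # placeholder for a real embedding call
for q in ["reset password", "Reset Password ", "reset password", "billing"]:
    cache.get_or_embed(q, fake_embed)

print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.0%}")
```

Normalization matters more than the cache backend: casing and whitespace variants of the same question should land on the same key.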
Frequently asked questions
Is Turbopuffer really viable? Yes, for read-heavy, latency-tolerant workloads. They store on object storage with a query-time cache, so cold-start latency is higher (200–500ms) but cost-per-vector is 3–5× lower than Pinecone. Good for archives.
What about Qdrant vs Weaviate? Qdrant has a cleaner filtering story and better Rust performance. Weaviate has better multimodal support and a more mature cloud offering. For a simple RAG workload, either works; for complex filtering, Qdrant.
Is pgvector really production-grade? Yes, up to ~10M vectors and 50 QPS. Beyond that, index build time and memory pressure on Postgres get painful. The ceiling is real but higher than many assume.
Do I need a dedicated vector DB for a RAG chatbot? Almost never, unless you're over 10M vectors. Most chatbot RAG deployments work fine on pgvector.
How do I benchmark vector DB quality? Build a gold set of 200 query/relevant-doc pairs. Run each candidate DB with the same embeddings. Measure recall@10 and P50/P95 latency.
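The gold-set measurement can be sketched in a few lines of Python; the query and document IDs below are illustrative:

```python
import statistics

def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of relevant docs that appear in the top-k results."""
    return len(set(retrieved_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

# Gold set: query -> relevant doc IDs (a real one should have ~200 pairs).
gold = {"q1": {"d3"}, "q2": {"d7", "d9"}}
# What the candidate DB returned per query, in rank order.
results = {"q1": ["d3", "d1"], "q2": ["d9", "d2", "d5"]}

per_query = [recall_at_k(results[q], gold[q]) for q in gold]
print(f"mean recall@10: {statistics.mean(per_query):.2f}")
```

Run the identical gold set and embeddings against every candidate so the only variable is the database.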
What about hybrid search (vector + BM25)? Underrated. Weaviate, Qdrant, Elasticsearch, and OpenSearch all support it. Typical recall@10 improvement of 5–12pp over pure vector on messy real-world text. Worth the complexity.
Does metadata schema affect cost? On Pinecone, yes — stored metadata bytes count toward storage cost. Keep metadata minimal and point to a separate document store for the full payload.
What is the cheapest path at 500M+ vectors? Self-hosted Milvus or Qdrant on commodity hardware. Managed services hit ~$6–10k/mo at this scale where the same workload on 3–4 beefy VMs is ~$2k/mo with full-time platform engineering.
Related guides
- RAG pipeline cost — full per-query cost including LLM + embeddings.
- Embedding cost — the other half of your RAG cost stack.
- GPU inference cost — relevant if you self-host vector DB + LLM together.
- AI SaaS pricing — work backward from infra cost to a margin-safe price.