Vector database economics in 2026
Three years into the RAG era, vector database pricing has converged into three archetypes with meaningfully different economics: serverless managed (Pinecone Serverless, Turbopuffer), provisioned managed (Weaviate Cloud, Qdrant Cloud), and roll-your-own on Postgres + pgvector or Elastic. Picking the wrong one at your scale is an order-of-magnitude cost mistake.
Why most teams default to the wrong tier
The most common mistake we see is picking a managed vector DB at the prototype stage, setting up the account and paying $70/month for an index with 40k vectors that could live on pgvector for free. The second most common is the opposite: running pgvector past the point of sanity with 50M vectors, fighting reindex-time bloat, and burning engineering hours that would have paid for Pinecone twice over. Both failure modes come from picking once and never revisiting. Vector DB choice deserves a quarterly review for the first year of production life.
| Option | Sweet spot | Indicative cost (10M vectors, 1M queries/mo) | Operational burden |
|---|---|---|---|
| Pinecone Serverless | 0–50M vectors, spiky traffic | ~$80/mo | Lowest |
| Turbopuffer | Very cheap cold storage + OK latency | ~$30/mo | Low |
| Weaviate Cloud (Standard) | 50M–500M vectors, steady traffic | ~$250/mo | Low |
| Qdrant Cloud | Same band, good filtering | ~$200/mo | Low-medium |
| pgvector on RDS | Already on Postgres, <10M vectors | ~$120/mo (db.r6g.large) | Medium — you are the DBA |
| Self-hosted (Qdrant/Milvus on K8s) | >100M vectors, steady traffic, team with ops | ~$600/mo infra, many eng hours | High |
Why the line item grows faster than you think
Vector DB cost scales on three axes simultaneously: vector count (corpus growth), query volume (user traffic), and metadata complexity (feature additions like filtering by user, date, tenant, or label). Each axis alone is manageable; stacked, they compound. A deployment that was $200/month at launch routinely crosses $1,500/month in the first 12 months without a change in the architecture — just from content and traffic growth. Budgeting vector DB cost linearly against corpus size will underestimate by 2–3× inside a year.
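A back-of-envelope model makes the compounding visible. All unit prices below are made-up placeholders, not any vendor's real rates; the point is the shape of the curve, not the numbers:

```python
# Toy cost model: vector DB spend compounds across three axes.
# Unit prices are illustrative placeholders, not any vendor's real rates.

def monthly_cost(vectors_m: float, queries_m: float, filters_per_query: int) -> float:
    """Rough monthly cost in USD for a hypothetical managed vector DB."""
    storage = vectors_m * 10.0                          # $10 per 1M vectors stored
    reads = queries_m * 8.0                             # $8 per 1M queries
    filter_multiplier = 1.0 + 0.5 * filters_per_query   # filters inflate read cost
    return storage + reads * filter_multiplier

# Launch: 5M vectors, 1M queries/mo, 1 metadata filter per query.
launch = monthly_cost(5, 1, 1)
# Month 12: corpus 3x, traffic 4x, one more filter from a feature launch.
year_one = monthly_cost(15, 4, 2)

print(f"launch: ${launch:.0f}/mo, month 12: ${year_one:.0f}/mo "
      f"({year_one / launch:.1f}x)")
```

Note that linear budgeting against corpus size (3x growth, so 3x the launch bill) already undershoots the modeled month-12 number, because the read and filter axes grew at the same time.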
The four cost drivers nobody budgets for
- Dimensionality. Memory and storage scale linearly with vector dimensions. Using text-embedding-3-large (3072 dim) instead of text-embedding-3-small (1536 dim) doubles your vector DB bill for ~3pp recall improvement. Almost never worth it at scale.
- Metadata index complexity. Filtered queries (tenant_id = X AND created_at > Y) can be 5–20× more expensive than pure nearest-neighbor. If every query has 3+ metadata filters, budget 3× the headline cost.
- Reindex cycles. Pinecone Serverless charges for write units. Re-embedding a 10M-doc corpus is routinely >$100/run. Incremental updates only, or you will pay.
- Multi-tenancy. One index per customer (SaaS pattern) vs. one shared index with metadata filtering has ~10× different cost curves. Plan the architecture up front.
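The dimensionality point is easy to sanity-check: raw float32 storage is count × dims × 4 bytes, before the HNSW graph overhead, replicas, and metadata that real bills add on top:

```python
def raw_vector_bytes(count: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw float32 storage for the vectors alone (index overhead is extra)."""
    return count * dims * bytes_per_float

GIB = 1024 ** 3
small = raw_vector_bytes(10_000_000, 1536) / GIB   # text-embedding-3-small
large = raw_vector_bytes(10_000_000, 3072) / GIB   # text-embedding-3-large

print(f"10M x 1536d: {small:.1f} GiB, 10M x 3072d: {large:.1f} GiB")
```

Doubling the dimensions doubles this number exactly, which is where the "twice the bill" figure comes from on memory-priced tiers.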
Benchmarking against the actual workload
Public benchmarks do not reflect real production workloads well: ANN-Benchmarks measures raw index performance on synthetic distributions, while BEIR and MTEB measure embedding quality, not database behavior. The gap between synthetic benchmarks and live traffic is filter complexity, metadata cardinality, and tail-latency behavior under mixed read-write load. Before committing to a specific provider, stress-test each candidate with a representative query mix from your actual application — ideally replaying production queries from a prior system. An afternoon of load testing saves a migration six months later.
Hidden tax: egress
If your vector DB is in AWS us-east and your app is in GCP europe-west, every query pays cross-cloud egress. A 1KB payload × 200k queries/day is < $10/month, but many setups stream much more (full chunks back to the LLM), and it can reach $500/month invisibly. Check the networking diagram before you pick a region.
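The arithmetic is worth doing before picking a region. A sketch assuming a placeholder cross-cloud rate of $0.09/GB (check your provider's actual egress pricing):

```python
def monthly_egress_usd(bytes_per_query: int, queries_per_day: int,
                       usd_per_gb: float = 0.09) -> float:
    # $0.09/GB is a placeholder for cross-cloud egress; rates vary by provider.
    gb_per_month = bytes_per_query * queries_per_day * 30 / 1e9
    return gb_per_month * usd_per_gb

ids_only = monthly_egress_usd(1_000, 200_000)        # 1KB of IDs and scores
full_chunks = monthly_egress_usd(100_000, 200_000)   # 100KB of retrieved chunks

print(f"IDs only: ${ids_only:.2f}/mo, full chunks: ${full_chunks:.2f}/mo")
```

The cost is linear in payload size, so streaming full chunks instead of IDs moves the line item by exactly the payload ratio.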
Read-write ratio matters for tier selection
Workloads that are mostly read (support chatbot retrieval, internal search) pay mostly for query throughput and latency. Workloads that are write-heavy (active corpus with daily churn) pay for index maintenance and reindex cycles. Most managed vector DBs price asymmetrically on reads vs writes, and the ratio can swing provider choice significantly. Pinecone Serverless is priced for read-dominant workloads; Qdrant and self-hosted options are more forgiving on high-write scenarios.
The often-correct answer: start on pgvector
For anything under 5M vectors and < 50 queries/sec, Postgres + pgvector + HNSW index is good enough, costs whatever your existing Postgres costs (often near zero on top), and saves you onboarding a new vendor. Migrate to a dedicated vector DB when you hit one of: vector count > 10M, P95 latency requirement < 50ms, or metadata filtering performance degrades.
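The migration triggers above can be encoded as a checklist. Thresholds are the heuristics from this section, not hard limits:

```python
def pgvector_migration_triggers(vector_count: int, p95_req_ms: float,
                                filters_degraded: bool) -> list[str]:
    """Return the reasons to leave pgvector; empty list means stay put."""
    triggers = []
    if vector_count > 10_000_000:
        triggers.append("vector count > 10M")
    if p95_req_ms < 50:
        triggers.append("P95 requirement < 50ms")
    if filters_degraded:
        triggers.append("metadata filtering degraded")
    return triggers

print(pgvector_migration_triggers(800_000, 200, False))    # early stage: stay
print(pgvector_migration_triggers(40_000_000, 40, False))  # time to migrate
```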
Migration planning is its own budget line
Every vector DB migration we have watched has taken longer and cost more than planned. The typical path: six months on pgvector, migration to Pinecone when query volume spikes, two to four weeks of dual-write and comparison, and a re-embed step in the middle because the new store wants a specific format. Budget 3–6 engineer-weeks for any serious migration plus the recurring cost of the new infrastructure during overlap. This is why picking the right tier early, even at a slight overspend, often wins versus migrating later.
Three real workload sizings
Abstracted pricing obscures how different these options are at actual volume. Three real deployments with specific vector counts, QPS, and per-provider cost:
- Early-stage B2B SaaS, 800k vectors, 200k queries/month: pgvector on an existing db.r6g.large at $0 marginal (Postgres was already running). Pinecone Serverless would be ~$40/mo. Weaviate Standard would be ~$180/mo. Pgvector wins by default until you hit operational pain.
- Mid-stage consumer AI, 40M vectors, 8M queries/month: Pinecone Serverless at ~$210/mo (reads dominate). Turbopuffer at ~$120/mo with slightly higher P95 latency. Qdrant Cloud at ~$290/mo with better filtering. Pgvector starts to feel painful here on reindex cycles.
- Enterprise RAG, 300M vectors, 30M queries/month: self-hosted Qdrant cluster on K8s = ~$1,800/mo infra + significant eng time, or Weaviate Cloud Enterprise at ~$4,500/mo. At this scale, the "build vs buy" answer depends entirely on whether you already have a platform team.
Latency vs recall: HNSW tuning decisions
Nearly every production vector DB defaults to HNSW (or a variant) for approximate nearest neighbor search. HNSW has two knobs that matter: M (graph connectivity) and ef_search (query-time search depth). Raising ef_search from 32 → 128 typically increases recall@10 by 2–4pp at the cost of 2–3× query latency. For an agentic backend where you can tolerate 100ms per call, crank it up. For a synchronous chat UX where every ms matters, tune down.
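Recall@k measurement itself is simple to demonstrate. The toy below uses a random candidate sample as a crude stand-in for search depth (real HNSW walks graph neighborhoods and gets far better recall per candidate examined), so only the methodology transfers, not the numbers:

```python
import random

random.seed(0)
DIM, N, K = 32, 2_000, 10
corpus = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
query = [random.gauss(0, 1) for _ in range(DIM)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(candidate_ids, k=K):
    # Exact nearest-neighbor over whatever candidate set we were given.
    return set(sorted(candidate_ids, key=lambda i: sq_dist(query, corpus[i]))[:k])

truth = top_k(range(N))  # ground truth: exact top-10 from a full scan

# "ef" here is just the number of candidates examined: a crude stand-in
# for HNSW's query-time search depth.
recalls = {}
for ef in (64, 256, N):
    candidates = random.sample(range(N), ef)
    recalls[ef] = len(top_k(candidates) & truth) / K
    print(f"ef={ef}: recall@10 = {recalls[ef]:.2f}")
```

Scanning every candidate always recovers recall 1.0; the tuning question is how little work still hits your recall target.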
Filtering is the hidden cost
Pure vector search is fast. Add WHERE tenant_id = X AND created_at > Y and the performance characteristics change dramatically. Pinecone and Weaviate use pre-filtering and post-filtering depending on selectivity; bad filter configs can make queries 20× slower. Specific patterns to avoid: high-cardinality filters without index, range filters on float fields, OR-filters across 3+ conditions. Test your filter patterns at real scale before committing.
Multi-tenant architecture decisions
The three common patterns for a multi-tenant SaaS:
- Shared index, metadata filter per tenant. Cheapest to run, hardest to isolate, risk of cross-tenant data leaks via bugs. Good at small scale.
- Namespace per tenant in the same index. Pinecone and some others support this natively. Middle ground on cost and isolation.
- Dedicated index per tenant. Strongest isolation, 10× the cost at scale, operational headache of managing hundreds of indexes. Required for some compliance contexts (healthcare, defense).
Start shared, migrate to namespace when tenant count exceeds 100 or when any enterprise customer asks. Dedicated only when compelled.
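A sketch of why the cost curves diverge, with placeholder prices; a flat per-index minimum is what makes dedicated indexes expensive at high tenant counts:

```python
def shared_index_cost(tenants: int, vectors_per_tenant_m: float) -> float:
    # One index: a flat base plus storage for the combined corpus.
    return 50 + tenants * vectors_per_tenant_m * 10

def dedicated_index_cost(tenants: int, vectors_per_tenant_m: float,
                         min_per_index: float = 30) -> float:
    # Per-index minimums dominate when each tenant's corpus is small.
    return tenants * max(min_per_index, vectors_per_tenant_m * 10)

for tenants in (10, 100, 1000):
    s = shared_index_cost(tenants, 0.1)      # 100k vectors per tenant
    d = dedicated_index_cost(tenants, 0.1)
    print(f"{tenants} tenants: shared ${s:.0f}/mo vs dedicated ${d:.0f}/mo "
          f"({d / s:.1f}x)")
```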
Production patterns worth copying
- Dual-write during migration. When moving between vector DBs, write to both for 2–4 weeks before cutover. Compare recall between stores on a gold set daily.
- Shadow query on a fraction of traffic. Route 5% of live queries to the candidate vector DB and log results without serving. Compare quality before flipping traffic.
- Rebuild index offline. Construction-time HNSW parameters (like M) can't change after index creation; if you need to retune them, build a fresh index in parallel and swap.
- Cache query embeddings. Identical user queries should hit a Redis cache, not re-embed. Typical cache hit rate for consumer products: 20–40%.
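The embedding-cache pattern is a few lines. An in-memory sketch standing in for Redis, with a hypothetical embed_fn placeholder for the real embedding API call:

```python
import hashlib

class EmbeddingCache:
    """In-memory stand-in for the Redis cache described above."""

    def __init__(self):
        self._store, self.hits, self.misses = {}, 0, 0

    def _key(self, query: str) -> str:
        # Normalize so trivially different strings share a cache entry.
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get_or_embed(self, query, embed_fn):
        k = self._key(query)
        if k in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[k] = embed_fn(query)  # only pay the API on a miss
        return self._store[k]

cache = EmbeddingCache()
fake_embed = lambda q: [0.0] * 8  # placeholder for a real embedding call
for q in ["reset password", "Reset Password ", "reset password", "billing"]:
    cache.get_or_embed(q, fake_embed)

print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.0%}")
```

Normalization matters more than the cache backend: casing and whitespace variants of the same question should land on the same key.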
Frequently asked questions
Is Turbopuffer really viable? Yes, for read-heavy, latency-tolerant workloads. They store on object storage with a query-time cache, so cold-start latency is higher (200–500ms) but cost-per-vector is 3–5× lower than Pinecone. Good for archives.
What about Qdrant vs Weaviate? Qdrant has a cleaner filtering story and better Rust performance. Weaviate has better multimodal support and a more mature cloud offering. For a simple RAG workload, either works; for complex filtering, Qdrant.
Is pgvector really production-grade? Yes, up to ~10M vectors and 50 QPS. Beyond that, index build time and memory pressure on Postgres get painful. The ceiling is real but higher than many assume.
Do I need a dedicated vector DB for a RAG chatbot? Almost never, unless you're over 10M vectors. Most chatbot RAG deployments work fine on pgvector.
How do I benchmark vector DB quality? Build a gold set of 200 query/relevant-doc pairs. Run each candidate DB with the same embeddings. Measure recall@10 and P50/P95 latency.
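The gold-set measurement can be sketched in a few lines of Python; the query and document IDs below are illustrative:

```python
import statistics

def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of relevant docs that appear in the top-k results."""
    return len(set(retrieved_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

# Gold set: query -> relevant doc IDs (a real one should have ~200 pairs).
gold = {"q1": {"d3"}, "q2": {"d7", "d9"}}
# What the candidate DB returned per query, in rank order.
results = {"q1": ["d3", "d1"], "q2": ["d9", "d2", "d5"]}

per_query = [recall_at_k(results[q], gold[q]) for q in gold]
print(f"mean recall@10: {statistics.mean(per_query):.2f}")
```

Run the identical gold set and embeddings against every candidate so the only variable is the database.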
What about hybrid search (vector + BM25)? Underrated. Weaviate, Qdrant, Elasticsearch, and OpenSearch all support it. Typical recall@10 improvement of 5–12pp over pure vector on messy real-world text. Worth the complexity.
Does metadata schema affect cost? On Pinecone, yes — stored metadata bytes count toward storage cost. Keep metadata minimal and point to a separate document store for the full payload.
What is the cheapest path at 500M+ vectors? Self-hosted Milvus or Qdrant on commodity hardware. Managed services hit ~$6–10k/mo at this scale where the same workload on 3–4 beefy VMs is ~$2k/mo with full-time platform engineering.
Related guides
- RAG pipeline cost — full per-query cost including LLM + embeddings.
- Embedding cost — the other half of your RAG cost stack.
- GPU inference cost — relevant if you self-host vector DB + LLM together.
- AI SaaS pricing — work backward from infra cost to a margin-safe price.