Build vs. buy for AI chatbots in 2026
The AI chatbot platform market (Intercom Fin, Zendesk AI Agents, Ada, Decagon, Cresta, Maven AGI) has matured to the point where out-of-the-box deflection for most B2C use cases beats what an ad-hoc LangChain build ships. The economics of custom builds have inverted compared to 2023: off-the-shelf is now usually cheaper and more capable for typical ticket-deflection work. Custom builds now win only in specific conditions.
Indicative pricing, April 2026
| Option | Pricing | Time to ship | Best for |
|---|---|---|---|
| Intercom Fin | $0.99/resolution + seats | 2-6 weeks | Existing Intercom customers |
| Zendesk AI Agents | Bundled w/ Zendesk + $0.50-1/resolution | 3-8 weeks | Zendesk stack |
| Ada | ~$2,500+/mo + volume | 4-10 weeks | Mid-large B2C |
| Decagon | Custom enterprise, $50-250k/yr | 6-12 weeks | Large enterprises with complex KBs |
| Maven AGI | Custom, usage-based | 4-10 weeks | Enterprise with advanced RAG needs |
| Custom LangChain + Sonnet 4.5 | $40-200k build + $2-20k/mo ops | 2-4 months | Specific IP, complex workflow |
| Custom no-code (Voiceflow, Botpress) | $50-500/mo + build time | 2-6 weeks | Very simple bots, SMB |
When to buy
- Your use case is standard ticket-deflection with a reasonable KB.
- You need to ship in under 2 months.
- You have fewer than 500k tickets/year — the resolution fees stay manageable.
- You don't have engineering capacity for ongoing bot maintenance.
- Your helpdesk (Zendesk, Intercom, Front) has a tightly integrated AI option.
When to build
- Volume is high enough that resolution fees > $100k/year. At that point, ~$100k of build is easily justified.
- Your product has proprietary workflows (account actions, multi-step transactions, specialized tool use) that off-the-shelf can't execute.
- You're in a regulated vertical (financial services, healthcare) where data residency, audit, or model-selection control matters more than out-of-box convenience.
- The bot is a core product feature, not just an internal cost-saver.
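The fee threshold in the first bullet falls out of a simple break-even calculation. A minimal sketch — the dollar inputs below are illustrative, not vendor quotes:

```python
# Months to recover a custom build, given what a per-resolution platform
# would have charged for the same volume. Inputs are illustrative.
def breakeven_months(build_cost, custom_monthly_ops,
                     fee_per_resolution, resolutions_per_month):
    platform_monthly = fee_per_resolution * resolutions_per_month
    monthly_saving = platform_monthly - custom_monthly_ops
    if monthly_saving <= 0:
        return float("inf")  # custom never pays back: stay on the platform
    return build_cost / monthly_saving

# Example: $120k build, $11k/mo custom ops+runtime, vs $0.99/resolution
# at 20k resolutions/mo ($19.8k/mo platform bill) -> ~13.6 months to break even.
months = breakeven_months(120_000, 11_000, 0.99, 20_000)
```

At 100k+ resolutions/month the saving term dominates and payback collapses to weeks, which is the crossover the bullets above describe.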
Realistic custom build cost breakdown
- Initial build (8–16 weeks of senior eng × 1–2 engineers): $40k–$150k.
- Evals infrastructure (often missed): $15k–$50k.
- Integration with helpdesk + auth + analytics: $10k–$40k.
- Ongoing ops + tuning (0.5–1 FTE): $75k–$200k/year.
- LLM + vector DB runtime: $2k–$30k/mo at scale.
- Total year-1 TCO: $200k–$800k for a real production custom bot.
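Summing the line items above is a useful sanity check. Note the low ends add to $164k, so the quoted $200k floor assumes not every item lands at its minimum:

```python
# Year-1 TCO as (low, high) dollar ranges, straight from the line items above.
ITEMS = {
    "initial_build": (40_000, 150_000),
    "evals_infra":   (15_000, 50_000),
    "integration":   (10_000, 40_000),
    "ops_tuning":    (75_000, 200_000),
    "runtime_12mo":  (24_000, 360_000),  # $2k-$30k/mo x 12
}
lo = sum(v[0] for v in ITEMS.values())  # 164,000
hi = sum(v[1] for v in ITEMS.values())  # 800,000
```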
Hybrid: the quiet winner
A common architecture in 2026 is buying a platform (Intercom Fin, Ada) for the 70% of standard deflection use cases, and building custom integrations for the 30% involving proprietary actions. Gets you fast time-to-value plus IP control where it matters.
Three worked scenarios with real token math
Grounding a build-vs-buy decision in actual run cost requires working the token arithmetic for each deployment. Three representative workloads follow.
Scenario 1: B2C support bot, 250,000 tickets/month
Per request: 2,350 input + 280 output tokens on Sonnet 4.5 ($3/$15 per million tokens). Uncached: $2,812/mo. With Anthropic prompt caching on the 800-token system prefix (90% read discount, 1.25× write premium on misses, 73% hit rate): $2,459/mo. Route 65% of FAQ intents to Haiku 4 ($0.80/$4): ~$1,348/mo. Add Pinecone ($700/mo), Langfuse ($400/mo), and ops time (~$3k/mo of 0.25 FTE): ~$5.4k/mo at 250k ticket volume. Compare against Intercom Fin at $0.99/resolution × 137,500 resolved (55% deflection): $136k/mo. Custom is ~$131k/mo cheaper — but it took 4 months and $120k in initial eng. Break-even vs Intercom: month 1 of operation.
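The arithmetic can be sketched as follows. `monthly_cost` is an illustrative helper (not a library API); note that billing the 1.25× cache-write premium on misses and caching only the 800-token prefix — rather than the whole prompt — pushes the cached figure well above a naive 90%-off estimate:

```python
# Monthly LLM run cost with Anthropic-style prompt caching and model routing.
# Assumed rates (per million tokens): Sonnet 4.5 $3 in / $15 out, Haiku 4
# $0.80 in / $4 out; cache writes bill at 1.25x input, cache reads at 0.1x.
def monthly_cost(requests, tok_in, tok_out, price_in, price_out,
                 cached_prefix=0, hit_rate=0.0):
    """Dollars per month; cached_prefix tokens hit the cache at hit_rate."""
    hits = requests * hit_rate * cached_prefix
    misses = requests * (1 - hit_rate) * cached_prefix
    rest = requests * (tok_in - cached_prefix)
    return (hits * price_in * 0.10       # cache reads: 90% discount
            + misses * price_in * 1.25   # cache writes: 25% premium
            + rest * price_in
            + requests * tok_out * price_out) / 1e6

# Scenario 1: 250k tickets/mo on Sonnet 4.5
uncached = monthly_cost(250_000, 2_350, 280, 3, 15)           # 2812.50
cached = monthly_cost(250_000, 2_350, 280, 3, 15, 800, 0.73)  # 2458.80
# Route 65% of intents to Haiku 4 (uncached), keep 35% on cached Sonnet.
routed = (monthly_cost(162_500, 2_350, 280, 0.80, 4)
          + monthly_cost(87_500, 2_350, 280, 3, 15, 800, 0.73))  # ~1348
```

The same helper reproduces the other two scenarios by swapping in their token counts and cache parameters.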
Scenario 2: Internal RAG-based IT helpdesk bot, 50,000 queries/month
Per query: 7,220 input + 550 output = uncached $1,496/mo. With 92% cache hit on 3,200-tok system prompt: $1,108/mo. Add Cohere Rerank 3.5 ($50/mo), Pinecone ($700/mo), Langfuse ($400/mo): $2,258/mo all-in. A platform alternative (Moveworks, Aisera) would run $60-120k annually plus per-resolution fees. Custom wins on cost but requires a small eng ops team.
Scenario 3: Code-assistant bot for 10 devs × 40 queries/day
8,800 queries/mo × 5,600 input + 900 output on Sonnet 4.5 = $267/mo. Add 5% Opus escalations: $320/mo. A buy option (Cursor Business at $40/seat × 10) is $400/mo. Basically equivalent. Buy wins on operational simplicity every time for this scale.
Cost levers with math on run cost
- Anthropic prompt cache (90% read discount): a 1,000-token system prompt at 200k requests/month costs $600/mo of input on Sonnet 4.5 uncached; fully cached it drops to $60/mo, saving $540/mo per tenant (before the 1.25× write premium on misses). Multiplied across a 30-tenant deployment, $16,200/mo of recaptured margin.
- OpenAI 50% automatic cache on ≥1,024-token matching prefix. Works without code changes.
- Gemini 75% context cache on long-context deployments. Good fit for multi-document RAG.
- Haiku 4 routing on 60-70% of simple intents: saves ~70% on the routed portion.
- Batch API (50% off) for eval and retraining runs, not for user-facing traffic.
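The first lever reduces to a one-liner. A sketch, assuming Sonnet 4.5 input at $3/M tokens, a 100% hit rate, and ignoring the write premium on misses:

```python
# Monthly savings from serving a system-prompt prefix out of cache at a
# 90% read discount. Assumes every request hits; write premium ignored.
def cache_savings(prefix_tokens, requests_per_month, price_in_per_mtok,
                  read_discount=0.90):
    full_price = prefix_tokens * requests_per_month * price_in_per_mtok / 1e6
    return full_price * read_discount

per_tenant = cache_savings(1_000, 200_000, 3.0)  # 540.0 ($/mo)
fleet = 30 * per_tenant                          # 16,200 ($/mo, 30 tenants)
```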
Model selection rules for chatbots
- Haiku 4 for intent classification, FAQ lookups, PII scrubbing. The router tier.
- Sonnet 4.5 for natural-language synthesis over retrieved context. The workhorse.
- GPT-5 mini ($0.40/$1.60) for strict JSON tool calls and OpenAI-native pipelines.
- Opus 4.1 almost never in a chatbot. Wrong latency profile and 5× the cost for 2-3pp quality.
- Gemini 2.5 Flash for bulk summarization of transcripts, offline tagging, cheap enrichment pipelines around the bot.
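In code, these rules collapse to a routing table. A minimal sketch — the model identifiers and task taxonomy are illustrative placeholders, not real API model IDs:

```python
# Task-type -> model routing table implementing the rules above.
# Model names and task labels are illustrative placeholders.
ROUTES = {
    "intent_classification":     "haiku-4",
    "faq_lookup":                "haiku-4",
    "pii_scrub":                 "haiku-4",  # deterministic redaction is safer
    "answer_synthesis":          "sonnet-4.5",
    "json_tool_call":            "gpt-5-mini",
    "transcript_summarization":  "gemini-2.5-flash",
}

def pick_model(task_type: str) -> str:
    # Default to the workhorse for anything unrecognized.
    return ROUTES.get(task_type, "sonnet-4.5")
```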
Production patterns for custom chatbot builds
The 20% of work that kills "simple" custom bots is hardening. Before launch, you need: (1) a fallback chain (Sonnet 4.5 → GPT-5 → Haiku 4 + simplified prompt → static escalation to human); (2) circuit breakers per provider at 20% error rate over 2-minute windows; (3) retry budgets on every agent-like call (3 attempts, hard token ceiling); (4) per-tenant monthly token caps so a runaway customer cannot burn your margin; (5) PII scrubbing on both input and output with a deterministic redaction pipeline, not a model call; (6) an eval harness that runs nightly against 200 held-out tickets and alerts on regressions. Shipping a bot without these costs under $50k; adding them takes another $60-200k and 3-6 months. Budget accordingly.
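Items (1) and (3) can be sketched in a few lines. The backend callables below are stand-ins for real provider clients, and the function is a minimal illustration, not a production harness:

```python
# Fallback chain with a hard retry budget: try each backend in order,
# never exceeding `retry_budget` total attempts, then escalate to a human.
def answer_with_fallback(prompt, chain, retry_budget=3):
    """chain: ordered list of (backend_name, callable); callables raise on failure."""
    attempts = 0
    for backend_name, call in chain:
        if attempts >= retry_budget:
            break  # budget exhausted before reaching this backend
        attempts += 1
        try:
            return backend_name, call(prompt)
        except Exception:
            continue  # a per-provider circuit breaker would also open here
    return "human_escalation", None
```

In production each `call` would sit behind the provider-level circuit breaker of item (2) and a hard token ceiling, and the simplified-prompt Haiku step would rewrite `prompt` before calling.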
Frequently asked questions
Is Intercom Fin actually worth $0.99/resolution? Under 50k resolutions/mo, yes — the build-cost crossover is not there. Above 100k resolutions/mo, custom becomes compelling.
How long does a custom chatbot really take? 2-4 months to demo, 6-10 months to production-hardened. The long tail is the expensive part.
Can I ship with LangChain in a weekend? You can demo. Production requires evals, retries, fallbacks, observability — all of which LangChain does not give you out of box.
What is a realistic deflection rate? 40-65% for mature B2C with a good KB. 25-45% for B2B with complex products. Higher numbers almost always mean the bot is escalating too aggressively.
Do I need a vector DB? Yes if you have more than 50 help-center articles. Pinecone Serverless, pgvector, or Weaviate Cloud all work. Budget $50-$700/mo depending on scale.
How much does a fine-tune help? Usually 3-8pp in deflection rate at $500-$5k cost. Worth it above ~30k resolutions/mo where margin matters.
What does ops actually look like post-launch? 0.25-0.5 FTE weekly: monitoring eval drift, reviewing escalations, updating the KB, tuning prompts. Skip this and quality decays measurably within 3 months.
When should I switch from buy to build? When platform resolution fees exceed $100k/year. At that point, $100k of build easily amortizes.
Does multi-tenant custom chatbot architecture share caches? Only if the system prompt is tenant-agnostic. Tenant-specific prefixes break cache and triple input cost. Architect for a shared-prefix + per-tenant-delta pattern.
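The shared-prefix + per-tenant-delta pattern, sketched against the Anthropic Messages API content-block format (the cache covers the prompt up through the last block marked `cache_control`, so the tenant delta must come after the marked block):

```python
# Shared-prefix + per-tenant-delta system prompt: the shared block is marked
# cacheable, so every tenant's requests hit the same cache entry; the
# tenant-specific text follows the cache boundary and is billed normally.
def build_system_blocks(shared_prefix: str, tenant_delta: str) -> list[dict]:
    return [
        {"type": "text", "text": shared_prefix,
         "cache_control": {"type": "ephemeral"}},  # cache boundary
        {"type": "text", "text": tenant_delta},    # varies per tenant, uncached
    ]
```

Inverting the order — tenant text first — makes every tenant's prefix unique and defeats the shared cache, which is exactly the cost triple described above.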
What do vision/multimodal inputs cost extra? Images on Anthropic bill at roughly (width × height) / 750 tokens — about 1,400 tokens for a 1024×1024 image. Budget accordingly if your bot handles screenshots or receipts.
How much does a voice layer add? ElevenLabs Turbo at $0.10/1k chars for TTS plus Deepgram Nova-3 at $0.0043/min for STT adds roughly $0.04-0.08/ticket for a typical voice bot. Not trivial at volume.
Can I bring-your-own-key for customers on custom builds? Yes, and many enterprise buyers now require it. Adds 2-3 weeks of integration work and simplifies compliance review.
Related guides
- Deflection savings — the ROI side.
- RAG pipeline cost — the architecture most bots use.
- AI SaaS pricing — pricing strategy if the bot is the product.
- LLM API cost — estimate ongoing bot costs.