Everyone talks about how cheap AI agents are compared to employees. And they're right — on a per-task basis, the math is absurd. But "cheap per task" and "cheap to run" are different claims. Here's what it actually costs to operate an agent-first company, based on our real numbers.
The line items nobody mentions
When people calculate agent costs, they count API tokens. That's like calculating the cost of an employee by looking at their salary and ignoring taxes, benefits, equipment, office space, and management overhead. The token cost is real, but it's often less than half the total.
Here's what actually goes on the bill:
- API calls. The obvious one. Language model inference, embedding generation, image processing. This scales with volume and model choice: smaller models for routine work, larger ones for complex reasoning. We spend roughly €400-700 per month across all agent workloads.
- Infrastructure. Agents need somewhere to run. Serverless functions for simple tasks, persistent processes for long-running workflows. Database reads and writes for every agent action. Vector storage for retrieval. Logging and monitoring. Another €100-200 per month.
- Failure and retry costs. Agents fail. API calls time out. Models hallucinate. Rate limits hit. Every failure triggers a retry, and retries cost tokens. On a bad day, 15-20% of our API spend goes to retries and error recovery. Budget for this.
- Human review time. This is the biggest hidden cost. Every piece of agent output needs review before it affects the real world. Code gets reviewed before merge. Content gets edited before publication. Financial entries get verified before booking. The human time for review and correction is real labor, and it's the most expensive line item.
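To see what the retry share does to a budget, here is a minimal sketch. It assumes the 15-20% figure is a share of total API spend, and the €500 of "useful" spend is a hypothetical number chosen only for illustration.

```python
def spend_with_retries(useful_spend: float, retry_share: float) -> float:
    """Total API spend when `retry_share` of that total goes to retries.

    If retries eat a share r of the total, the useful part is (1 - r) of it,
    so total = useful / (1 - r).
    """
    if not 0 <= retry_share < 1:
        raise ValueError("retry_share must be in [0, 1)")
    return useful_spend / (1 - retry_share)


USEFUL_SPEND_EUR = 500.0  # hypothetical: spend on calls that succeeded first time

for share in (0.15, 0.20):
    total = spend_with_retries(USEFUL_SPEND_EUR, share)
    print(f"retry share {share:.0%}: total €{total:.2f}, "
          f"of which retries €{total - USEFUL_SPEND_EUR:.2f}")
```

The point of dividing rather than multiplying: a 20% retry share means the bill is 25% larger than the useful spend, not 20%.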
Our actual monthly bill
For context: we're a small company running agents across engineering, content, operations, and finance. Here's what a typical month looks like.
- Language model APIs (OpenAI, Anthropic): €400-700
- Hosting and infrastructure (Vercel, Supabase): €50-150
- Tooling and services (monitoring, email, payments): €50-100
- Human time directing and reviewing agents: 15-20 hours per week
Total direct cost: roughly €500-950 per month in services. The human time is the multiplier: 15-20 hours per week is roughly 65-87 hours per month, so if you value that time at market rates, the fully loaded cost is well above the services bill. But the comparison isn't agent cost vs. zero. It's agent cost vs. hiring the 3-5 people you'd need to produce the same output without agents.
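A rough way to put a number on the fully loaded cost is sketched below. The services range and weekly hours come from the figures above. The €60 hourly rate is purely an assumption for illustration, so substitute whatever your reviewers' time actually costs.

```python
WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def fully_loaded_monthly(services_eur: float,
                         hours_per_week: float,
                         hourly_rate_eur: float) -> float:
    """Services bill plus the value of the human time spent on agents."""
    human_cost = hours_per_week * WEEKS_PER_MONTH * hourly_rate_eur
    return services_eur + human_cost


HOURLY_RATE_EUR = 60.0  # hypothetical rate, not a figure from this article

low = fully_loaded_monthly(500, 15, HOURLY_RATE_EUR)
high = fully_loaded_monthly(950, 20, HOURLY_RATE_EUR)
print(f"fully loaded: €{low:,.0f} to €{high:,.0f} per month")
```

At that assumed rate, the range works out to about €4,400 to €6,150 per month, with human time accounting for most of it.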
At equivalent output, we estimate agents save us €15,000-25,000 per month compared to a traditional team. That's not a theoretical calculation — it's based on the specific roles we'd need to hire if agents didn't exist.
Where costs surprise you
Context windows are expensive
The longer the conversation, the more tokens per call. Agents working on complex tasks build up context quickly. A 20-message conversation with a large model can cost €0.10-0.50 per exchange. Multiply by hundreds of agent interactions per day and you feel it. The fix is aggressive context management — summarize, truncate, and use smaller models for subtasks.
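The growth is easy to underestimate, so here is a small sketch of it. It assumes every message is the same size (300 tokens, a made-up figure) and that the full history is resent on each exchange. The `max_messages` cap is a hypothetical stand-in for the truncation described above.

```python
from typing import List, Optional


def input_tokens_per_exchange(n_exchanges: int,
                              tokens_per_message: int,
                              max_messages: Optional[int] = None) -> List[int]:
    """Input tokens sent on each exchange when the history is resent.

    If `max_messages` is set, only the most recent messages are kept.
    """
    history: List[int] = []  # token counts of the messages kept in context
    sizes: List[int] = []
    for _ in range(n_exchanges):
        history.append(tokens_per_message)      # new user message
        if max_messages is not None:
            history = history[-max_messages:]   # drop the oldest messages
        sizes.append(sum(history))              # everything kept is sent as input
        history.append(tokens_per_message)      # the reply joins the history
    return sizes


full = input_tokens_per_exchange(10, 300)                  # 20 messages in total
capped = input_tokens_per_exchange(10, 300, max_messages=5)
print("full history:", sum(full), "input tokens")
print("last 5 only: ", sum(capped), "input tokens")
```

Under these assumptions the uncapped conversation sends 30,000 input tokens over ten exchanges, against 13,200 with the cap, even though only 6,000 tokens of messages were ever written.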
Iteration loops compound
An agent that gets it right on the first try costs X. An agent that needs three rounds of self-correction costs 3-4X because each round includes the full context. Some tasks have inherently high iteration rates — complex code generation, nuanced copywriting, anything requiring aesthetic judgment. Budget for 2-3X the naive estimate on these tasks.
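The compounding can be sketched in a few lines. All token counts and per-1K prices below are hypothetical, and the model assumes each correction round resends the original context plus every earlier attempt and its feedback.

```python
def loop_cost(rounds: int,
              context_tokens: int,
              output_tokens: int,
              feedback_tokens: int,
              in_price_per_1k: float,
              out_price_per_1k: float) -> float:
    """Cost of `rounds` attempts where each round resends all prior context."""
    total = 0.0
    input_tokens = context_tokens
    for _ in range(rounds):
        total += input_tokens / 1000 * in_price_per_1k
        total += output_tokens / 1000 * out_price_per_1k
        # The next round also carries the last attempt and the critique of it.
        input_tokens += output_tokens + feedback_tokens
    return total


# Hypothetical task: 4,000-token context, 1,000-token answer, 200-token critique.
params = dict(context_tokens=4000, output_tokens=1000, feedback_tokens=200,
              in_price_per_1k=0.003, out_price_per_1k=0.015)
one_shot = loop_cost(1, **params)
three_rounds = loop_cost(3, **params)
print(f"three rounds cost {three_rounds / one_shot:.1f}x a single attempt")
```

With these made-up numbers the ratio lands at about 3.4x. Longer outputs or wordier feedback push it higher.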
Observability costs grow fast
If you're serious about running agents in production, you need to log everything. Every prompt, every response, every tool call, every decision. That's a lot of data. Structured logging, retention policies, and query infrastructure add up. We learned this the hard way after an agent made a series of incorrect financial entries and we had no way to trace the reasoning chain.
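As an illustration of what "log everything" can look like, here is a minimal sketch that writes one JSON record per agent step and ties the steps together with a trace ID. The field names, the event kinds, and the in-memory `sink` list are all assumptions made for the example. A real setup would write to a file or a log store.

```python
import json
import time
import uuid
from typing import Any, Dict, List


def log_event(trace_id: str, kind: str, payload: Dict[str, Any],
              sink: List[str]) -> None:
    """Append one structured record (a JSON line) to the sink."""
    record = {"trace_id": trace_id, "ts": time.time(),
              "kind": kind, "payload": payload}
    sink.append(json.dumps(record))


def trace_for(trace_id: str, sink: List[str]) -> List[Dict[str, Any]]:
    """Rebuild the ordered chain of steps for one agent run."""
    records = (json.loads(line) for line in sink)
    return [r for r in records if r["trace_id"] == trace_id]


sink: List[str] = []          # stand-in for a log file or log store
trace_id = str(uuid.uuid4())  # one ID per agent run

log_event(trace_id, "prompt", {"text": "Categorize invoice 17"}, sink)
log_event(trace_id, "tool_call", {"tool": "lookup_account", "args": {"id": 17}}, sink)
log_event(trace_id, "decision", {"entry": "office supplies"}, sink)

for step in trace_for(trace_id, sink):
    print(step["kind"], step["payload"])
```

The design choice that matters is the shared `trace_id`: with it, a bad entry can be traced back through the tool calls and prompt that produced it.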
The costs that go down over time
Not everything trends upward. Several cost lines decrease as you mature.
Prompt engineering amortizes. The upfront cost of crafting good prompts, building evaluation sets, and tuning agent behavior is significant. But once a prompt works, it works for months. Spread over every invocation, the engineering cost per call of a well-tuned prompt is nearly zero.
Model costs drop predictably. The same capability that cost $0.03 per 1K tokens a year ago costs $0.005 today, a sixfold drop. If that trend continues, tasks that are marginally economic now will be obviously economic in 12 months.
Human review gets faster. As you build trust in specific agents for specific tasks, the review process becomes lighter. What starts as a line-by-line code review becomes a scan-and-approve. What starts as a full content rewrite becomes a light edit. The review cost decreases as the agent proves itself — but it never reaches zero.
The honest comparison
Is an agent-first company cheaper than a traditional one? Yes, significantly — if you measure cost per unit of output. A blog post costs €2-5 in API calls instead of €500 for a freelance writer. A routine code change costs €0.50 instead of an hour of developer time. A financial reconciliation costs €0.10 instead of a bookkeeper's afternoon.
But the total cost isn't close to zero. You're trading headcount cost for infrastructure cost, management cost for direction cost, and hiring time for prompt engineering time. The net savings are real — we estimate 70-80% reduction in total cost for equivalent output — but "agents are free" is a myth that leads to bad planning.
What to budget if you're starting
If you're considering an agent-first approach, here's a realistic budget for the first three months:
- Month 1: €200-400 in API costs while you experiment and waste tokens on bad prompts. Expect 50% of agent output to be unusable. Budget 10 hours per week of human time on setup and evaluation.
- Month 2: €400-700 as agents take on real workloads. Failure rate drops to 20-30%. Human review time is still high but decreasing. This is the month where you discover your specific failure modes.
- Month 3: €500-900 at steady state. Agents are producing reliable output in the domains you've tuned them for. Human time shifts from fixing agent mistakes to expanding agent capabilities. The ROI becomes obvious.
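Adding up the three months gives a quick planning number. The sketch below takes the monthly ranges from the list above. The multiplier function rests on a simplifying assumption that unusable output is a plain share of what you pay for.

```python
from typing import Dict, Tuple

# Monthly API budget ranges in EUR, taken from the list above.
MONTHLY_API_EUR: Dict[int, Tuple[int, int]] = {
    1: (200, 400),
    2: (400, 700),
    3: (500, 900),
}


def quarter_total(monthly: Dict[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum the low and high ends of each month's range."""
    low = sum(lo for lo, _ in monthly.values())
    high = sum(hi for _, hi in monthly.values())
    return low, high


def cost_per_usable_multiplier(unusable_share: float) -> float:
    """How much more each usable output costs when a share is thrown away."""
    return 1 / (1 - unusable_share)


low, high = quarter_total(MONTHLY_API_EUR)
print(f"first-quarter API budget: €{low}-{high}")
print(f"month 1 cost per usable output: {cost_per_usable_multiplier(0.5):.1f}x")
```

That comes to €1,100-2,000 in API costs for the quarter, with each usable output in month 1 costing twice its sticker price because half the output is discarded.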
The biggest risk isn't overspending on APIs. It's underinvesting in the human layer — the review processes, the escalation rules, the observability infrastructure — that makes agents production-ready instead of demo-ready. Budget for that, and the API costs take care of themselves.