AI token bill shock: what Uber and Microsoft just admitted

Token bill shock is what happens when an enterprise’s AI coding bill outruns the productivity it’s meant to fund: per-engineer monthly spend climbing into the four figures, annual budgets exhausted in four months, and finance teams asking, with new urgency, what each token actually bought. In June 2026 the wave broke into the open: Microsoft is reportedly pulling Claude Code from its largest product division by June 30, and Uber’s CTO publicly admitted he’s “back to the drawing board.” Lova is the chat-first AI project management product where AI agents and humans claim work on a shared board, post evidence of completion, and leave a per-task record of who did what with which model. That record is the surface where the next version of AI cost control has to live.

Per-token prices keep falling. Per-task bills keep climbing. The gap between those two sentences is the entire story, and it’s breaking budgets at two of the most sophisticated buyers of AI on the planet in the same month.

Key takeaways

  • Uber’s CTO Praveen Neppalli Naga told The Information his team had “maxed out” its 2026 AI budget by April — four months in. The quote, posted by reporter Anissa Gardizy on X: “I’m back to the drawing board, because the budget I thought I would need is blown away already.”
  • Microsoft is reportedly canceling most internal Claude Code licenses across its Experiences and Devices division — the teams behind Windows, Microsoft 365, Outlook, Teams, and Surface — by June 30, 2026, redirecting engineers to GitHub Copilot CLI. Per Windows Central, the decision is “likely driven by financial motives.”
  • Gartner’s May 19, 2026 forecast puts worldwide AI spending at $2.59 trillion in 2026, a 47% year-over-year jump — with AI infrastructure alone accounting for more than 45% of the total.
  • Per the FinOps Foundation’s State of FinOps 2026 report, 98% of respondents now actively manage AI costs — up from 63% in 2025 and 31% in 2024. AI cost management went from niche to near-universal in two years.
  • Andreessen Horowitz’s “LLMflation” analysis finds equivalent-performance LLM inference cost is dropping roughly 10x per year. Per-unit deflation isn’t the problem. Per-task volume is.

What is token bill shock, and why is it breaking out now?

Token bill shock is the gap between two truths that lived comfortably side by side until agents arrived: input tokens are cheap, and tasks use a lot of them. A single autonomous agent doesn’t answer once and stop. It reads files, drafts, runs tools, verifies, backtracks, retries, and posts back — one user-initiated “ship this fix” routinely translates into thousands of model calls before a pull request lands. Multiply that across an engineering organization that just discovered the productivity unlock, and the bill stops looking like a tool subscription. It starts looking like a salary.
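The arithmetic behind that gap can be sketched in a few lines. This is an illustration only: the function name, the call counts, the tokens per call, and the prices are all made-up assumptions, not figures from Uber, Microsoft, or any vendor price list.

```python
def task_cost_usd(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Cost of one task = calls x tokens per call x price per token."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# Hypothetical chat-style usage: a single call at an older, higher price.
chat = task_cost_usd(calls=1, tokens_per_call=2_000, usd_per_million_tokens=10.0)

# Hypothetical agentic task: the price is 10x lower, but the loop makes
# 1,000 calls and each call carries a much larger context.
agent = task_cost_usd(calls=1_000, tokens_per_call=20_000, usd_per_million_tokens=1.0)

print(f"chat task:  ${chat:.2f}")
print(f"agent task: ${agent:.2f}")
print(f"agent task costs about {agent / chat:.0f}x the chat task")
```

Under these assumed numbers the per-token price fell tenfold while the per-task cost rose by three orders of magnitude, which is the shape of the problem even if the real inputs differ.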

That’s the threshold June 2026 just crossed in public. Until this month, “AI is expensive” was an abstraction in a deck. Now it’s a memo from Microsoft’s Experiences and Devices division and a Fortune interview where Uber’s COO struggles to link spend to outcome. The headline isn’t that Claude Code is too good. It’s that nobody, including the buyers, can yet tell which calls were worth making.

How did Uber burn its whole 2026 AI budget in four months?

The mechanics are almost embarrassingly simple. Uber rolled out Claude Code across its engineering organization in December 2025. Adoption went from roughly a third of the company’s 5,000-engineer org to the high-80s percent range in a few months. Heavy users were spending $500 to $2,000 a month each on tokens. By April the entire annual AI tooling budget was gone. The company responded with a $1,500 monthly per-employee cap on agentic coding tools, applied to Claude Code and Cursor, as reported by ZeroHedge.

The interesting part is the executive response. Uber’s CTO didn’t say the tools weren’t working — he said the math was unrecognizable. “I’m back to the drawing board, because the budget I thought I would need is blown away already,” Praveen Neppalli Naga told The Information’s Anissa Gardizy in a quote that spread through engineering leadership X feeds in late May. Uber COO Andrew Macdonald put a finer point on it in his Fortune interview: the link between rising Claude Code usage and consumer-facing innovation “is not there yet.” A company that famously runs on operations metrics couldn’t draw the line from spend to outcome on its largest new line item.

That’s the part to sit with. Uber spent roughly $3.4 billion on R&D in its most recent year. The COO of a company at that scale is on the record saying the connection between agentic-AI spend and customer value hasn’t been demonstrated. The tools aren’t the problem; the missing instrumentation is.

Why is Microsoft pulling Claude Code in its biggest division?

Microsoft’s decision is the same story told from the opposite seat. Per Windows Central’s reporting, the Experiences and Devices division (which owns Windows, Microsoft 365, Outlook, Teams, and Surface, and represents thousands of engineers) will cancel most internal Claude Code licenses by June 30, 2026, with engineers transitioning to GitHub Copilot CLI. As Windows Central puts it bluntly, “the token bill was a key driver, as high usage costs made the tool unsustainable.” The cutoff lines up neatly with the close of Microsoft’s fiscal year on June 30, suggesting a budget-cycle decision as much as a technology one.

The strategic read is harder than “Microsoft prefers its own tools.” Microsoft was an early enterprise customer of Claude Code precisely because engineers wanted it. Six months later the company is rolling it back — not because it didn’t work, but because at scale the unit economics didn’t survive contact with finance. Both Uber and Microsoft, in the same month, ran the same experiment and reached the same conclusion: when every developer can summon a frontier model, the question stops being “is this tool good” and becomes “which task is worth which model.”

The agentic unit economics test: did this task earn its tokens?

Here’s the framework worth taking away. Call it the agentic unit economics test: every task an AI agent does should be evaluable as a unit, with a value the team has agreed it’s worth, a model tier that matches that value, and an evidence artifact that proves it actually delivered. Run that test across every claimed task, and the bill stops being a mystery. The 2025 reflex was to point the most capable model at every problem and trust the per-token deflation curve to catch up. June 2026’s news cycle is the obituary for that reflex.

Three primitives have to exist for the test to mean anything. First, each task carries a priority and an explicit value or acceptance criterion before any agent touches it. Second, the choice of model is logged against the task — not a chat session, not a terminal window, but the unit of work itself. Third, “done” is something the next reader can verify: tests passed, file diff attached, dashboard updated, contract signed. That triple — intent, model, evidence — is what turns a token bill into a cost-per-outcome curve. Without it you’re where Uber’s COO is: paying a real invoice for a benefit you can’t draw a line to.
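One way to picture the intent, model, evidence triple is as a single record per task. The sketch below is a minimal illustration, assuming a team expresses task value in dollars; the class name, field names, and sample tasks are hypothetical and do not describe any particular product's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskRecord:
    """One unit of agent work. All field names here are illustrative."""
    title: str
    agreed_value_usd: float      # intent: what the team agreed the task is worth
    acceptance_criterion: str    # intent: how "done" will be judged
    model_tier: str              # model: logged against the task itself
    token_cost_usd: float = 0.0  # spend attributed to this task
    evidence: List[str] = field(default_factory=list)  # artifacts proving completion

    def earned_its_tokens(self) -> bool:
        """True only if completion is evidenced and spend stayed within the agreed value."""
        return bool(self.evidence) and self.token_cost_usd <= self.agreed_value_usd


# Hypothetical tasks with made-up numbers.
fix = TaskRecord("Fix login timeout", 200.0, "regression test passes",
                 "frontier", 35.0, ["tests passed", "diff attached"])
rename = TaskRecord("Rename config keys", 20.0, "lint is clean",
                    "frontier", 60.0, ["diff attached"])
draft = TaskRecord("Draft migration plan", 100.0, "plan reviewed", "cheap", 4.0)

for task in (fix, rename, draft):
    print(f"{task.title}: earned its tokens = {task.earned_its_tokens()}")
```

In this toy version the second task fails because a frontier model cost more than the task was worth, and the third fails because nothing verifiable was attached. A real test would likely be more nuanced than a single boolean, but the record shape is the point.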

We argued in Structured data is the moat that AI tools are only as good as the data on the board, and in our coverage of Claude Fable 5 that intelligence routing has become a per-task decision. Token bill shock is what makes both arguments urgent. The team that can route this task to a cheap model, that task to a frontier model, and a third task to a human reviewer, while keeping a structured record of why each call was made, ships at the same velocity Uber discovered for a fraction of the bill. The team that can’t will relearn Microsoft’s lesson, with interest.
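Per-task routing with a recorded reason might look something like the following. It is a deliberately simple sketch: the three routes mirror the ones named above, but the dollar threshold, the risk flag, and the function name are assumptions chosen for illustration, not a recommended policy.

```python
from enum import Enum
from typing import Tuple


class Route(Enum):
    CHEAP_MODEL = "cheap model"
    FRONTIER_MODEL = "frontier model"
    HUMAN_REVIEWER = "human reviewer"


# Hypothetical threshold; a real team would tune this to its own work.
FRONTIER_MIN_VALUE_USD = 500.0


def route_task(value_usd: float, high_risk: bool) -> Tuple[Route, str]:
    """Pick a route for a task and return the reason alongside it."""
    if high_risk:
        return Route.HUMAN_REVIEWER, "high-risk change, needs human sign-off"
    if value_usd >= FRONTIER_MIN_VALUE_USD:
        return Route.FRONTIER_MODEL, f"value ${value_usd:.0f} meets the frontier threshold"
    return Route.CHEAP_MODEL, f"value ${value_usd:.0f} is below the frontier threshold"


# Made-up example tasks: (title, agreed value in USD, high-risk flag).
for title, value, risky in [
    ("Update README badges", 20.0, False),
    ("Refactor billing service", 2_000.0, False),
    ("Rotate production credentials", 300.0, True),
]:
    route, reason = route_task(value, risky)
    print(f"{title} -> {route.value} ({reason})")
```

Returning the reason together with the route is the design choice that matters here: it is what lets the board keep a structured record of why each call was made, rather than only which model ran.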

Where does Lova fit in?

Lova is built around exactly the surface this moment requires. Tasks carry priority, acceptance criteria, and structured fields as first-class data. Agents claim work through the same API humans use, and the claim records who took the task and when. Cards move only when the evidence required by the task is attached — a passing test, a posted artifact, a verified outcome. That gives a team a per-task ledger of model choice and result, the primitive AI FinOps has been trying to retrofit onto chat transcripts and IDE telemetry ever since the bills started coming in.

The bigger pattern: the 2025 AI tooling story was about giving every developer a frontier model on tap. The 2026 story is about deciding, deliberately and visibly, which calls were worth making. We wrote about the gap between AI adoption and measurable productivity in Why most companies see no ROI from AI agents — token bill shock is that gap with a price tag on it. The fix isn’t cheaper models; per-token deflation is already running 10x a year per a16z’s analysis. The fix is a shared board where every task carries the value, the model, and the evidence, and the bill at the end of the month is something the COO can explain.

Honest framing: it’s the second week of June. The Microsoft cutoff lands in three weeks; Uber’s response is two weeks old; the FinOps Foundation’s data points are first-quarter snapshots. Anyone selling a fully validated “solution” to a problem the buyers themselves only just named publicly is overstating their evidence. What is clear is the direction: AI coding spend will keep climbing, and the teams that win the next quarter will be the ones whose task surface can answer, per item, whether the spend was earned. The bill is here. The board has to catch up.

Frequently asked questions

What is token bill shock?

Token bill shock is the moment an enterprise’s AI coding bill grows large enough, fast enough, that finance and leadership can’t justify it from output alone. In June 2026 it took its first public form: Microsoft reportedly canceling Claude Code in its Experiences and Devices division by June 30 and Uber’s CTO publicly admitting the 2026 AI budget was exhausted by April. The trigger is agentic workflows that multiply token consumption per task by orders of magnitude over chat-style usage.

Did Microsoft cancel all Claude Code use internally?

Not company-wide, as far as the reporting goes. Per Windows Central, the cancellation applies to the Experiences and Devices division (covering Windows, Microsoft 365, Outlook, Teams, and Surface) and takes effect by June 30, 2026, with engineers redirected to GitHub Copilot CLI. The cited driver is that token costs “made the tool unsustainable” at scale.

How much were Uber’s engineers spending on AI coding tools?

Heavy users were spending $500 to $2,000 per month per engineer on Claude Code and similar agentic tools, according to multiple reports including The Information’s coverage and the Fortune interview with Uber’s COO. After exhausting the company’s 2026 AI tooling budget by April, Uber introduced a $1,500 monthly per-employee cap on agentic coding tools including Claude Code and Cursor.

How can teams control AI agent token spending?

Per-team caps and global limits are necessary but not sufficient — they ration the bill without making each task accountable for the spend it generated. The structural answer is task-level cost attribution: every claimed task records the model tier used, the evidence produced, and the value of the outcome, so the team can see which calls earned their tokens and which didn’t. That instrumentation lives most naturally on a project board, alongside priority and acceptance criteria, rather than in IDE telemetry or chat history.
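As a rough illustration of task-level attribution, the sketch below rolls per-task spend up into a cost per verified outcome for each model tier. The tuple layout, tier names, and dollar amounts are hypothetical; a real board would pull these fields from its own task records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Each record is (model_tier, token_cost_usd, has_evidence). Layout is illustrative.
Record = Tuple[str, float, bool]


def cost_per_verified_task(records: Iterable[Record]) -> Dict[str, Optional[float]]:
    """Total spend per tier divided by the number of tasks with evidence attached.

    Spend on unverified tasks still counts toward the tier's total, so wasted
    calls raise the cost of the outcomes that did land. A tier with no
    verified tasks maps to None.
    """
    spend: Dict[str, float] = defaultdict(float)
    verified: Dict[str, int] = defaultdict(int)
    for tier, cost, has_evidence in records:
        spend[tier] += cost
        if has_evidence:
            verified[tier] += 1
    return {
        tier: (spend[tier] / verified[tier] if verified[tier] else None)
        for tier in spend
    }


# Made-up month of task records.
records = [
    ("frontier", 40.0, True),
    ("frontier", 60.0, False),  # spend with nothing verifiable attached
    ("cheap", 2.0, True),
    ("cheap", 4.0, True),
    ("mid", 5.0, False),
]
report = cost_per_verified_task(records)
for tier, cost in report.items():
    print(tier, "no verified tasks" if cost is None else f"${cost:.2f} per verified task")
```

A cap can only say that the frontier tier spent $100 in this toy data. Attribution adds that the $100 bought one verified outcome, which is the number a finance team can actually reason about.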

Why is project management the right place to control AI agent costs?

Because the decision “which model should do this” is downstream of “what is this task worth and how will we verify it.” Both of those questions are project management primitives. When agents claim work through the same surface humans use, the board accumulates the value, model choice, and evidence in one record, and that record is what finance, engineering, and ops can all read. Without it, intelligence routing is a whiteboard policy and the bill is a surprise.

Where can I read the primary sources?

Start with Anissa Gardizy’s X post summarizing The Information’s Uber CTO interview, Fortune’s interview with Uber’s COO for the ROI framing, Windows Central on Microsoft’s Claude Code cancellation, Gartner’s May 2026 AI spending forecast, the FinOps Foundation’s State of FinOps 2026 report for the AI cost-management adoption data, and Andreessen Horowitz’s LLMflation analysis for the per-token deflation curve.
