Google forced to make costly AI fix after companies crashout
It is only May, and some of the world's largest companies have already burned through their entire annual AI budget. That is not a projection. That is what enterprise CIOs are telling each other right now.
Google heard that conversation and decided to respond publicly, at its biggest event of the year.
What Google announced and why the timing was deliberate
At Google I/O on May 20, Alphabet CEO Sundar Pichai announced the launch of Gemini 3.5 Flash, a faster and cheaper AI model built specifically for enterprise workloads, according to Google's blog.
The message Pichai delivered alongside the launch was unusually direct.
"We've heard that many companies are already blowing through their annual token budgets, and it's only May," he said. "If companies used a mix of Flash and other frontier models, they could save a lot of money."
He then put a number on it. Top companies are now processing approximately one trillion tokens per day on Google Cloud. If those companies shifted 80% of their workloads from expensive frontier models to a blend of Gemini 3.5 Flash and frontier models, they would save more than $1 billion annually, according to VentureBeat.
Why enterprise AI bills are spiraling and what is driving them
The token budget problem is real, and it is accelerating.
Tokens are the units of data that AI models process with every prompt, response, code generation, and agent interaction.
For a single user running occasional queries, the cost is negligible. For an enterprise deploying AI agents across thousands of workflows simultaneously, the arithmetic changes completely.
More AI:
- Micron sits at the center of a red-hot chip rally
- IBM CEO sends blunt message on AI and quantum computing
- Anthropic CEO makes shocking admission about AI
AI agents are the biggest driver. Unlike single-turn chatbot interactions, agents perform complex multi-step tasks that may involve dozens of model calls, extended context windows, and reasoning loops that consume compute at a pace that can surprise even well-prepared finance teams.
Companies that budgeted for AI as a productivity experiment are finding they underestimated the cost of AI as an operational infrastructure.
The problem has become visible enough that it is shaping procurement decisions. Uber burned through its entire 2026 Claude Code and Cursor budget in four months.
Microsoft canceled Claude Code licenses across its Experiences and Devices division, partly due to cost. The enterprise AI bill, which once felt like a rounding error, is now showing up as a material line item on quarterly reviews, according to VentureBeat.
What makes Gemini 3.5 Flash different from other cheaper models
Cheaper AI models are not new. The market has seen a wave of smaller, faster, lower-cost options from every major lab.
What distinguishes Google's pitch is the combination of the model itself with the infrastructure economics behind it.
Pichai told journalists at a pre-briefing that Gemini 3.5 Flash matches the performance of other frontier systems at up to a third of the price, according to Technology.org. The claim is that companies do not have to sacrifice capability meaningfully to achieve the cost reduction.
Google also cut its consumer pricing at the same event. The AI Ultra subscription plan, which gives users access to higher usage limits and the most capable models, dropped from $250 to $200 per month. A new $100 per month tier was added for developers and professional users.
Gemini 3.5 Pro, the more powerful model in the family, is expected to follow next month. Google says it is already using it internally and has seen "great improvements."
How Google's infrastructure gives it a structural cost advantage
The billion-dollar savings claim only works if Google can actually deliver models at lower cost than competitors.
The case for why it can rests on infrastructure. Google designs its own custom silicon, the Tensor Processing Units, and has been scaling that chip program aggressively for years.
The company has committed capital expenditure of up to $190 billion in 2026, up dramatically from $31 billion in 2022, the year ChatGPT launched and the generative AI race began, according to The National.
That scale of infrastructure investment, combined with owning the full stack from chips to cloud to models, gives Google more room to control inference costs than rivals who rent compute from third-party providers.
The analogy Pichai draws is to Google Search. Search succeeded not by being the only search engine, but by being fast, efficient, and cheap enough to serve at global scale. The bet on Gemini 3.5 Flash is that the same economics can apply to AI inference.
Key figures on Google's AI cost push and Gemini 3.5 Flash:
- Token budget crisis: enterprise companies are already exhausting annual AI token budgets by May 2026; top customers processing approximately one trillion tokens per day, according to Google keynote
- Savings estimate: shifting 80% of workloads from frontier models to a Flash and frontier blend could save enterprises more than $1 billion annually, according to VentureBeat
- Pricing moves: AI Ultra subscription cut from $250 to $200 per month; new $100 per month developer tier added; Gemini 3.5 Flash available immediately via APIs and products, according to BusinessWorld
- Performance claim: Gemini 3.5 Flash matches frontier model performance at up to one-third of the price, according to Technology.org
- Google capex: up to $190 billion committed for 2026, up from $31 billion in 2022; dual-chip approach with custom TPUs for training and inference, according to The National
- User growth: Gemini users have more than doubled to approximately 900 million in the past year, The National confirmed
- Coming next: Gemini 3.5 Pro expected next month; Google says internal testing shows "great improvements," according to Google I/O blog
What this means for companies making AI spending decisions right now
Google's announcement arrives at a moment when the AI industry is being forced to answer a question it largely avoided during the early adoption phase: What does this actually cost at scale, and is the return worth it?
For companies currently locked into expensive frontier model contracts, the Gemini 3.5 Flash pitch reframes the decision as a budgeting problem rather than a capability problem. Not every task requires the most powerful model.
Routing routine, high-volume tasks to a cheaper model while reserving frontier compute for complex reasoning is the same logic that cloud architects have applied to compute tiers for years. Google is applying that logic to AI.
The broader implication is that AI cost management is becoming a discipline in its own right. The companies that figure out how to match model capability to task complexity will have a structural cost advantage over those that deploy the most powerful model for everything.
Google is positioning Gemini 3.5 Flash as the tool that makes that optimization possible. Whether the savings materialize as advertised depends on how disciplined enterprises are willing to be in rearchitecting their AI workflows around cost rather than capability.
Related: Wells Fargo revamps Google stock price target after major AI move
The Arena Media Brands, LLC THESTREET is a registered trademark of TheStreet, Inc.
This story was originally published May 30, 2026 at 11:03 AM.