TITLE: AI Model Costs Are Collapsing, but Cheaper Is Not Always Cheaper
DATE: 2026-06-07
COMPANY: Alibaba (Qwen)
TOPIC: AI Strategy

SUMMARY: Alibaba's Qwen 3.7 Max has landed at fourth on the Code Arena WebDev leaderboard while charging roughly a third of Claude Opus 4.7's headline price. Combined with Microsoft's new in-house MAI models and Google's Gemini 3.5 Flash, the message for operators is clear: frontier-grade capability is getting dramatically cheaper. The catch is that headline token prices no longer tell you the real cost of getting work done.

WHAT CHANGED:
The competitive picture for AI models shifted again this week, and the headline is price.

Alibaba released Qwen 3.7 Max, a flagship model priced at roughly 2.50 US dollars per million input tokens and 7.50 US dollars per million output tokens. That is about half the input price and a little under a third of the output price of Anthropic's Claude Opus 4.7, which charges on the order of 5 US dollars per million input tokens and 25 US dollars per million output tokens. Qwen 3.7 Max placed fourth on the Code Arena WebDev leaderboard and ships with a one-million-token context window and a drop-in compatible API, lowering the switching friction for teams already building on a major provider.

Microsoft added to the pressure at Build 2026, held from 2 to 4 June, where it announced seven in-house MAI models, including MAI-Code-1-Flash for code generation, MAI-Thinking-1 for reasoning, and MAI-Transcribe-1.5 for transcription. Microsoft positioned the family as a way to reduce its reliance on OpenAI and lower costs for developers building on its platform, signalling that even the largest commercial AI distributor wants model optionality.

Google continued the trend on the consumer side, with Gemini 3.5 Flash, shipped at Google I/O 2026, now serving as the default model in the Gemini app and in AI Mode in Search.

The important nuance sits beneath the sticker price. In published evaluations, Qwen 3.7 Max generated around 97 million output tokens against a median of about 24 million for comparable frontier models on the same tasks, roughly four times the verbosity. Because usage is billed per output token, that verbosity pushes the effective cost per completed task back toward the premium tier for many workflows. On coding specifically, Claude Opus 4.7 retained a clear lead, reported at 11.5 points ahead on SWE-bench Pro.
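
To make the arithmetic concrete, here is a minimal sketch using only the figures quoted in this briefing. It assumes, purely for illustration, that Claude Opus 4.7 emitted the median token count (its exact figure is not reported here), and it ignores input-token spend.

```python
# List prices in US dollars per million output tokens, as quoted in this briefing.
QWEN_OUTPUT_PRICE = 7.50
OPUS_OUTPUT_PRICE = 25.00

# Output volume on the same evaluation tasks, in millions of tokens.
QWEN_OUTPUT_MTOK = 97
MEDIAN_OUTPUT_MTOK = 24


def output_cost(price_per_mtok: float, mtok: float) -> float:
    """Output-token spend in US dollars."""
    return price_per_mtok * mtok


qwen_cost = output_cost(QWEN_OUTPUT_PRICE, QWEN_OUTPUT_MTOK)
# Assumption: the premium model emits the median token count.
opus_cost = output_cost(OPUS_OUTPUT_PRICE, MEDIAN_OUTPUT_MTOK)

print(f"Qwen 3.7 Max output spend: ${qwen_cost:,.2f}")
print(f"Opus 4.7 output spend:     ${opus_cost:,.2f}")
print(f"Verbosity ratio:           {QWEN_OUTPUT_MTOK / MEDIAN_OUTPUT_MTOK:.1f}x")
```

Under those assumptions, output spend comes to 727.50 US dollars for Qwen 3.7 Max against 600.00 for the premium model, so the lower list price does not translate into lower spend on these tasks.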

WHY IT MATTERS:
Frontier-grade capability is no longer the preserve of one or two vendors, which gives operators real negotiating and routing options
Headline token prices are diverging from real cost per completed task, so spreadsheet comparisons based on list price can mislead
Verbose models can erase their own price advantage on high-volume workloads, where output token count compounds quickly
Model-agnostic architecture is becoming a mainstream strategy, validated by Microsoft running its own models alongside OpenAI
Falling costs lower the ROI threshold for automation, putting previously uneconomic workflows back on the table
Switching friction is dropping as challengers ship compatible APIs and large context windows, making bake-offs faster to run

DAVID & GOLIATH ANALYSIS:
For a lean organisation, this is good news that needs a steady hand. The instinct when a model appears at a third of the price is to switch and bank the saving. That instinct is often wrong, because the number that matters is not the price per token. It is the cost to get a real task finished to an acceptable standard, including the retries, the human edits, and the jobs the model gets wrong.

A model that is cheaper on paper but four times as verbose, or that needs a second attempt one time in five, can cost more in practice than the premium model it replaced. The leaders in coding benchmarks still hold a meaningful edge on the hardest work, so the answer is rarely to standardise on the cheapest option across the board. It is to match the model to the job: premium models for the high-stakes, low-volume work where reliability pays for itself, and cheaper and specialist models for the routine, high-volume work where good enough is genuinely good enough.
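
One rough way to put a number on that trade-off is an expected cost per completed task. The sketch below is illustrative only: the per-attempt token counts, success rates, and review cost are hypothetical values chosen to mirror the "four times as verbose" and "one time in five" examples, and the model assumes each retry is independent and costs the same as the first attempt.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    output_price_per_mtok: float    # USD per million output tokens
    output_tokens_per_attempt: int  # average tokens emitted per attempt
    success_rate: float             # share of attempts that meet the standard
    review_cost_per_attempt: float  # USD of human checking per attempt


def cost_per_completed_task(m: ModelProfile) -> float:
    """Expected spend until one attempt succeeds, assuming independent retries."""
    token_cost = m.output_price_per_mtok * m.output_tokens_per_attempt / 1_000_000
    expected_attempts = 1 / m.success_rate  # mean of a geometric distribution
    return (token_cost + m.review_cost_per_attempt) * expected_attempts


# Hypothetical profiles: the cheaper model is four times as verbose
# and needs a second attempt one time in five.
cheap = ModelProfile("cheap-model", 7.50, 8_000, 0.80, 0.05)
premium = ModelProfile("premium-model", 25.00, 2_000, 0.95, 0.05)

for profile in (cheap, premium):
    print(f"{profile.name}: ${cost_per_completed_task(profile):.4f} per completed task")
```

With these made-up inputs the cheaper model ends up costing more per completed task than the premium one. Different token counts or success rates can flip the result, which is why the inputs should come from your own workloads.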

Run a one-week bake-off on your own tasks before you move anything in production, and measure cost per completed workflow rather than cost per token. Then build your systems so you can change the model behind a workflow without rebuilding the workflow. The price war will continue, and the operators who can route work to the right model at the right cost, and switch when the market moves, will compound that advantage every quarter.
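
As a sketch of what that could look like in practice, the snippet below turns bake-off records into a cost per completed workflow and then into a routing table that can be regenerated whenever prices or results change. The workflow names, model names, and costs are hypothetical, and choosing purely on cost is a simplifying assumption: a real bake-off would also apply a quality bar.

```python
from collections import defaultdict
from typing import NamedTuple


class Run(NamedTuple):
    workflow: str    # hypothetical workflow label
    model: str       # hypothetical model label
    cost_usd: float  # all spend for this run: tokens, retries, human edits
    completed: bool  # did the output meet the acceptance standard?


def cost_per_completed(runs: list[Run]) -> dict[tuple[str, str], float]:
    """Total spend divided by completed runs, per (workflow, model) pair."""
    spend: dict[tuple[str, str], float] = defaultdict(float)
    done: dict[tuple[str, str], int] = defaultdict(int)
    for r in runs:
        key = (r.workflow, r.model)
        spend[key] += r.cost_usd  # failed runs still cost money
        done[key] += int(r.completed)
    # Pairs with no completed runs are dropped: they have no finite cost.
    return {k: spend[k] / done[k] for k in spend if done[k] > 0}


def build_routes(costs: dict[tuple[str, str], float]) -> dict[str, str]:
    """Pick the model with the lowest cost per completed run for each workflow."""
    routes: dict[str, str] = {}
    for (workflow, model), cost in costs.items():
        if workflow not in routes or cost < costs[(workflow, routes[workflow])]:
            routes[workflow] = model
    return routes


runs = [
    Run("support-reply", "cheap-model", 0.02, True),
    Run("support-reply", "cheap-model", 0.02, True),
    Run("support-reply", "premium-model", 0.05, True),
    Run("support-reply", "premium-model", 0.05, True),
    Run("code-fix", "cheap-model", 0.30, False),
    Run("code-fix", "cheap-model", 0.30, True),
    Run("code-fix", "premium-model", 0.40, True),
    Run("code-fix", "premium-model", 0.40, True),
]

routes = build_routes(cost_per_completed(runs))
print(routes)
```

Because each workflow looks up its model in `routes` rather than naming one directly, switching providers becomes a data change, not a rebuild. In this toy data the routine workflow goes to the cheaper model and the harder one stays on the premium model.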

RELEVANT SYSTEMS: AI Growth Engine, Employee Amplification Systems, Secure AI Brain

SOURCE URL: https://davidandgoliath.ai/daily-ai-briefing/ai-model-costs-collapsing-cheaper-not-always-cheaper
FEED URL: https://davidandgoliath.ai/daily-ai-briefing/feed

---

Published by David & Goliath | https://davidandgoliath.ai
Daily AI Briefing: one AI development per day, decoded for business operators.
This is a structured companion file optimised for LLM retrieval and citation.