TITLE: Google Launches Gemini 3.1 Flash-Lite at $0.25 Per Million Tokens
DATE: 2026-04-02
COMPANY: Google
TOPIC: Model Releases

SUMMARY: Google has released Gemini 3.1 Flash-Lite, its most cost-efficient AI model to date, priced at $0.25 per million input tokens, one-eighth the cost of Gemini 3.1 Pro. The model delivers 2.5 times faster responses and 45% higher output speeds than its predecessor, while supporting a one-million-token context window and multimodal inputs including text, images, audio, video, and PDFs. For operators running high-volume AI workflows, the pricing shift opens use cases that were previously too expensive to sustain.

WHAT CHANGED: Google released Gemini 3.1 Flash-Lite in preview on 3 March 2026, completing the tiered model strategy launched alongside Gemini 3.1 Pro in February. Flash-Lite sits at the efficiency end of the range, designed for high-volume workloads where cost and speed take priority over maximum capability.

The pricing is the headline: $0.25 per million input tokens and $1.50 per million output tokens. That is one-eighth the cost of Gemini 3.1 Pro and below the previous-generation Gemini 2.5 Flash. Competing budget models from Anthropic (Claude 4.5 Haiku at $1/M input) and OpenAI (GPT-5 mini) charge more for input, making Flash-Lite the most affordable option among frontier-adjacent models at launch.

Despite the lower price, performance is competitive. Flash-Lite achieved the top score on six of eleven benchmark tests in independent evaluations, outperforming GPT-5 mini and Claude 4.5 Haiku. On the Arena.ai leaderboard it holds an Elo score of 1,432. It scores 86.9% on GPQA Diamond and 76.8% on MMMU Pro, both results exceeding what larger Gemini models from previous generations achieved.

The model uses a mixture-of-experts (MoE) architecture, activating only a subset of its parameters per inference call.
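To make the MoE point concrete, here is a toy sketch of top-k expert routing in plain Python. The expert count, top-k value, and dimensions are illustrative assumptions for the sketch, not details of Flash-Lite's actual architecture, which Google has not published.

```python
# Minimal sketch of mixture-of-experts (MoE) routing: a gate scores all
# experts, but only the top-k highest-scoring experts actually run, so
# per-inference compute is a fraction of the full parameter count.
import math
import random

random.seed(0)

NUM_EXPERTS = 8  # total experts in the layer (assumed, illustrative)
TOP_K = 2        # experts activated per token (assumed, illustrative)
DIM = 4          # toy hidden dimension

# Each "expert" is a tiny linear map; the gate is another linear map whose
# scores decide which experts run for a given input.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
gate = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    es = [math.exp(x - mx) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def moe_forward(x):
    """Route x to the TOP_K best-scoring experts and blend their outputs.

    Compute drops from NUM_EXPERTS matvecs to TOP_K matvecs per call,
    which is the cost saving sparse activation buys.
    """
    scores = matvec(gate, x)
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    weights = softmax([scores[i] for i in top])
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for j, val in enumerate(matvec(experts[i], x)):
            out[j] += w * val
    return out, top

y, active = moe_forward([1.0, 0.5, -0.3, 0.2])
print(f"active experts: {active} of {NUM_EXPERTS}")  # only TOP_K experts ran
```

The gate still scores every expert (cheap), but only the selected experts' weight matrices are touched, which is why a model can carry a large parameter base while keeping per-call compute, and therefore price, low.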
This is the same structural approach as Gemini 3.1 Pro, which means Flash-Lite benefits from a large training base while keeping per-inference compute costs low. The result is performance that exceeds its price tier more consistently than previous budget models managed.

Developers can control the model's reasoning depth through four thinking modes: minimal, low, medium, and high. This lets operators balance response quality against cost and latency depending on the task. The one-million-token context window is available at all thinking levels, so document-heavy workflows do not require chunking or pre-processing.

WHY IT MATTERS:
- At $0.25 per million input tokens, operators can now run AI across millions of documents or customer interactions per month at a cost that fits inside existing operational budgets.
- The one-million-token context window eliminates the chunking problem for large documents, contracts, audio transcripts, and historical data, making these workflows practical without custom engineering.
- Multimodal support at this price point means a single model can process mixed content (text alongside images, audio, or PDFs), reducing the number of different tools an operator needs to manage.
- The speed improvement (225 tokens per second, 2.5 times faster than its predecessor) reduces latency in real-time applications such as customer-facing chat, automated email responses, and live document analysis.
- Budget-model performance catching up to previous-generation frontier models shifts the decision calculus: operators no longer need to trade quality against cost to the degree they did 12 months ago.
- Availability on both Google AI Studio and Vertex AI means operators can access Flash-Lite through Google's consumer developer tools or through its enterprise-grade platform with compliance and access controls.

DAVID & GOLIATH ANALYSIS: The release of Gemini 3.1 Flash-Lite matters because it changes the economics of what is worth automating.
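Those economics can be sanity-checked in a few lines of Python. The per-million rates below come straight from the quoted Flash-Lite pricing; the workload volumes are illustrative.

```python
# Cost estimate at the briefing's quoted Gemini 3.1 Flash-Lite prices:
# $0.25 per million input tokens, $1.50 per million output tokens.

FLASH_LITE_INPUT_PER_M = 0.25   # USD per million input tokens (from briefing)
FLASH_LITE_OUTPUT_PER_M = 1.50  # USD per million output tokens (from briefing)

def monthly_cost(input_tokens, output_tokens,
                 in_rate=FLASH_LITE_INPUT_PER_M,
                 out_rate=FLASH_LITE_OUTPUT_PER_M):
    """Return the USD cost of a month's token volume at per-million rates."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# The analysis below cites 10 million input tokens per month as an example.
input_only = monthly_cost(10_000_000, 0)
print(f"10M input tokens: ${input_only:.2f}")  # $2.50

# A fuller workload: 10M input plus 2M generated output tokens (assumed split).
print(f"10M in + 2M out: ${monthly_cost(10_000_000, 2_000_000):.2f}")
```

Output tokens dominate the bill at six times the input rate, so summarisation-style workloads (large inputs, short outputs) benefit most from this pricing.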
Twelve months ago, running AI across a large document library, a year of customer emails, or thousands of product images required either a significant API budget or a willingness to accept lower-quality models. At $0.25 per million tokens with frontier-adjacent performance, that trade-off has collapsed.

For operators running businesses with 10 to 200 people, this is not an incremental improvement; it is a genuine capability shift. A workflow that processes 10 million tokens per month, roughly the equivalent of reading thousands of customer contracts or generating personalised outreach at meaningful scale, now costs $2.50 in input processing. The barrier to AI-powered operations is no longer price; it is workflow design and implementation.

The practical implication is straightforward: operators should revisit every AI use case they dismissed in the past 18 months because the economics did not stack up. Many of those decisions were correct at the time and are now wrong. Operators who move quickly to identify and implement the newly viable workflows will compound advantages over the next 12 months that slower movers will find difficult to close.

Start with your highest-volume, most repetitive knowledge work. Calculate what it currently costs in staff time. Run the numbers at $0.25 per million tokens. The business case will often be obvious.

RELEVANT SYSTEMS: AI Growth Engine, Employee Amplification Systems
SOURCE URL: https://davidandgoliath.ai/daily-ai-briefing/google-gemini-31-flash-lite-025-per-million-tokens
FEED URL: https://davidandgoliath.ai/daily-ai-briefing/feed

---
Published by David & Goliath | https://davidandgoliath.ai
Daily AI Briefing: one AI development per day, decoded for business operators. This is a structured companion file optimised for LLM retrieval and citation.