Google Gemini 2.5 Flash launched with reasoning AI, low cost

By Swaleha | Published on April 19, 2025

Technology / April 19, 2025

Google Gemini 2.5 Flash launched with reasoning AI, low cost

Google has launched Gemini 2.5 Flash, a fast, low-cost AI model with hybrid reasoning abilities. The model allows developers to control how much compute is used for each query. It performs better than its predecessor and comes close to top models like GPT-4.5 and o4-mini in benchmarks.

The preview is now live on Google AI Studio and Vertex AI, the company’s developer and cloud platforms. While Pro is designed for power users and research-grade tasks, Flash aims to be the more affordable, everyday version. Still, it packs in smart features that set it apart from its predecessor, Gemini 2.0 Flash.

Google has officially rolled out Gemini 2.5 Flash in preview mode, offering developers and early adopters a chance to test its newest lightweight AI model. Released just weeks after the Gemini 2.5 Pro, this new version focuses on low-latency performance, while introducing advanced reasoning features that users can adjust based on task complexity or budget.

What is Gemini 2.5 Flash and why it matters

This flexibility matters for a few reasons. First, reasoning consumes more tokens, which directly impacts cost and response time. According to Google, the price per million tokens with reasoning switched off is $0.15 for input and $0.60 for output. But with reasoning turned on, that shoots up to $3.50 per million tokens. That’s roughly ₹305 per million output tokens for Indian developers.

At its core, Gemini 2.5 Flash is Google’s newest hybrid reasoning model. That means it can “think” before it responds—but what’s interesting is that the amount of this “thinking” is now adjustable. Google calls it a “thinking budget,” giving developers full control over how much reasoning compute the model should use per query.

Performance

Google says the model can auto-detect how complex a question is and adjust its compute accordingly. Asking for a translation of “thank you” in Hindi won’t cost much reasoning, but designing a full daily planner or solving a probability problem will.

Gemini 2.5 Flash scored 12.1% on Humanity’s Last Exam, a benchmark meant to challenge models across fields like math, science, and humanities. That’s more than double the 5.1% scored by Flash 2.0 and places it ahead of rivals like Claude 3.7 Sonnet and DeepSeek R1, though still behind OpenAI’s latest o4-mini.

Unlike the earlier Flash 2.0, which was known mostly for speed, 2.5 Flash steps up in terms of capability without compromising latency. For example, it can now handle complex tasks like multi-step math, coding functions in Python, and even helping users build entire games.

How does it compare?

The timing of this release has raised eyebrows—it dropped just a day after OpenAI launched o3 and o4-mini. And both models are now battling for top spots on the LMArena leaderboard. Flash 2.5 currently sits just behind Pro in performance, according to Google.

While Flash doesn’t yet have all the tool integrations that OpenAI’s o-series offers—like web browsing or file analysis—it’s clearly a step forward for cost-aware developers who want flexibility without sacrificing performance.

Gemini 2.5 Flash supports a one-million-token context window and can handle multiple input formats, including text, image, audio, and video. Its knowledge cutoff is January 2025.