I'll be honest. I stopped paying attention to Gemma after version 2. Not because it was bad. It just never felt like a serious contender against the Chinese open source juggernauts: DeepSeek, Qwen, the models that developers actually deployed. Gemma was the model you tried once on Kaggle and then forgot about. Today, Google changed that. Completely.

Gemma 4 dropped on April 2, 2026. And the Hugging Face CTO, Julien Chaumond, posted about it with literal fire emojis, calling it "BREAKING NEWS." When the CTO of the platform that hosts every open model on earth says Google just re-entered the game, you pay attention.


Let me break down what happened, why it matters, and whether the benchmarks actually hold up.

What Is Gemma 4?

Gemma 4 is Google DeepMind's latest family of open-weight models. Built from the same research and technology behind Gemini 3: their proprietary frontier model.

Four model sizes. Four deployment targets:

  • E2B (Effective 2B parameters): Runs on phones, Raspberry Pi, Jetson Nano. Yes, seriously.
  • E4B (Effective 4B parameters): Slightly larger edge model. Still fits on a phone.
  • 26B MoE (Mixture of Experts, 4B active): Only activates 3.8B parameters during inference despite having 25.2B total. Runs almost as fast as a 4B model.
  • 31B Dense: The flagship. Currently ranked #3 among all open models on the Arena AI text leaderboard.

All four models process images and video. The smaller E2B and E4B models also handle native audio input: speech recognition directly on device, no cloud needed. Context windows go up to 128K tokens for edge models and 256K for the larger ones. That is an entire codebase in a single prompt.

The Benchmark Numbers

Here is where it gets real. These are from the official Gemma 4 model card, instruction-tuned variants:

[Images from the model card: benchmark tables covering Text Benchmarks, Vision Benchmarks, and Long Context]

Let me highlight the numbers that matter most.

  • AIME 2026: 89.2% for the 31B model. That is a math competition benchmark. Gemma 3 27B scored 20.8%. That is a 4x improvement in one generation.
  • Codeforces ELO: 2150 for the 31B. For context: Gemma 3 scored 110. That is not a typo. The coding capability jump is staggering.
  • LiveCodeBench v6: 80.0% versus 29.1% for Gemma 3. Nearly tripled.
  • GPQA Diamond: 84.3%. These are PhD-level science questions where human experts score around 65%.

The MoE model is particularly interesting. At 26B total parameters with only 3.8B active during inference, it scores 82.3% on GPQA Diamond. It runs at near 4B-model speeds while delivering near 31B-model intelligence.

How Does Gemma 4 Compare to the Frontier?

Numbers in isolation mean nothing. So let me put Gemma 4 side by side with the proprietary models everyone is actually using: Claude Opus 4.6, GPT-5.2, and the open-weight giant Kimi K2.5.

[Image from the original blog: Gemma 4 compared against Claude Opus 4.6, GPT-5.2, and Kimi K2.5]

Important caveat before we dive in: this is not a perfectly apples-to-apples comparison. Gemma 4 31B has 31 billion parameters. Claude Opus 4.6 and GPT-5.2 are proprietary models with undisclosed parameter counts, almost certainly hundreds of billions or more. Kimi K2.5 has 1 trillion total parameters (32B active). The fact that Gemma 4 is even in the same conversation as these models is the story.

Reasoning: GPQA Diamond (PhD-Level Science)


Gemma 4 31B scores 84.3% on PhD-level science reasoning. That is behind Claude Opus 4.6 (91.3%) and GPT-5.2 (92.4%) by about 7–8 points. But here is the thing: those are massive proprietary models running on server farms. Gemma 4 runs on your laptop. And it beats Claude Sonnet 4.6 (74.1%), a model many developers use daily, by over 10 points.

Math: AIME (Competition Mathematics)


Note: Gemma 4 reports on AIME 2026 (harder problem set), while most other models were evaluated on AIME 2025. Even accounting for this, 89.2% on competition math from a 31B model is remarkable. GPT-5.2 and Claude Opus 4.6 achieve near-perfect scores, but they are proprietary frontier models with orders of magnitude more compute. Kimi K2.5 scores 95.8% but with 1 trillion total parameters.

Knowledge: MMLU Pro (Graduate-Level Questions)


Gemma 4 31B scores 85.2% on MMLU Pro. That puts it within striking distance of Kimi K2.5 (87.1%) despite having 30x fewer total parameters. It also appears competitive with or slightly ahead of Claude Opus 4.6's reported MMLU Pro score of approximately 82%.

Coding: SWE-Bench & LiveCodeBench


Gemma 4 reports 80.0% on LiveCodeBench v6 and a Codeforces ELO of 2150. While direct SWE-Bench Verified numbers are not available yet, the coding performance is clearly frontier-competitive. The LiveCodeBench score puts the 31B model in the same tier as Kimi K2.5, which again has 32x more total parameters.

Vision: MMMU Pro (Multimodal Reasoning)


On multimodal visual reasoning, Gemma 4 31B is essentially neck and neck with Claude Sonnet 4.6. For a 31B parameter open model, this is exceptional.

The Intelligence-Per-Parameter Story

Here is the comparison that tells the real story: efficiency.


Look at the 26B MoE model specifically. It activates only 3.8 billion parameters per token. That is roughly the compute footprint of a small model. And it scores 82.3% on GPQA Diamond and 82.6% on MMLU Pro.

Kimi K2.5 activates 32B parameters per token with 1 trillion total, and gets 87.6% GPQA / 87.1% MMLU Pro. That is roughly 5 points higher while activating 8x more parameters per inference step and needing an order of magnitude more storage.
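
To make that concrete, here is a quick back-of-the-envelope script using the figures quoted above. Active parameters are only a rough proxy for per-token compute (real throughput depends on memory bandwidth and implementation details), so treat this as directional:

# Back-of-the-envelope efficiency comparison using the figures quoted above.
# Format: (active params B, total params B, GPQA Diamond %). Active params
# are a rough proxy for per-token compute; total params drive storage.
models = {
    "Gemma 4 26B MoE": (3.8, 25.2, 82.3),
    "Kimi K2.5": (32.0, 1000.0, 87.6),
}

gemma = models["Gemma 4 26B MoE"]
kimi = models["Kimi K2.5"]

print(f"Active-parameter ratio: {kimi[0] / gemma[0]:.1f}x")       # ~8.4x
print(f"Total-parameter ratio:  {kimi[1] / gemma[1]:.0f}x")       # ~40x
print(f"GPQA Diamond gap:       {kimi[2] - gemma[2]:.1f} points") # 5.3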

The proprietary models still win on absolute scores. That is expected. But the gap is shrinking fast, and the deployment economics of Gemma 4 are in a completely different universe. No API costs. No data leaving your machine. No vendor lock-in.

The Honest Bottom Line

Gemma 4 does not beat Claude Opus 4.6 or GPT-5.2 on raw benchmarks. Anyone claiming otherwise is lying to you.

But that is not the right comparison. The right comparison is: what is the best model I can run locally, on my own hardware, under a fully permissive license, with zero API costs?

And on that question, Gemma 4 is a very strong contender. It trades 7–8 points on GPQA Diamond versus the best proprietary models, and about 2 points on MMLU Pro versus Kimi K2.5, for something those models can never offer: complete ownership and zero marginal cost per inference.

For many real-world applications, that trade-off is not just acceptable. It is preferable.

Why Apache 2.0 Changes Everything

Previous Gemma models shipped under Google's custom Gemma license. It was permissive, sure. But it was not truly open source.

Gemma 4 ships under Apache 2.0. The same license as Kubernetes, TensorFlow, and Apache Spark.

This is a massive deal. Hugging Face co-founder Clément Delangue called it "a huge milestone." No usage restrictions. No reporting requirements. Full commercial use. Fork it, fine-tune it, deploy it however you want.

For startups and enterprises building AI products: this removes one of the biggest friction points with Gemma adoption. You own your model. You own your data. You own your deployment.

The Real Story: Google vs. China in Open Source AI

Let me give you the strategic context. Look at the Arena AI open model leaderboard before today. The top spots were dominated by Chinese models: DeepSeek, Qwen, and their derivatives. The US open source presence was largely Meta's Llama and Nvidia's Nemotron.

Google's Gemma series had 400 million downloads. Over 100,000 community variants. But in actual deployment (OpenRouter usage data tells this story), Gemma consistently lagged behind Llama and DeepSeek.

Gemma 4 is Google's answer. The 31B model now sits at #3 on the Arena AI leaderboard. The 26B MoE sits at #6. Both outperform models that are 20 times their size. This is not just a model release. This is Google saying: we are competing for the open source AI ecosystem. Seriously this time.

Running It Locally

This is where it gets practical. Here is how to run Gemma 4 on your own hardware today:

First, upgrade llama.cpp:

brew upgrade llama.cpp
# or install from HEAD if the latest build isn't available yet:
brew install llama.cpp --HEAD

If you have 16GB of RAM/VRAM (MacBook, most laptops):

llama-server -hf ggml-org/gemma-4-E4B-it-GGUF:Q8_0

If you have 24GB+ of RAM/VRAM (MacBook Pro, RTX 3090):

llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M

If you have 32GB VRAM (RTX 5090):

llama-server -hf ggml-org/gemma-4-26B-A4B-it-GGUF:Q8_0

The 31B Dense model's unquantized weights fit on a single 80GB NVIDIA H100. Quantized versions run on consumer GPUs.
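
Once any of those servers is up, llama-server exposes an OpenAI-compatible chat endpoint, by default on port 8080. Here is a minimal sketch of querying it from Python; the port, sampling parameters, and the requests dependency are my assumptions to adjust for your setup, and the <|think|> prefix is the thinking-mode toggle described in the architecture notes below:

import requests  # assumes `pip install requests`

# llama-server serves an OpenAI-compatible API; 8080 is the default port.
# Adjust if you launched the server with --port.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "messages": [
        # Prepending <|think|> to the system prompt enables Gemma 4's
        # step-by-step thinking mode (see below); drop it for faster replies.
        {"role": "system", "content": "<|think|> You are a concise assistant."},
        {"role": "user", "content": "Explain mixture-of-experts in two sentences."},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}

resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])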

Day-one support across the ecosystem: Hugging Face Transformers, vLLM, llama.cpp, MLX, Ollama, LM Studio, Unsloth, SGLang, NVIDIA NIM, and more.

What Makes Gemma 4 Architecturally Different

A few design choices stand out:

  • Per-Layer Embeddings (PLE): The E2B and E4B models use a clever trick. Instead of making the model wider or deeper, each decoder layer gets its own small embedding table. These tables are large but only used for quick lookups. So the "effective" parameter count (what actually runs during inference) is much smaller than the total parameter count. The E2B has 5.1B total parameters but only 2.3B effective.
  • Hybrid Attention: All models interleave local sliding window attention with full global attention. Local windows keep inference fast. Global attention layers (always including the final layer) maintain deep understanding across long contexts.
  • Mixture of Experts Done Right: The 26B MoE uses 128 total experts with 8 active per token, plus 1 shared expert. Only 3.8B parameters activate during inference. This gives you 26B-class intelligence at 4B-class speed (see the sketch after this list).
  • Native Function Calling: Not an afterthought. Gemma 4 supports structured JSON output and function calling natively. This is critical for building agents that interact with external tools and APIs.
  • Configurable Thinking Mode: All models support a built-in reasoning mode. Add <|think|> to the system prompt and the model generates step-by-step reasoning before the final answer. Disable it for faster responses when you don't need deep reasoning.
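
To make the MoE bullet concrete, here is a minimal, illustrative top-k router with a shared expert in PyTorch. The 128/8/1 split matches the numbers above, but the layer sizes, routing math, and naive per-token loop are placeholders for exposition, not Gemma 4's actual implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Illustrative top-k MoE: 128 routed experts, 8 active per token, plus
    1 always-on shared expert. A sketch of the technique, not Gemma 4's code."""

    def __init__(self, d_model=64, d_ff=128, n_experts=128, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        def ffn():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.experts = nn.ModuleList([ffn() for _ in range(n_experts)])
        self.shared_expert = ffn()  # runs for every token, bypasses routing

    def forward(self, x):                     # x: (tokens, d_model)
        scores = self.router(x)               # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over chosen experts
        routed = torch.zeros_like(x)
        for t in range(x.size(0)):            # naive loop, kept for clarity
            for w, e in zip(weights[t], idx[t]):
                routed[t] = routed[t] + w * self.experts[int(e)](x[t])
        return self.shared_expert(x) + routed

x = torch.randn(4, 64)
print(ToyMoELayer()(x).shape)  # torch.Size([4, 64])

The point of the structure: per-token compute scales with the nine experts that actually run (8 routed plus 1 shared), not with all 128, which is why a 26B-total model can move at roughly 4B speed.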

The Gemmaverse Is Real

The numbers speak for themselves. 400 million downloads. 100,000+ community variants. Specialized derivatives like:

  1. MedGemma: Medical imaging and clinical report generation.
  2. DolphinGemma: Dolphin vocalization analysis.
  3. SignGemma: Sign language translation.

A research team even trained Gemma 4 to drive in the CARLA simulator using multimodal tool responses: the model sees the road through a camera, decides what to do, and learns from the outcome. This is what a healthy open source ecosystem looks like. The base model is good enough that people build genuinely novel things on top of it.

My Honest Take

I started this article as a skeptic. I have been burned by Google's open model promises before. But the benchmark improvements here are not incremental. Going from a Codeforces ELO of 110 to 2150 in one generation is unprecedented. The AIME score jumping from 20.8% to 89.2% is not marketing fluff: that is a fundamentally different model.

The Apache 2.0 license removes my biggest objection. The hardware requirements are reasonable. The ecosystem support is comprehensive from day one. Is it the best open model in the world? The 31B is #3 on Arena AI. It is not #1. DeepSeek and Qwen still have strong offerings. But Google is now genuinely competitive in this space.

For anyone building local-first AI applications, agentic workflows, or on-device intelligence: Gemma 4 deserves a serious look. Especially the 26B MoE. That model is the sleeper hit of this release. The open source AI war just got a lot more interesting.

References

  1. Google DeepMind. "Gemma 4: Byte for byte, the most capable open models." DeepMind Blog, April 2, 2026. https://deepmind.google/blog/gemma-4-byte-for-byte-the-most-capable-open-models/
  2. Google AI for Developers. "Gemma 4 Model Card." April 2, 2026. https://ai.google.dev/gemma/docs/core/model_card_4
  3. Hugging Face. "Welcome Gemma 4: Frontier multimodal intelligence on device." April 2, 2026. https://huggingface.co/blog/gemma4
  4. 9to5Google. "Google announces open Gemma 4 model with Apache 2.0 license." April 2, 2026. https://9to5google.com/2026/04/02/google-gemma-4/
  5. SiliconANGLE. "Google's new Gemma 4 models bring complex reasoning skills to low-power devices." April 2, 2026. https://siliconangle.com/2026/04/02/googles-new-gemma-4-models-bring-complex-reasoning-skills-low-power-devices/
  6. Engadget. "Google releases Gemma 4, a family of open models built off of Gemini 3." April 2, 2026. https://www.engadget.com/ai/google-releases-gemma-4-a-family-of-open-models-built-off-of-gemini-3-160000332.html
  7. OfficeChai. "Google Releases Gemma 4 Open Models, Calls Them 'Best In World' In Their Category." April 2, 2026. https://officechai.com/ai/google-releases-gemma-4-open-models-calls-them-best-in-world-in-their-category/
  8. Arm Newsroom. "Gemma 4 on Arm: Accessible, immediate, optimized on-device AI." April 2, 2026. https://newsroom.arm.com/blog/gemma-4-on-arm-optimized-on-device-ai
  9. Constellation Research. "Google launches Gemma 4 open-source LLM family." April 2, 2026. https://www.constellationr.com/insights/news/google-launches-gemma-4-open-source-llm-family
  10. OpenAI. "Introducing GPT-5.2." December 2025. https://openai.com/index/introducing-gpt-5-2/
  11. Vellum AI. "GPT-5.2 Benchmarks (Explained)." December 2025. https://www.vellum.ai/blog/gpt-5-2-benchmarks
  12. NxCode. "Claude Opus 4.6 vs Sonnet 4.6: Complete Comparison Guide." March 2026. https://www.nxcode.io/resources/news/claude-sonnet-4-6-vs-opus-4-6-complete-comparison-2026
  13. AI Tools Review. "Claude Opus 4.6 Review: Benchmarks & Rankings." March 2026. https://aitoolsreview.co.uk/insights/claude-opus-4-6-deep-dive
  14. Maxime Labonne / Hugging Face. "Kimi K2.5: Still Worth It After Two Weeks?" February 2026. https://huggingface.co/blog/mlabonne/kimik25
  15. VERTU. "Open Source LLM Leaderboard 2026: Rankings, Benchmarks & the Best Models Right Now." February 2026. https://legacy.vertu.com/lifestyle/open-source-llm-leaderboard-2026-rankings-benchmarks-the-best-models-right-now/
  16. Artificial Analysis. "MMLU-Pro Benchmark Leaderboard." March 2026. https://artificialanalysis.ai/evaluations/mmlu-pro
  17. PricePerToken. "GPQA Leaderboard 2026." March 2026. https://pricepertoken.com/leaderboards/benchmark/gpqa

If you enjoyed this breakdown, follow me for more no-BS coverage of AI/ML releases. I write from the perspective of someone who actually builds with these models, not just benchmarks them.