Google DeepMind researchers release Gemma Scope 2 as full stack interpretability suite for Gemma 3 models


Google DeepMind researchers have introduced Gemma Scope 2, an open suite of interpretability tools that reveal how Gemma 3 language models, from 270M to 27B parameters, process and represent information across all layers.

Its main goal is simple: to give AI safety and alignment teams a practical way to trace model behavior to internal features rather than relying solely on input-output analysis. When a Gemma 3 model is jailbroken, hallucinates, or exhibits sycophantic behavior, Gemma Scope 2 lets researchers observe which internal features activated and how those activations flowed through the network.

What is Gemma Scope 2?

Gemma Scope 2 is a comprehensive, open suite of sparse autoencoders (SAEs) and related tools trained on the internal activations of the Gemma 3 model family. SAEs act as microscopes on models: they decompose high-dimensional activations into a sparse set of human-interpretable features that correspond to concepts or behaviors.
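To illustrate the idea, here is a minimal toy sketch of an SAE forward pass: an activation vector is encoded into many candidate features, most of which stay at zero, and the original activation can be approximately reconstructed from the few that fire. All dimensions, weights, and the ReLU-with-negative-bias encoder are illustrative assumptions, not Gemma Scope 2's actual architecture.

```python
import numpy as np

# Toy sparse autoencoder sketch (illustrative only; not Gemma Scope 2's
# real weights, dimensions, or activation function).
rng = np.random.default_rng(0)
d_model, d_sae = 16, 64          # activation dim, number of learned features

W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = -0.5 * np.ones(d_sae)    # negative bias pushes most features to zero
W_dec = rng.normal(0, 0.1, (d_sae, d_model))

def sae_features(activation):
    """Encode a model activation into sparse, non-negative feature strengths."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU

def sae_reconstruct(features):
    """Approximately reconstruct the original activation from the features."""
    return features @ W_dec

act = rng.normal(size=d_model)   # stand-in for a residual-stream activation
feats = sae_features(act)
print(f"{(feats > 0).sum()} of {d_sae} features active")
```

A trained SAE's decoder rows would each correspond to a concept-like direction in activation space; here they are random, so only the sparsity mechanics are meaningful.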

Training Gemma Scope 2 required storing approximately 110 petabytes of activation data and fitting more than 1 trillion parameters in total across all of the interpretability models.

The suite targets every Gemma 3 variant, including the 270M, 1B, 4B, 12B, and 27B parameter models, and covers the full depth of the network. This matters because many safety-related behaviors only become visible at larger scales.

What’s new compared to the original Gemma Scope?

The first Gemma Scope release focused on Gemma 2 and already enabled research on model hallucination, on identifying secrets known to a model, and on training safer models.

Gemma Scope 2 extends that work in four main ways:

  1. The tools now span the entire Gemma 3 family up to 27B parameters, which is necessary for studying emergent behaviors observed only in larger models, such as those previously analyzed in the 27B-parameter C2S-Scale model for scientific discovery tasks.
  2. Gemma Scope 2 includes SAEs and transcoders trained at every layer of Gemma 3. Skip transcoders and cross-layer transcoders help address multi-step computations that are distributed across layers.
  3. The suite applies Matryoshka training techniques so that SAEs learn more useful and stable features, mitigating some of the shortcomings identified in earlier Gemma Scope releases.
  4. There are dedicated interpretability tools for the chat-tuned Gemma 3 models, which make it possible to analyze multi-step behaviors such as jailbreaks, refusal mechanisms, and chain-of-thought faithfulness.
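The kind of per-behavior inspection described above, asking which features fire most strongly on a given prompt, can be sketched in a few lines. The feature labels, sizes, and activation values below are invented for illustration; in practice the activations would come from running a Gemma 3 model and encoding its activations with a Gemma Scope 2 SAE.

```python
import numpy as np

# Hypothetical sketch: rank SAE features by how strongly they fire on a
# prompt. Labels and activations are made up for illustration.
feature_names = {3: "refusal-related", 7: "user-flattery", 11: "code syntax"}

rng = np.random.default_rng(1)
# rows = tokens in the prompt, columns = SAE features (toy sizes)
feats = np.maximum(rng.normal(-1.0, 1.0, (5, 16)), 0.0)

# Peak activation per feature across the prompt, then the top-k features.
per_feature = feats.max(axis=0)
top = np.argsort(per_feature)[::-1][:3]
for idx in top:
    label = feature_names.get(int(idx), f"feature {idx}")
    print(f"{label}: peak activation {per_feature[idx]:.2f}")
```

Ranking features this way is a common first step before deeper analysis, such as ablating a feature or tracing it across layers with transcoders.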

Key takeaways

  1. Gemma Scope 2 is an open interpretability suite for all Gemma 3 models, ranging from 270M to 27B parameters, with SAEs and transcoders at every layer of both the pre-trained and instruction-tuned variants.
  2. The suite uses sparse autoencoders as microscopes that decompose internal activations into sparse, concept-like features, along with transcoders that track how those features propagate across layers.
  3. Gemma Scope 2 is explicitly aimed at AI safety work: studying jailbreaks, hallucinations, sycophancy, refusal mechanisms, and discrepancies between a Gemma 3 model's internal state and its communicated reasoning.

Check out the paper, technical details, and model weights.


Michael Sutter is a data science professional and holds a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michael excels in transforming complex datasets into actionable insights.
