Anthropic AI Releases Bloom: An Open-Source Agentic Framework for Automated Behavioral Evaluation of Frontier AI Models

Anthropic has released Bloom, an open-source agentic framework that automates behavioral evaluation of frontier AI models. The system takes a researcher-specified behavior and builds targeted evaluations that measure how often and how strongly that behavior appears in realistic scenarios.

Why Bloom?

Behavioral evaluations for safety and alignment are expensive to design and maintain. Teams must design creative scenarios, run many interactions, read long transcripts, and aggregate scores. As models evolve, old benchmarks may become obsolete or leak into the training data. Anthropic's research team framed this as a scalability problem: they needed a way to rapidly generate new evaluations for misbehaviors while keeping the metrics meaningful.

Bloom targets this gap. Instead of a fixed benchmark with a small set of signals, Bloom develops an evaluation suite from a seed configuration. The seed determines which behavior to study, how many scenarios to generate, and which interaction style to use. The framework then generates new but behaviorally consistent scenarios on each run, while allowing reproducibility through recorded seeds.

https://www.anthropic.com/research/bloom

Seed configuration and system design

Bloom is implemented as a Python pipeline and is released under the MIT license on GitHub. The main input is the evaluation "seed", defined in seed.yaml. This file references a behavior key in behaviors/behaviors.json, optional example transcripts, and global parameters that shape the entire run.

Key configuration elements include (a hypothetical sketch follows this list):

  • behavior: a unique identifier defined in behaviors.json for the targeted behavior, for example sycophancy or self-preservation
  • examples: zero or more few-shot transcripts stored under behaviors/examples/
  • total_evals: the number of rollouts to generate in the suite
  • rollout.target: the model under evaluation, e.g. claude-sonnet-4
  • controls such as diversity, max_turns, modality, reasoning effort, and additional judge qualities
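
To make the seed concrete, here is a minimal hypothetical sketch of what such a configuration could look like, loaded and sanity-checked with PyYAML in Python. The exact field names and schema are assumptions based on the description above, not Bloom's documented format.

```python
# Hypothetical illustration of a Bloom-style seed configuration.
# The YAML keys mirror the fields described above (behavior, examples,
# total_evals, rollout.target, diversity, max_turns, modality); the real
# schema in the Bloom repository may differ.
import yaml  # pip install pyyaml

SEED_YAML = """
behavior: sycophancy             # key defined in behaviors/behaviors.json
examples:                        # optional few-shot transcripts
  - behaviors/examples/sycophancy_1.json
total_evals: 100                 # number of rollouts in the suite
rollout:
  target: claude-sonnet-4        # model under evaluation
diversity: 0.5                   # trade-off: more scenarios vs. more variations per scenario
max_turns: 10
modality: conversation
"""

def load_seed(text: str) -> dict:
    """Parse the seed and check that the required fields are present."""
    seed = yaml.safe_load(text)
    for key in ("behavior", "total_evals", "rollout"):
        if key not in seed:
            raise ValueError(f"seed is missing required field: {key}")
    return seed

if __name__ == "__main__":
    seed = load_seed(SEED_YAML)
    print(seed["behavior"], seed["total_evals"], seed["rollout"]["target"])
```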

Bloom uses LiteLLM as the backend for API calls and can talk to Anthropic and OpenAI models through a single interface. It integrates with Weights & Biases for large-scale sweeps and exports Inspect-compatible transcripts.
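
For context, the sketch below shows the general LiteLLM calling pattern that makes this kind of provider-agnostic access possible. It illustrates the library's public completion API rather than Bloom's internal code, and the model identifiers are placeholders.

```python
# General LiteLLM usage pattern: one completion() call, with the provider
# selected by the model string. This illustrates the unified interface Bloom
# builds on; it is not taken from Bloom's source.
import litellm  # pip install litellm

def ask(model: str, prompt: str) -> str:
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Requires ANTHROPIC_API_KEY / OPENAI_API_KEY in the environment.
    # Model names are placeholders, not Bloom's defaults.
    print(ask("anthropic/claude-sonnet-4-20250514", "Summarize what an eval rollout is."))
    print(ask("openai/gpt-4o", "Summarize what an eval rollout is."))
```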

Four-stage agent pipeline

Bloom’s evaluation process is organized into four agent stages that run in sequence:

  1. Understanding agent: This agent reads the behavior description and example conversations. It produces a structured summary of what counts as a positive example of the behavior and why that behavior matters. It points to the specific parts of the examples that successfully demonstrate the behavior, so that later stages know what to look for.
  2. Ideation agent: This deliberation stage generates candidate evaluation scenarios. Each scenario describes a situation, user personas, the tools the target model can access, and what a successful rollout looks like. Bloom batches scenario generation to use the token budget efficiently and uses the diversity parameter to trade off between more distinct scenarios and more variations per scenario.
  3. Rollout agent: The rollout agent instantiates these scenarios with the target model. It can run multi-turn conversations or simulated environments, and it records all messages and tool calls. Configuration parameters such as max_turns, modality, and no_user_mode control how autonomous the target model is during this phase.
  4. Judge and meta-judge agents: A judge scores each transcript for the presence of the target behavior on a numerical scale and can also rate additional qualities such as realism or the forcefulness of the evaluator. A meta-judge then reads a summary of all rollouts and produces a suite-level report that highlights the most important cases and patterns. The key metric is the elicitation rate, the fraction of rollouts that score at least 7 out of 10 for behavioral presence (a schematic sketch of the pipeline follows this list).
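
The following sketch is a schematic, non-LLM stand-in for the four stages and the elicitation-rate calculation. All function bodies, signatures, and data shapes are illustrative assumptions; only the 7-out-of-10 threshold comes from the article.

```python
# Schematic sketch of a Bloom-style four-stage run: understanding -> ideation
# -> rollout -> judgment, ending with the elicitation-rate metric.
# All function bodies are stubs; the real agents are LLM-driven.
from dataclasses import dataclass

@dataclass
class Scenario:
    description: str

@dataclass
class Rollout:
    transcript: list
    score: int = 0  # judge score on a 1-10 scale

def understand(behavior: str, examples: list) -> str:
    """Stage 1: summarize what counts as a positive example of the behavior."""
    return f"structured summary of '{behavior}' based on {len(examples)} examples"

def ideate(summary: str, total_evals: int, diversity: float) -> list:
    """Stage 2: generate candidate scenarios (situation, persona, tools, success criteria)."""
    return [Scenario(f"{summary} / scenario {i}") for i in range(total_evals)]

def roll_out(scenario: Scenario, target_model: str, max_turns: int) -> Rollout:
    """Stage 3: run the scenario against the target model and record the transcript."""
    return Rollout(transcript=[{"role": "user", "content": scenario.description}])

def judge(rollout: Rollout) -> int:
    """Stage 4: score behavioral presence on a 1-10 scale (placeholder value)."""
    return 5

def elicitation_rate(scores: list, threshold: int = 7) -> float:
    """Fraction of rollouts scoring at least `threshold` (7/10 in the article)."""
    return sum(s >= threshold for s in scores) / len(scores)

if __name__ == "__main__":
    summary = understand("sycophancy", examples=[])
    scenarios = ideate(summary, total_evals=100, diversity=0.5)
    rollouts = [roll_out(s, "claude-sonnet-4", max_turns=10) for s in scenarios]
    scores = [judge(r) for r in rollouts]
    print(f"elicitation rate: {elicitation_rate(scores):.2%}")
```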

Validation on frontier models

Anthropic used Bloom to build four alignment-focused evaluation suites covering delusional sycophancy, instructed long-horizon sabotage, self-preservation, and self-preference bias. Each suite consists of 100 distinct rollouts and is repeated three times across 16 frontier models. The reported plots show elicitation rates with standard-deviation error bars, using Claude Opus 4.1 as the evaluator across all stages.

Bloom was also tested on deliberately misaligned "model organisms" from earlier alignment work. Across 10 quirky behaviors, Bloom finds the organism diverging from the baseline production model in 9 cases. For the remaining quirk, manual inspection shows that the baseline model exhibits the behavior at a similar frequency, which explains the overlap in scores. A separate validation exercise compared human labels on 40 transcripts against 11 candidate judge models. Claude Opus 4.1 reaches a Spearman correlation of 0.86 with human scores, and Claude Sonnet 4.5 reaches 0.75, with particularly strong agreement at high and low scores where thresholds matter.
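
As a rough illustration of how this kind of judge-versus-human agreement is typically computed, the sketch below applies SciPy's Spearman rank correlation to made-up scores; the numbers are placeholders, not Anthropic's validation data.

```python
# Illustrative only: measuring judge-vs-human agreement with Spearman's rank
# correlation, the statistic reported in the article. The scores below are
# fabricated placeholders, not Anthropic's data.
from scipy.stats import spearmanr

human_scores = [1, 2, 2, 5, 7, 8, 9, 10, 3, 6]   # hypothetical human labels (1-10)
judge_scores = [1, 3, 2, 4, 7, 7, 9, 10, 2, 5]   # hypothetical judge-model scores

rho, p_value = spearmanr(human_scores, judge_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```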

https://alignment.anthropic.com/2025/bloom-auto-evals/

Positioning relative to Petri

Anthropic positions Bloom as a complement to Petri. Petri is a broad-coverage auditing tool that takes seed instructions describing multiple scenarios and behaviors, then uses automated agents to probe the model through multi-turn interactions and summarize results along several safety-relevant dimensions. Bloom, by contrast, starts with a single behavior definition and automates the engineering needed to turn it into a large, targeted evaluation suite with quantitative metrics such as elicitation rate.

Key takeaways

  • Bloom is an open-source agentic framework that turns a single behavior specification into a full behavioral evaluation suite for large models, using a four-stage pipeline of understanding, ideation, rollout, and judgment.
  • The system is driven by a seed configuration in seed.yaml and behaviors/behaviors.json, where researchers specify the target behavior, example transcripts, total number of evaluations, the rollout target model, and controls such as diversity, max turns, and modality.
  • Bloom relies on LiteLLM for unified access to Anthropic and OpenAI models, integrates with Weights & Biases for experiment tracking, exports Inspect-compatible JSON, and ships an interactive viewer for inspecting transcripts and scores.
  • Anthropic ran Bloom three times with 100 rollouts each on 4 alignment-focused behaviors across 16 frontier models, and validated it on 10 model-organism quirks, where Bloom distinguishes the intentionally misaligned organism from the baseline model in 9 cases and the judge model matches human labels with a Spearman correlation of up to 0.86.

Check out the GitHub repo, technical report, and blog post for more details.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of Marktechpost, an Artificial Intelligence media platform known for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
