Google AI introduces PaperBanana: an agentic framework that automates publication-ready methodology diagrams and statistical plots

Producing publication-ready figures is a labor-intensive bottleneck in the research workflow. While AI scientists can now handle literature reviews and code, they still struggle to communicate complex findings visually. A research team from Google and Peking University has introduced a new framework called "PaperBanana", which aims to change this by using a multi-agent system to automate high-quality academic diagrams and plots.

https://dwzhu-pku.github.io/PaperBanana/

5 specialized agents: the architecture

PaperBanana does not rely on a single prompt. It orchestrates a collaborative team of 5 agents to convert raw text into professional visuals.

Step 1: Linear Planning

Retriever agent: Identifies the 10 most relevant reference examples from a database to guide style and structure.
Planner agent: Translates the technical methodology text into a detailed textual description of the target figure.
Stylist agent: Acts as a design consultant to ensure the output matches the "NeurIPS look", using a specific color palette and layout.

Step 2: Iterative Refinement

Visualizer agent: Converts the description into visual output. For diagrams, it uses image models such as Nano-Banana-Pro. For statistical plots, it writes executable Python Matplotlib code.
Critic agent: Inspects the generated image against the source text to find factual errors or visual glitches. It provides feedback for 3 rounds of refinement.
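
The two phases described above can be sketched as a small orchestration loop. This is only an illustrative outline, not PaperBanana's actual code: the `Agents` container, the `Critique` type, the `generate_figure` function, and the stub agents are all hypothetical names, and stopping early once the critic approves is an assumption (the article only states that there are 3 rounds of refinement).

```python
from dataclasses import dataclass
from typing import Callable, List

NUM_REFERENCES = 10        # the retriever selects 10 reference examples
MAX_REFINEMENT_ROUNDS = 3  # the critic provides feedback for 3 rounds


@dataclass
class Critique:
    ok: bool
    feedback: str = ""


@dataclass
class Agents:
    # Hypothetical callables standing in for the model-backed agents.
    retrieve: Callable[[str, int], List[str]]
    plan: Callable[[str, List[str]], str]
    style: Callable[[str, List[str]], str]
    visualize: Callable[[str], str]
    critique: Callable[[str, str], Critique]


def generate_figure(source_text: str, agents: Agents) -> str:
    # Phase 1: linear planning (retrieve -> plan -> style).
    references = agents.retrieve(source_text, NUM_REFERENCES)
    description = agents.plan(source_text, references)
    description = agents.style(description, references)

    # Phase 2: iterative refinement (visualize <-> critique).
    image = agents.visualize(description)
    for _ in range(MAX_REFINEMENT_ROUNDS):
        review = agents.critique(image, source_text)
        if review.ok:  # assumption: stop early when the critic is satisfied
            break
        description = f"{description}\nRevision note: {review.feedback}"
        image = agents.visualize(description)
    return image


# Stub agents so the sketch runs without any model calls.
demo = Agents(
    retrieve=lambda text, k: [f"reference-{i}" for i in range(k)],
    plan=lambda text, refs: f"Diagram of: {text}",
    style=lambda desc, refs: f"{desc} [pastel palette]",
    visualize=lambda desc: f"<image: {desc}>",
    critique=lambda image, text: Critique(
        ok="Revision note" in image, feedback="label the encoder block"
    ),
)
result = generate_figure("an encoder-decoder model", demo)
```

With these stubs, the critic rejects the first draft, the description gains one revision note, and the second draft is accepted.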

Beating the NeurIPS 2025 benchmark

The research team introduced PaperBananaBench, a dataset of 292 curated test cases drawn from real NeurIPS 2025 publications. Using a VLM-as-a-Judge approach, they compared PaperBanana against leading baselines.

Metric	Improvement over baseline
Overall score	+17.0%
Conciseness	+37.2%
Readability	+12.9%
Aesthetics	+6.6%
Faithfulness	+2.8%

The system excels at "Agent & Reasoning" diagrams, achieving a 69.9% overall score. It also provides an automatically generated "Aesthetic Guideline" that favors "soft tech pastels" over harsh primary colors.

Statistical Plots: Code vs. Image

Statistical plots require numerical precision that standard image models often lack. PaperBanana addresses this by having the Visualizer agent write code instead of drawing pixels.

Image generation: Excels in aesthetics but often suffers from "numerical hallucinations" or repeated elements.
Code-based generation: Ensures 100% data fidelity by using the Matplotlib library to render the final plot.
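
To illustrate why the code-based route preserves the data, here is a minimal Matplotlib sketch. It is not output from PaperBanana: the `scores` values, the method names, and the `render_bar_chart` helper are made up for illustration. The point it shows is that every bar height is taken directly from the input numbers, so the rendered values can be read back and compared with the raw data.

```python
import io
from typing import Dict, List, Tuple

import matplotlib

matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt

# Hypothetical data, used only to demonstrate the idea.
scores: Dict[str, float] = {"Method A": 0.42, "Method B": 0.57, "Method C": 0.61}


def render_bar_chart(data: Dict[str, float], ylabel: str) -> Tuple[bytes, List[float]]:
    """Render a bar chart to PNG bytes and return the heights actually drawn."""
    fig, ax = plt.subplots(figsize=(4, 3))
    bars = ax.bar(list(data.keys()), list(data.values()), color="#9ecae1")
    ax.set_ylabel(ylabel)
    # Read the heights back from the drawn artists to check fidelity.
    heights = [float(bar.get_height()) for bar in bars]
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight")
    plt.close(fig)
    return buffer.getvalue(), heights


png_bytes, drawn_heights = render_bar_chart(scores, "Score")
# The plotted values are the input values, not an approximation of them.
assert drawn_heights == list(scores.values())
```

An image model has no equivalent check, because the numbers exist only as pixels once the figure is drawn.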

Domain-specific aesthetic preferences in AI research

According to the PaperBanana style guide, aesthetic choices often shift with the research domain to match the expectations of different scholarly communities.

Research domain	Visual "vibe"	Key design elements
Agent & Reasoning	Illustrative, descriptive, "friendly"	2D vector robots, human avatars, emoji, and "user interface" aesthetics (chat bubbles, document icons)
Computer Vision & 3D	Spatial, dense, geometric	Camera cones (frustums), ray lines, point clouds, and RGB color coding for axis correspondence
Generative & Learning	Modular, flow-oriented	3D cuboids for tensors, matrix grids, and a "zone" strategy that uses light pastel fills to group logic
Theory & Optimization	Minimalist, abstract, "textbook"	Graph nodes (circles), manifolds (planes), and a restrained grayscale palette with a single highlight color
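
One way to picture how such a guide could be consumed is a plain lookup table that turns a domain into a style hint for the Stylist agent. This is a hypothetical sketch: the `DomainStyle` type, the `STYLE_GUIDE` mapping, its keys, and the `style_hint` function are illustrative names, and only the descriptive text comes from the table above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DomainStyle:
    vibe: str
    elements: Tuple[str, ...]


# Keys are hypothetical identifiers; the values paraphrase the table above.
STYLE_GUIDE: Dict[str, DomainStyle] = {
    "agent_reasoning": DomainStyle(
        "illustrative, descriptive, friendly",
        ("2D vector robots", "human avatars", "emoji", "chat bubbles", "document icons"),
    ),
    "vision_3d": DomainStyle(
        "spatial, dense, geometric",
        ("camera frustums", "ray lines", "point clouds", "RGB axis color coding"),
    ),
    "generative_learning": DomainStyle(
        "modular, flow-oriented",
        ("3D cuboids for tensors", "matrix grids", "light pastel zone fills"),
    ),
    "theory_optimization": DomainStyle(
        "minimalist, abstract, textbook",
        ("graph nodes", "manifolds", "grayscale palette with one highlight color"),
    ),
}


def style_hint(domain: str) -> str:
    """Build a short style instruction for a known domain key."""
    style = STYLE_GUIDE[domain]  # raises KeyError for an unknown domain
    return f"Style: {style.vibe}. Prefer: {', '.join(style.elements)}."


hint = style_hint("theory_optimization")
```

Keeping the guide as data rather than hard-coded prompt text would make it easy to add or adjust a domain without touching the agent logic.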

Comparison of visualization paradigms

For statistical plots, the framework highlights a clear trade-off between using an image generation model (IMG) and using executable code (Code).

Feature	Plots via image generation (IMG)	Plots via code (Matplotlib)
Aesthetics	Generally higher; plots look more "visually appealing"	Professional, standard academic look
Fidelity	Lower; prone to "numerical hallucinations" or duplicated elements	100% accurate; strictly represents the raw data provided
Readability	High for sparse data but struggles with complex datasets	Consistently high; handles dense or multi-series data without error

Key takeaways

Multi-agent collaborative framework: PaperBanana is a context-driven system that orchestrates 5 specialized agents (Retriever, Planner, Stylist, Visualizer, and Critic) to transform raw technical text and captions into publication-quality methodology diagrams and statistical plots.
Two-phase generation process: The workflow begins with a linear planning phase that retrieves reference examples and sets aesthetic guidelines, followed by a 3-round iterative refinement loop in which the Critic agent identifies errors and the Visualizer agent regenerates the image for higher accuracy.
Better performance on PaperBananaBench: Evaluated on 292 test cases from NeurIPS 2025, the framework outperformed the vanilla baseline in overall score (+17.0%), conciseness (+37.2%), readability (+12.9%), and aesthetics (+6.6%).
Precision-focused statistical plots: For statistical data, the system switches from direct image generation to executable Python Matplotlib code. This hybrid approach ensures numerical accuracy and eliminates the "hallucinations" common in standard AI image generators.
