Papers on agentic and multi-agent systems (MAS) increased from 820 in 2024 to more than 2,500 in 2025. This surge shows that MAS is now a primary focus for the world’s top research laboratories and universities. Yet there is a disconnect: while research is progressing rapidly, these systems still routinely fail in production. Most teams instinctively try to fix these failures with better prompts. I use the term prompting fallacy to describe the belief that better prompts alone can fix systemic coordination failures. You cannot prompt your way out of a system-level failure. If your agents are consistently underperforming, the problem probably isn’t the wording of the instructions; it’s the architecture of the collaboration.
Beyond the Prompting Fallacy: Common Collaboration Patterns
Certain coordination patterns stabilize a system. Others magnify failure. There is no universal best pattern; there are only patterns that fit the task and the way information needs to flow. The following is a quick orientation to common collaboration patterns and when they work well.
Supervisor-based architecture
The most common starting point is a linear, supervisor-based architecture. A central agent plans, assigns tasks, and decides when the work is complete. This setup can be effective for tightly scoped, sequential reasoning problems such as financial analysis, compliance checks, or step-by-step decision pipelines. The strength of this pattern is control. The weakness is that every decision becomes a bottleneck. As tasks become exploratory or creative, the supervisor itself often becomes the point of failure. Latency increases. Context windows fill up. The system begins to over-deliberate simple decisions because everything has to pass through the same cognitive bottleneck.
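A minimal sketch of the pattern, with plain Python functions standing in for real agents (the worker names and the `Supervisor` class are illustrative, not from any particular framework):

```python
# Supervisor pattern: one central agent runs the plan, delegates each step,
# and declares completion. Workers are stubbed as ordinary functions.

def extract_figures(doc: str) -> str:
    # placeholder worker: pulls numeric tokens out of the document
    return " ".join(tok for tok in doc.split() if tok.isdigit())

def check_compliance(figures: str) -> str:
    # placeholder worker: flags any figure over a threshold
    flagged = [f for f in figures.split() if int(f) > 1000]
    return "FLAGGED: " + " ".join(flagged) if flagged else "OK"

class Supervisor:
    """Runs a fixed, sequential plan. This is both the strength (control)
    and the weakness (every decision routes through one bottleneck)."""

    def __init__(self, workers):
        self.workers = workers  # ordered pipeline of worker callables

    def run(self, task: str) -> str:
        result = task
        for worker in self.workers:  # every step passes through here
            result = worker(result)
        return result                # supervisor decides when work is done

pipeline = Supervisor([extract_figures, check_compliance])
print(pipeline.run("revenue 900 costs 1200 units 50"))  # -> FLAGGED: 1200
```

The sequential loop is the whole point: it gives tight control for compliance-style tasks, and it is also exactly where latency accumulates as tasks become exploratory.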
Blackboard-style architecture
In creative settings, a blackboard-style architecture with shared memory often works better. Instead of passing every idea through one manager, multiple experts post partial solutions to a shared workspace. Other agents critique, refine, or build on those contributions. The system improves by accumulation rather than by orchestration. This mirrors how real creative teams work: ideas are externalized, challenged, and iterated on collectively.
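A toy sketch of the shared workspace, with the agent roles stubbed out (the entry schema and agent names are assumptions for illustration):

```python
# Blackboard pattern: experts post partial contributions to a shared
# workspace; others read, refine, and critique them. No central manager.

blackboard: list[dict] = []  # the shared workspace

def post(author: str, kind: str, text: str) -> None:
    blackboard.append({"author": author, "kind": kind, "text": text})

def idea_agent() -> None:
    post("ideator", "idea", "open the story in the middle of the storm")

def tone_agent() -> None:
    # reads existing ideas and layers a refinement on top of each
    for entry in [e for e in blackboard if e["kind"] == "idea"]:
        post("stylist", "refinement", entry["text"] + ", told in present tense")

def critic_agent() -> None:
    # challenges whatever has accumulated so far
    for entry in [e for e in blackboard if e["kind"] == "refinement"]:
        post("critic", "critique", "check pacing of: " + entry["text"])

for agent in (idea_agent, tone_agent, critic_agent):
    agent()  # the system improves by accumulation, not by a fixed pipeline

print([e["kind"] for e in blackboard])  # -> ['idea', 'refinement', 'critique']
```

Each agent only ever reads from and writes to the board, which is what lets contributions accumulate without a manager in the loop.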
Peer-to-peer collaboration
In peer-to-peer collaboration, agents exchange information directly without any central controller. This can work well for dynamic tasks like web navigation, exploration, or multistep discovery, where the goal is to cover ground rather than converge quickly. The risk is drift. Without some form of aggregation or validation, the system can fragment or get stuck in loops. In practice, this peer-to-peer style often takes the form of a swarm.
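A minimal message-passing sketch, assuming stubbed agents and a simple hop budget as the guard against loops (all names here are hypothetical):

```python
# Peer-to-peer pattern: agents message each other directly, no controller.
# A hard hop budget is the simple safeguard against endless loops.

from collections import defaultdict

inboxes = defaultdict(list)   # agent name -> pending messages
findings: list[str] = []      # what eventually comes out of the system

def send(to: str, msg: str) -> None:
    inboxes[to].append(msg)

def explorer(msg: str) -> None:
    # follows a lead, then hands the finding straight to a peer
    send("verifier", "found: " + msg)

def verifier(msg: str) -> None:
    send("reporter", "verified " + msg)

def reporter(msg: str) -> None:
    findings.append(msg)

agents = {"explorer": explorer, "verifier": verifier, "reporter": reporter}

send("explorer", "pricing page")
hops = 0
while any(inboxes.values()) and hops < 10:  # hard stop prevents loops
    for name, box in list(inboxes.items()):
        while box:
            agents[name](box.pop(0))
            hops += 1

print(findings)  # -> ['verified found: pricing page']
```

Note there is no aggregation step here; in a real system the `reporter` role would need to consolidate and validate, or the pattern degrades in exactly the ways described above.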
Swarm architecture
Swarms work well in tasks like web research because the goal is coverage, not instant convergence. Multiple agents locate sources in parallel, follow different leads, and surface conclusions independently. Redundancy is not a problem here; it is a feature. Overlap helps validate signals, while divergence helps avoid blind spots. Swarms are also effective in creative writing. One agent proposes narrative directions, another experiments with tone, a third rewrites the structure, and a fourth critiques clarity. Ideas collide, merge, and evolve. The system behaves less like a pipeline and more like a writers’ room.
The main risk with swarms is that they generate volume faster than decisions can be made, burning tokens in production without improving results. Enforce strict exit conditions to keep costs from climbing. Furthermore, without a subsequent aggregation step, swarms can drift, loop, or overwhelm downstream components. This is why they work best when paired with a solid consolidation phase, not as a standalone pattern.
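The two guardrails can be sketched in a few lines: an agent budget as the exit condition, and a consolidation phase that uses overlap to validate signals (the lead data is fabricated for illustration):

```python
# Swarm with guardrails: a strict exit condition (agent budget) and a
# consolidation phase that merges overlapping findings before anything
# flows downstream.

LEADS = {  # what each stubbed research agent happens to surface
    "agent_a": {"pricing study", "2024 survey"},
    "agent_b": {"2024 survey", "vendor docs"},
    "agent_c": {"2024 survey", "forum thread"},
    "agent_d": {"old blog post"},
}

def run_swarm(budget: int) -> list[str]:
    findings = []
    for i, (name, leads) in enumerate(LEADS.items()):
        if i >= budget:  # strict exit condition: cap the swarm's spend
            break
        findings.append(leads)

    # consolidation: overlap validates a signal, divergence fills blind spots
    counts: dict[str, int] = {}
    for leads in findings:
        for lead in leads:
            counts[lead] = counts.get(lead, 0) + 1
    # keep only leads that at least two agents independently surfaced
    return sorted(lead for lead, n in counts.items() if n >= 2)

print(run_swarm(budget=3))  # -> ['2024 survey']
```

The agreement threshold (`n >= 2`) is one simple consolidation rule; a real system might instead rank leads or hand them to a reviewer agent.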
Considering all this, many production systems benefit from a hybrid pattern: a small number of fast specialists work in parallel, while a slower, more deliberate agent periodically collects results, checks assumptions, and decides whether the system should continue or stop. It balances throughput with stability and keeps errors from compounding unchecked. This is why I teach the agents-as-a-team mindset throughout AI Agent: The Definitive Guide: most production failures are coordination problems long before they are model problems.
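One way to sketch the hybrid pattern is parallel fast workers with a periodic reviewer checkpoint; the confidence heuristic below is purely illustrative, and real workers would be model calls:

```python
# Hybrid pattern: fast workers run in parallel (throughput), while a
# slower reviewer consolidates results and decides continue vs. stop
# (stability).

from concurrent.futures import ThreadPoolExecutor

def fast_worker(task: str) -> dict:
    # stub: "solves" a subtask and reports a confidence score
    return {"task": task, "answer": "draft for " + task,
            "confidence": len(task) / 10}

def reviewer(results: list[dict]) -> str:
    # slow, deliberate checkpoint: check assumptions, decide the next step
    avg = sum(r["confidence"] for r in results) / len(results)
    return "stop" if avg >= 0.5 else "continue"

def run_round(tasks: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(fast_worker, tasks))  # parallel specialists
    return reviewer(results)                          # periodic checkpoint

print(run_round(["outline", "tone"]))  # avg confidence 0.55 -> stop
```

The checkpoint is what keeps errors from compounding: low aggregate confidence triggers another round instead of shipping a bad result downstream.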
If you take this team analogy further, you quickly realize that creative teams are not run like research laboratories. They don’t send every idea through the same manager. They iterate, discuss, critique, and converge. Research laboratories, on the other hand, do not function like creative studios. They prioritize reproducibility, controlled assumptions, and tightly scoped analysis. They benefit from structure, not free-form brainstorming cycles. That’s why it is no surprise when your system fails: if you apply one default agent topology to every problem, the system cannot perform at its full potential. Most failures blamed on “bad prompts” are actually mismatches among the task, the coordination pattern, the information flow, and the model architecture.
Breaking the Loop: “Hiring” Your Agents the Right Way
I design AI agents the same way I think about building a team. Each agent has a skills profile: strengths, blind spots, and a role it suits. The system only works when these skills complement rather than interfere with one another. A strong model placed in the wrong role behaves like a highly skilled employee hired for the wrong job: it doesn’t just perform poorly, it actively creates friction. In my mental model, I classify models by architectural personality. The following is a high-level overview.
Decoder-only (the generators and planners): These are your standard LLMs, like GPT or Claude. They are your talkers and coders, strong at drafting and step-by-step planning. Use them to execute: writing, coding, and generating candidate solutions.
Encoder-only (the analysts and investigators): Models like BERT and its modern descendants, such as ModernBERT and NeoBERT, do not talk; they understand. They produce rich embeddings and are excellent at semantic search, filtering, and relevance scoring. Use them to rank, verify, and narrow your search space before your expensive generators are even activated.
Mixture of experts (the specialists): MoE models behave like a set of internal expert departments, where a router activates only a subset of experts per token. Use them when you need high capacity but want to spend compute selectively.
Reasoning models (the thinkers): These are models optimized to spend more compute at test time. They pause, reflect, and check their logic. They are slow, but they often prevent costly downstream mistakes.
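The hiring decision above can be made explicit as a routing table; the role names, task labels, and default-to-the-thinker policy here are all illustrative assumptions, not a real API:

```python
# Routing tasks to architectural "personalities": decoders execute,
# encoders rank and filter, MoE handles high-capacity work, reasoners verify.

ROSTER = {
    "decoder": "drafting, coding, candidate generation",
    "encoder": "ranking, filtering, relevance scoring",
    "moe": "high-capacity work with selective compute",
    "reasoner": "slow, careful verification",
}

ROUTES = {  # hypothetical task types mapped to the right hire
    "write_report": "decoder",
    "rank_documents": "encoder",
    "verify_plan": "reasoner",
}

def hire(task: str) -> str:
    """Pick the model role for a task; unknown tasks default to the
    reasoner, trading latency for fewer downstream mistakes."""
    role = ROUTES.get(task, "reasoner")
    return role + ": " + ROSTER[role]

print(hire("rank_documents"))  # -> encoder: ranking, filtering, relevance scoring
```

A static table like this is the simplest version of the idea; in production, the routing itself is often a cheap classifier rather than a hand-written dict.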
So if you find yourself writing a 2,000-word prompt to get a generator to act like a thinker, you’ve made a bad hire. You don’t need a better prompt; you need a different architecture and better system-level scaling.
Designing Digital Organizations: The Science of Scaling Agent Systems
Neural scaling1 is consistent and well understood for models. As classic scaling laws show, increasing parameter count, data, and compute leads to predictable improvements in capability. This logic applies to single models. Collaborative scaling,2 which is what you need in an agentic system, is different. It is conditional: performance grows, plateaus, and sometimes collapses depending on communication costs, memory constraints, and how much context each agent actually sees. Adding agents does not behave like adding parameters.
This is why topology matters. Chains, trees, and other coordination structures behave very differently under load. As the system grows, some topologies keep reasoning stable; others amplify noise, latency, and error. These observations align with early work on collaborative scaling in multi-agent systems, which shows that performance does not increase monotonically with agent count.
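A toy piece of arithmetic makes the topology point concrete: in a fully connected team the number of communication channels grows quadratically with agent count, while a chain grows linearly, so the coordination tax per agent rises sharply with scale.

```python
# Coordination-tax arithmetic: channels a team of n agents must maintain.
# Fully connected mesh: n*(n-1)/2 pairwise channels. Chain: n-1 handoffs.

def full_mesh_channels(n: int) -> int:
    return n * (n - 1) // 2

def chain_channels(n: int) -> int:
    return n - 1

for n in (2, 5, 10, 20):
    print(n, "agents:", chain_channels(n), "chain vs.",
          full_mesh_channels(n), "mesh")
# at 20 agents, a chain maintains 19 handoffs while a mesh maintains 190
```

This is only a counting argument, not a performance model, but it shows why the same agents can be stable in one topology and drown in coordination overhead in another.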
Recent work by Google Research and Google DeepMind3 makes this distinction clear. The difference between a system that improves with each loop and a system that falls apart is not the number of agents or the size of the model; it is how the system is wired. As the number of agents grows, so does the coordination tax: communication overhead rises, latency climbs, and context windows bloat. Furthermore, when too many entities attempt to solve the same problem without a clear structure, the system begins to interfere with itself. The coordination structure, the flow of information, and the decision-making topology determine whether a system compounds capability or compounds error.
System-level takeaways
If your multi-agent system is failing, thinking like a prompt engineer is no longer enough. Stop reaching for the prompt. The growth in agentic research has made one truth undeniable: the field is moving from prompt engineering to organizational design. The next time you design your agentic system, ask yourself:
- How do I organize the team? (pattern)
- Who should I put in those slots? (hiring/architecture)
- Why might it fail at scale? (scaling laws)
Ultimately, the winners in the agentic age will not be those with the cleverest prompts but those who build the most resilient collaboration structures. Agent performance is an architectural outcome, not a prompting problem.
References
- Jared Kaplan et al., “Scaling Laws for Neural Language Models,” (2020): https://arxiv.org/abs/2001.08361.
- Chen Qian et al., “Scaling Large Language Model-Based Multi-Agent Collaboration,” (2025): https://arxiv.org/abs/2406.07155.
- Yubin Kim et al., “Towards a Science of Scaling Agent Systems,” (2025): https://arxiv.org/abs/2512.08296.
