Author(s): Towards AI Editorial Team
Originally published on Towards AI.
Good morning, AI enthusiasts!
Your next AI system is probably more complex than it needs to be, and you haven’t even built it yet. This week, we co-published an article with Paul Isztein that gives you a mental model to catch overengineering before it starts. Here’s what’s inside:
- Agent or Workflow? Getting this wrong is where most production headaches begin.
- As agents become more autonomous, does bias increase? What actually changes, and how to control it at the system level.
- Claude Code’s three most neglected slash commands, /btw, /fork, and /rewind, and why they matter more the longer your session runs.
- The community voted on where coding agents are headed. Terminal-based agents are pulling ahead, but that 17% “other” bucket is hiding something.
- Four must-reads covering Google’s A2A protocol, when SFT, DPO, RLHF, and RAG actually apply, a time series model that finally listens, and building a full clinic chatbot.
We’re also launching a new section this week, AI Tip of the Day, where I share practical tips and takeaways from our courses that you can apply to your own projects, understand where the industry is headed, and learn which tools to focus on. This week, we’re starting with RAG pipelines (if you’ve been here long enough, you know how much we love RAG) and the two failure modes most of you don’t evaluate separately.
Let’s dive in!
What’s AI
This week, in What’s AI, I’m looking at controlling bias in AI agents. Many believe that bias will increase as agents become more autonomous. So today, I’ll unpack this notion by explaining what bias actually means in the context of LLMs, why bias is not inherently bad, and what fundamentally changes when we move from a simple language model to an autonomous agent. We also look at how to realistically control bias as autonomy grows, not only at the model level but at the system level. Read the full article here, or watch the video on YouTube.
AI Tip of the Day
To check that your RAG pipeline’s retrieval is actually working, split your evaluation into two layers. For retrieval, measure whether the relevant evidence was retrieved, using metrics such as Recall@K and Mean Reciprocal Rank (MRR). For generation, measure the faithfulness and relevance of the answer against the retrieved context, often using an LLM judge calibrated against human labels.
High retrieval recall with low faithfulness suggests the model had the right evidence but failed to use it properly. High faithfulness with low retrieval recall suggests the model stayed grounded in the retrieved context, but retrieval surfaced incomplete or off-target evidence. These are two different problems with two different fixes, and without splitting the evaluation, you can’t tell which one you’re dealing with.
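The retrieval layer of that split is easy to compute yourself. Here’s a minimal sketch of Recall@K and MRR over a small toy run; the document IDs and queries are invented for illustration, and in practice you’d plug in your own retriever’s output and gold labels:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc IDs that appear in the top-k retrieved IDs."""
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

def mean_reciprocal_rank(runs):
    """runs: list of (retrieved_ids, relevant_ids) pairs, one per query.
    For each query, score 1/rank of the first relevant hit (0 if none)."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

# Toy run: query 1 finds its gold chunk at rank 2, query 2 at rank 3.
runs = [(["d3", "d1", "d7"], {"d1"}),
        (["d2", "d5", "d9"], {"d9", "d4"})]
print(recall_at_k(["d3", "d1", "d7"], {"d1"}, k=2))  # 1.0
print(mean_reciprocal_rank(runs))                    # (1/2 + 1/3) / 2 ≈ 0.417
```

The generation layer (faithfulness) can’t be scored with a one-liner like this; that’s where the LLM-judge-plus-human-calibration step comes in.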
If you’re currently building a RAG pipeline and want to go deeper into evaluation, retrieval strategies, and the full production stack, check out our Full Stack AI Engineering course.
-Louis-François Bouchard, co-founder and head of community at Towards AI
We’ve co-published an article with Paul Isztein covering the mental models that keep you from overengineering your next AI system.
Here’s what you’ll learn:
- The fundamental difference between agents and workflows.
- How to use the complexity spectrum to make architectural decisions.
- When to rely on simple workflows for predictable tasks.
- Why a single agent with tools is often enough for dynamic problems.
- The exact breaking points that justify moving to multi-agent systems.
Learn AI together in the Community section!
Featured community posts from Discord
ecocerida has built an AI chat platform with RAG and real-time token streaming. The system delivers token-by-token AI responses using a fully decoupled microservices architecture. It is built with .NET 10 microservices using event sourcing, CQRS, Wolverine Sagas, Marten, RabbitMQ, SignalR, Keycloak, and Kong, with an Angular 21 frontend powered by NGRX SignalStore. Check it out on GitHub and support a fellow community member. If you have any ideas on the token-streaming pipeline or LLM provider abstraction, share them in the thread!
AI Survey of the Week!
Most of you right now are leaning toward terminal-style coding agents (Codex/Claude Code), with IDE-based tools (Cursor, etc.) coming in second, and a smaller set either sticking to chat, running custom stacks, or testing new agent products like OpenClave/Claude Cowork. What’s interesting is not who is “winning,” but that the center of gravity is clearly shifting from asking for code to committing changes to the repo, which is exactly where terminal and repo-aware agents feel natural. And that “other” bucket is large enough that it’s probably hiding plenty of specific-but-real workflows the options didn’t capture. Share yours in the thread!
Opportunities for collaboration
The Learn AI Together Discord community is bursting with opportunities for collaboration. If you’re excited to get into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section too – we share great opportunities every week!
1. kamlesh_22497 is looking for people to learn and build with through study groups, project collaborations, and discussions. If you’re on a similar path, join the thread!
2. mirgot is looking for someone who wants to build something meaningful (and profitable). They want to combine practical business thinking with AI skills, and need someone with a business mindset and an AI background. If this sounds like you, reach out in the thread!
3. Majestic_728 is looking for an entry-level ML/DS study partner to study an hour every day. If you’re interested, contact him in the thread!
Meme of the week!

Meme shared by rucha8062
TAI Curated Section
Article of the week
Mastering Claude Code’s /btw, /fork, and /rewind: The context hygiene toolkit, by Rick Hightower
Context pollution degrades an AI coding session by filling the context window with unrelated Q&A. This article covers three Claude Code commands that address it: /btw spawns a temporary agent to answer mid-task queries without touching the main session’s context; /fork creates a parallel session that carries the full conversation history for safe exploration; and /rewind rolls corrupted code or conversation back to a clean checkpoint. Together, they form a toolkit for keeping the signal-to-noise ratio high over long sessions.
Our must-read articles
1. Google’s A2A protocol using LangGraph: Build agent systems that actually communicate, by Divya Yadav
Google’s Agent2Agent (A2A) protocol targets a persistent gap in enterprise AI: agents from different vendors can’t coordinate without custom glue code. This piece explains how A2A uses agent cards for discovery, structured task lifecycles, and HTTP-based messaging to enable cross-vendor agent collaboration across organizational boundaries. It compares A2A with MCP, clarifying that they operate at different layers, and covers production failure modes including timeout handling, context mismatch, and authentication drift.
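To make the discovery step concrete, here is a hedged sketch of what an agent card might look like, built as a plain Python dict. The agent name, skill, and endpoint URL are invented for illustration; field names loosely follow published A2A examples, so check the spec for the authoritative schema:

```python
import json

# Hypothetical agent card: a JSON document an agent publishes so that
# other agents can discover its capabilities before opening a task.
agent_card = {
    "name": "invoice-reconciler",                    # illustrative agent name
    "description": "Matches invoices against purchase orders.",
    "url": "https://agents.example.com/a2a",         # illustrative A2A endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "reconcile",
            "name": "Reconcile invoice",
            "description": "Match an invoice to open purchase orders.",
        }
    ],
}

# A client fetches this card over HTTP, inspects the skills, and then
# starts a structured task against the advertised endpoint.
print(json.dumps(agent_card, indent=2))
```

The point of the card is that discovery happens through data, not code, which is what removes the vendor-specific glue.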
2. What SFT, DPO, RLHF, and RAG actually do in an AI agent, by Shenggang Li
A functioning AI agent needs more than fluent answers. This article maps each technique to a customer support scenario, showing exactly when each applies: SFT to get tone and task format right through demonstrations, RAG to inject business facts at inference time without touching model weights, DPO to choose between two valid answers when one simply feels better, and RLHF to shape the full decision path when the problem goes deeper than any one answer.
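The cleanest way to see the difference is in the data each technique consumes. This sketch shows illustrative record shapes for SFT and DPO, and a toy RAG prompt builder; all names and strings are invented for the example, not taken from the article’s code:

```python
# SFT: a demonstration pair teaching tone and task format.
sft_example = {
    "prompt": "Customer: My order is late.",
    "response": "I'm sorry about the delay. Let me check that for you.",
}

# DPO: two valid answers to the same prompt, one preferred over the other.
dpo_pair = {
    "prompt": "Customer: Can I get a refund?",
    "chosen": "Yes, refunds take 3-5 business days. Shall I start one?",
    "rejected": "Refunds are possible. See our policy page.",
}

def build_rag_prompt(question, retrieved_chunks):
    """RAG: inject business facts at inference time, no weight updates."""
    context = "\n".join(retrieved_chunks)
    return f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"

print(build_rag_prompt("What is the return window?",
                       ["Returns are accepted within 30 days."]))
```

RLHF has no single record shape like these; it needs a reward signal over whole trajectories, which is exactly why the article reserves it for when the problem goes deeper than any one answer.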
3. Your one-stop reference for PatchTST, because it’s the only time series model that listens! By Dr. Swarnendu A.I
The fundamental flaw in Transformer-based time series models is not the architecture but the tokenization. Instead of encoding a single timestamp per token, the paper from Princeton and IBM Research cuts each series into overlapping patches, each representing a semantic time window. With channel independence and reversible instance normalization, the approach reduced MSE by 21% over prior Transformers on standard benchmarks. The article includes all the math, a PyTorch implementation, and an honest map of when XGBoost still wins.
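The patching idea itself is small enough to sketch in a few lines. Here each overlapping window of the series becomes one "token"; the patch length and stride values are illustrative, not the paper’s defaults:

```python
def make_patches(series, patch_len, stride):
    """Cut a univariate series into overlapping fixed-length patches.
    Each patch (a window of values) is fed to the model as one token."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]

series = list(range(10))  # toy series: 0..9
patches = make_patches(series, patch_len=4, stride=2)
print(patches)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Ten timestamps collapse into four tokens, so attention operates over semantic windows rather than single points, and the sequence length the Transformer sees shrinks accordingly.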
4. Agentic AI Project: Build a Customer Service Chatbot for a Clinic, by alpha iterations
The article walks you through building a clinic appointment chatbot end to end using LangGraph, GPT-4o-mini, SQLite, and Streamlit. The system manages a multi-step conversation workflow across graph nodes, handling specialty selection, doctor listing, time-slot generation, and booking confirmation. An SQLite service layer abstracts database operations into reusable agent tools, while a MemorySaver checkpointer maintains session state across turns. The finished chatbot reduces administrative overhead and prevents scheduling conflicts through a clean, guided conversational interface that can run as both a web app and a Jupyter notebook.
If you’re interested in publishing with Towards AI, check out our guidelines and sign up. If your work meets our editorial policies and standards, we will publish it on our network.
Published via Towards AI