In this tutorial, we explore how to organize a team of specialized AI agents locally using a manager-agent architecture powered by TinyLlama. We walk through building structured task decomposition, inter-agent collaboration, and autonomous orchestration loops without relying on any external APIs. By running everything directly through the Transformers library, we create a completely offline, lightweight, and transparent multi-agent system that we can customize, inspect, and extend. Through the snippets below, we see how each component, from task structures to agent roles to outcome synthesis, comes together into a coherent human-AI workflow that we control from start to finish.
!pip install transformers torch accelerate bitsandbytes -q
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import json
import re
from typing import List, Dict, Any
from dataclasses import dataclass, asdict
from datetime import datetime
@dataclass
class Task:
    id: str
    description: str
    assigned_to: str = None
    status: str = "pending"
    result: Any = None
    dependencies: List[str] = None

    def __post_init__(self):
        if self.dependencies is None:
            self.dependencies = []

@dataclass
class Agent:
    name: str
    role: str
    expertise: str
    system_prompt: str
We set up the key imports and define the basic data structures needed to manage the work: a Task dataclass that tracks status, results, and dependencies, and an Agent dataclass that describes each specialist. By doing this, we give every part of the system a consistent and reliable foundation.
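As a quick sanity check, here is a minimal, illustrative snippet (not part of the original script) showing how the Task dataclass behaves: the __post_init__ hook fills in an empty dependency list, and asdict makes a task easy to log or serialize.
# Illustrative only: exercising the Task dataclass defined above
t = Task(id="task_1", description="Summarize findings", assigned_to="researcher")
print(t.status)        # "pending"
print(t.dependencies)  # [] -- filled in by __post_init__
print(asdict(t))       # plain dict, handy for logging or JSON export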
AGENT_REGISTRY = {
    "researcher": Agent(
        name="researcher",
        role="Research Specialist",
        expertise="Information gathering, analysis, and synthesis",
        system_prompt="You are a research specialist. Provide thorough research on topics."
    ),
    "coder": Agent(
        name="coder",
        role="Software Engineer",
        expertise="Writing clean, efficient code with best practices",
        system_prompt="You are an expert programmer. Write clean, well-documented code."
    ),
    "writer": Agent(
        name="writer",
        role="Content Writer",
        expertise="Clear communication and documentation",
        system_prompt="You are a professional writer. Create clear, engaging content."
    ),
    "analyst": Agent(
        name="analyst",
        role="Data Analyst",
        expertise="Data interpretation and insights",
        system_prompt="You are a data analyst. Provide clear insights from data."
    )
}
class LocalLLM:
    def __init__(self, model_name: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16
        ) if torch.cuda.is_available() else None
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quantization_config,
            device_map="auto",
            low_cpu_mem_usage=True
        )
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

    def generate(self, prompt: str, max_tokens: int = 300) -> str:
        formatted_prompt = f"<|system|>\nYou are a helpful AI assistant.\n<|user|>\n{prompt}\n<|assistant|>\n"
        inputs = self.tokenizer(
            formatted_prompt,
            return_tensors="pt",
            truncation=True,
            max_length=1024,
            padding=True
        )
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_tokens,
                temperature=0.7,
                do_sample=True,
                top_p=0.9,
                pad_token_id=self.tokenizer.pad_token_id,
                eos_token_id=self.tokenizer.eos_token_id,
                use_cache=True
            )
        full_response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        if "<|assistant|>" in full_response:
            return full_response.split("<|assistant|>")[-1].strip()
        return full_response[len(formatted_prompt):].strip()
We register our four specialist agents and implement the local LLM wrapper that powers the system. We load TinyLlama in 4-bit mode when a GPU is available, so we can run everything in Colab or on modest local hardware. With this, we have a flexible and completely local way to generate responses for each agent.
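Before wiring the wrapper into the manager, a minimal, illustrative smoke test (not part of the original script) confirms the model loads and responds; the prompt text here is arbitrary.
# Illustrative only: standalone smoke test of the LocalLLM wrapper
llm = LocalLLM()  # downloads TinyLlama on first run; 4-bit quantization applies only when CUDA is available
answer = llm.generate("Explain what a hash table is in two sentences.", max_tokens=120)
print(answer)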
class ManagerAgent:
    def __init__(self, model_name: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
        self.llm = LocalLLM(model_name)
        self.agents = AGENT_REGISTRY
        self.tasks: Dict[str, Task] = {}
        self.execution_log = []

    def log(self, message: str):
        timestamp = datetime.now().strftime("%H:%M:%S")
        log_entry = f"[{timestamp}] {message}"
        self.execution_log.append(log_entry)
        print(log_entry)

    def decompose_goal(self, goal: str) -> List[Task]:
        self.log(f"🎯 Decomposing goal: {goal}")
        agent_info = "\n".join(f"- {name}: {agent.expertise}" for name, agent in self.agents.items())
        prompt = f"""Break down this goal into 3 specific subtasks. Assign each to the best agent.
Goal: {goal}
Available agents:
{agent_info}
Respond ONLY with a JSON array."""
        response = self.llm.generate(prompt, max_tokens=250)
        try:
            json_match = re.search(r'\[\s*\{.*?\}\s*\]', response, re.DOTALL)
            if json_match:
                tasks_data = json.loads(json_match.group())
            else:
                raise ValueError("No JSON found")
        except:
            tasks_data = self._create_default_tasks(goal)
        tasks = []
        for i, task_data in enumerate(tasks_data[:3]):
            task = Task(
                id=task_data.get('id', f'task_{i+1}'),
                description=task_data.get('description', f'Work on: {goal}'),
                assigned_to=task_data.get('assigned_to', list(self.agents.keys())[i % len(self.agents)]),
                dependencies=task_data.get('dependencies', [] if i == 0 else [f'task_{i}'])
            )
            self.tasks[task.id] = task
            tasks.append(task)
            self.log(f"  ✓ {task.id}: {task.description[:50]}... → {task.assigned_to}")
        return tasks
We build the ManagerAgent class and focus on how it decomposes a high-level goal into well-defined subtasks. We prompt the model for a JSON array of tasks and automatically assign each one to the most suitable agent. By doing this, we allow the system to organize the work step by step, just like a human project manager.
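For illustration, here is a minimal sketch (not in the original script) of calling the decomposition step on its own, along with the rough JSON shape the prompt asks the model to return.
# Illustrative only: running goal decomposition by itself
manager = ManagerAgent()
subtasks = manager.decompose_goal("Explain binary search with a simple example")
for t in subtasks:
    print(t.id, "->", t.assigned_to, "| deps:", t.dependencies)
# The prompt expects a JSON array of objects roughly shaped like:
# [{"id": "task_1", "description": "...", "assigned_to": "researcher", "dependencies": []}, ...]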
    def _create_default_tasks(self, goal: str) -> List[Dict]:
        if any(word in goal.lower() for word in ['code', 'program', 'implement', 'algorithm']):
            return [
                {"id": "task_1", "description": f"Research and explain the concept: {goal}", "assigned_to": "researcher", "dependencies": []},
                {"id": "task_2", "description": f"Write code implementation for: {goal}", "assigned_to": "coder", "dependencies": ["task_1"]},
                {"id": "task_3", "description": "Create documentation and examples", "assigned_to": "writer", "dependencies": ["task_2"]}
            ]
        return [
            {"id": "task_1", "description": f"Research: {goal}", "assigned_to": "researcher", "dependencies": []},
            {"id": "task_2", "description": "Analyze findings and structure content", "assigned_to": "analyst", "dependencies": ["task_1"]},
            {"id": "task_3", "description": "Write comprehensive response", "assigned_to": "writer", "dependencies": ["task_2"]}
        ]

    def execute_task(self, task: Task, context: Dict[str, Any] = None) -> str:
        self.log(f"🤖 Executing {task.id} with {task.assigned_to}")
        task.status = "in_progress"
        agent = self.agents[task.assigned_to]
        context_str = ""
        if context and task.dependencies:
            context_str = "\n\nContext from previous tasks:\n"
            for dep_id in task.dependencies:
                if dep_id in context:
                    context_str += f"- {context[dep_id][:150]}...\n"
        prompt = f"""{agent.system_prompt}
Task: {task.description}{context_str}
Provide a clear, concise response:"""
        result = self.llm.generate(prompt, max_tokens=250)
        task.result = result
        task.status = "completed"
        self.log(f"  ✓ Completed {task.id}")
        return result
We define the fallback task plan and the execution flow for each task. We guide each agent with its own system prompt and pass along truncated results from completed dependencies to keep outputs consistent. This lets us execute tasks intelligently while respecting the dependency order.
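As an illustration (not part of the original script), a single task could be run by hand, feeding the result of an upstream task in as context; the task descriptions here are arbitrary.
# Illustrative only: executing one task manually with upstream results as context
manager = ManagerAgent()
research = Task(id="task_1", description="Research binary search", assigned_to="researcher")
writeup = Task(id="task_2", description="Write a short explanation with an example",
               assigned_to="writer", dependencies=["task_1"])
context = {"task_1": manager.execute_task(research)}
print(manager.execute_task(writeup, context))  # the writer sees a truncated snippet of task_1's output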
    def synthesize_results(self, goal: str, results: Dict[str, str]) -> str:
        self.log("🔄 Synthesizing final results")
        results_text = "\n\n".join(f"Task {tid}:\n{res[:200]}" for tid, res in results.items())
        prompt = f"""Combine these task results into one final coherent answer.
Original Goal: {goal}
Task Results:
{results_text}
Final comprehensive answer:"""
        return self.llm.generate(prompt, max_tokens=350)

    def execute_goal(self, goal: str) -> Dict[str, Any]:
        self.log(f"\n{'='*60}\n🎬 Starting Manager Agent\n{'='*60}")
        tasks = self.decompose_goal(goal)
        results = {}
        completed = set()
        max_iterations = len(tasks) * 2
        iteration = 0
        while len(completed) < len(tasks) and iteration < max_iterations:
            iteration += 1
            for task in tasks:
                if task.id in completed:
                    continue
                deps_met = all(dep in completed for dep in task.dependencies)
                if deps_met:
                    result = self.execute_task(task, results)
                    results[task.id] = result
                    completed.add(task.id)
        final_output = self.synthesize_results(goal, results)
        self.log(f"\n{'='*60}\n✅ Execution Complete!\n{'='*60}\n")
        return {
            "goal": goal,
            "tasks": [asdict(task) for task in tasks],
            "final_output": final_output,
            "execution_log": self.execution_log
        }
We synthesize the outputs of all subtasks into one integrated final answer. We also implement an orchestration loop that runs each task only after its dependencies are satisfied, with an iteration cap as a safety net. This snippet shows how we bring everything together into a seamless multi-step pipeline.
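A minimal, illustrative end-to-end run (not in the original script); the goal string is arbitrary, and the keys match the dictionary returned by execute_goal.
# Illustrative only: one end-to-end run of the orchestration loop
manager = ManagerAgent()
outcome = manager.execute_goal("Summarize the trade-offs of quicksort vs. mergesort")
print(outcome["final_output"])       # synthesized answer
print(len(outcome["tasks"]))         # per-task records as plain dicts (via asdict)
print(outcome["execution_log"][-1])  # last log line from the run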
def demo_basic():
    manager = ManagerAgent()
    goal = "Explain binary search algorithm with a simple example"
    result = manager.execute_goal(goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result

def demo_coding():
    manager = ManagerAgent()
    goal = "Implement a function to find the maximum element in a list"
    result = manager.execute_goal(goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result

def demo_custom(custom_goal: str):
    manager = ManagerAgent()
    result = manager.execute_goal(custom_goal)
    print("\n" + "="*60)
    print("FINAL OUTPUT")
    print("="*60)
    print(result["final_output"])
    return result

if __name__ == "__main__":
    print("🤖 Manager Agent Tutorial - APIless Local Version")
    print("="*60)
    print("Using TinyLlama (1.1B) - Fast & efficient!\n")
    result = demo_basic()
    print("\n\n💡 Try more:")
    print("  - demo_coding()")
    print("  - demo_custom('your goal here')")
We provide demonstration functions so we can easily test the system with different goals. We run a sample goal to see how the manager decomposes, executes, and synthesizes tasks in real time, which gives us an interactive way to understand the entire workflow and refine it further.
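For example, we can hand the pipeline any custom objective (an illustrative call with an arbitrary goal string):
# Illustrative only: trying the pipeline on a custom goal
result = demo_custom("Design a simple caching strategy for a web API and explain the trade-offs")
print(result["goal"])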
Finally, we demonstrate how to design and operate an entire multi-agent orchestration system locally with minimal dependencies. We now understand how a manager breaks down goals, routes tasks to the right expert agents, collects their outputs, resolves dependencies, and synthesizes the final result. The implementation shows how modular, predictable, and powerful local agent patterns can be when built from scratch.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of Marktechpost, an Artificial Intelligence media platform known for its in-depth coverage of Machine Learning and Deep Learning news that is technically robust and easily understood by a wide audience. The platform boasts over 2 million monthly views, reflecting its popularity among readers.
