A Groq-Powered Agentic Research Assistant with LangGraph, Tool Calling, Sub-Agents, and Agentic Memory: Let’s Build It

by ai-intensify

In this tutorial, we build a Groq-powered agentic research workflow that runs directly against Groq’s free OpenAI-compatible inference endpoint. We configure LangChain’s ChatOpenAI interface to talk to Groq by setting the Groq API key and base URL, which lets us use fast hosted models like llama-3.3-70b-versatile for tool-based reasoning. We then connect the model to practical tools for web search, webpage fetching, file handling, Python execution, skill loading, sub-agent delegation, and long-term memory. By the end of the tutorial, we have a working Groq-based multi-step agent that can research a topic, delegate focused sub-tasks, generate structured output, and save useful information for subsequent runs.

import subprocess, sys
def _pip(*a): subprocess.check_call((sys.executable,"-m","pip","install","-q",*a))
_pip("langgraph>=0.2.50", "langchain>=0.3.0", "langchain-openai>=0.2.0",
    "langchain-community>=0.3.0", "ddgs", "requests", "beautifulsoup4",
    "tiktoken", "pydantic>=2.0")


import os, getpass
if not os.environ.get("GROQ_API_KEY"):
   os.environ["GROQ_API_KEY"] = getpass.getpass("GROQ_API_KEY (free at console.groq.com/keys): ")


os.environ["OPENAI_API_KEY"]  = os.environ["GROQ_API_KEY"]
os.environ("OPENAI_BASE_URL") = "https://api.groq.com/openai/v1"


MODEL_NAME = "llama-3.3-70b-versatile"


import json, re, io, contextlib, pathlib
from typing import Annotated, TypedDict, Sequence, Literal, List, Dict, Any
from datetime import datetime, timezone
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
   SystemMessage, HumanMessage, AIMessage, ToolMessage, BaseMessage)
from langchain_core.tools import tool
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

We install the core libraries needed to build a Groq-powered agent workflow, including LangGraph, LangChain, DuckDuckGo search utilities, and supporting parsing libraries. We securely collect the Groq API key and configure Groq as an OpenAI-compatible endpoint by setting the API key and base URL. We then import all the modules needed for messaging, tools, graph creation, typing, file-system handling, and model initialization.

SANDBOX = pathlib.Path("/content/deerflow_sandbox").resolve()
for sub in ("uploads","workspace","outputs","skills/public","skills/custom","memory"):
   (SANDBOX/sub).mkdir(parents=True, exist_ok=True)


def _safe(p: str) -> pathlib.Path:
   full = (SANDBOX/p.lstrip("/")).resolve()
   if not str(full).startswith(str(SANDBOX)):
       raise ValueError(f"path escapes sandbox: {p}")
   return full


SKILLS: Dict[str, Dict[str, str]] = {}
def register_skill(name, description, content, location="public"):
   d = SANDBOX/"skills"/location/name; d.mkdir(parents=True, exist_ok=True)
   (d/"SKILL.md").write_text(content)
   SKILLS[name] = {"description": description, "content": content,
                   "path": str(d/"SKILL.md")}


register_skill("research",
   "Conduct multi-source web research on a topic and produce structured notes.",
   """# Research Skill
## Workflow
1. Decompose the question into 3-5 sub-questions.
2. For each sub-question call `web_search` and pick 2 authoritative URLs.
3. `web_fetch` those URLs; extract concrete facts, numbers, dates.
4. Cross-reference for consensus vs. disagreement.
5. Append findings to `workspace/research_notes.md`: claim → evidence → URL.
## Best practices
- Prefer primary sources. Note dates. Never fabricate URLs or numbers.""")


register_skill("report-generation",
   "Synthesize research notes into a polished markdown report in outputs/.",
   """# Report Generation Skill
## Workflow
1. file_read('workspace/research_notes.md').
2. Outline: exec summary, key findings, analysis, conclusion, sources.
3. file_write('outputs/report.md', ...).
## Structure
- # Title
- ## Executive Summary  (3–5 sentences)
- ## Key Findings       (bullets)
- ## Detailed Analysis  (sections)
- ## Conclusion
- ## Sources            (numbered URL list)""")


register_skill("code-execution",
   "Run Python in the sandbox for computation, data wrangling, charts.",
   """# Code Execution Skill
1. Plan in plain language first.
2. python_exec the code; persistent artifacts go to /outputs/.
3. Verify before quoting results.""")


MEM = SANDBOX/"memory/long_term.json"
if not MEM.exists():
   MEM.write_text(json.dumps({"facts": [], "preferences": {}}, indent=2))
def _load_mem(): return json.loads(MEM.read_text())
def _save_mem(m): MEM.write_text(json.dumps(m, indent=2))

We create a sandboxed project directory in Colab to keep uploads, workspace files, outputs, skills, and memory organized in one controlled location. We define reusable skills for research, report generation, and code execution so agents can discover and follow structured workflows. We also initialize a simple long-term memory JSON file that stores facts and preferences across multiple runs within the same sandbox.
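The path guard in `_safe` is worth understanding on its own: it resolves every user-supplied path and refuses anything that escapes the sandbox root. Here is a minimal, standalone sketch of the same idea (the `/tmp/demo_sandbox` root and the `safe_path` name are illustrative, not part of the tutorial code):

```python
import pathlib

# Illustrative sandbox root; the tutorial uses /content/deerflow_sandbox.
SANDBOX = pathlib.Path("/tmp/demo_sandbox").resolve()

def safe_path(p: str) -> pathlib.Path:
    """Resolve a user-supplied path inside the sandbox, rejecting escapes."""
    full = (SANDBOX / p.lstrip("/")).resolve()
    if not str(full).startswith(str(SANDBOX)):
        raise ValueError(f"path escapes sandbox: {p}")
    return full

print(safe_path("workspace/notes.md"))    # stays inside the sandbox
try:
    safe_path("../../etc/passwd")         # `..` collapses outside the root
except ValueError as e:
    print("blocked:", e)
```

Because `..` segments are collapsed by `resolve()` before the prefix check, traversal attempts are rejected even though neither path needs to exist on disk.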

@tool
def list_skills() -> str:
   """List all skills with one-line descriptions. Call this first for complex tasks."""
   return "\n".join(f"- {n}: {s['description']}" for n, s in SKILLS.items())


@tool
def load_skill(name: str) -> str:
   """Load full SKILL.md for `name`. Call before running its workflow."""
   if name not in SKILLS: return f"Unknown. Available: {list(SKILLS)}"
   return SKILLS[name]["content"]


@tool
def web_search(query: str, max_results: int = 5) -> str:
   """Search the web (DuckDuckGo). Returns titles, URLs, snippets."""
   from ddgs import DDGS
   out = []
   try:
       with DDGS() as d:
           for r in d.text(query, max_results=max_results):
               out.append(f"- {r.get('title','')}\n  URL: {r.get('href','')}\n  "
                          f"{(r.get('body') or '')[:220]}")
   except Exception as e:
       return f"search error: {e}"
   return "\n".join(out) or "no results"


@tool
def web_fetch(url: str, max_chars: int = 4000) -> str:
   """Fetch a URL, return cleaned text (scripts/nav stripped)."""
   import requests
   from bs4 import BeautifulSoup
   try:
       r = requests.get(url, timeout=15,
                        headers={"User-Agent": "Mozilla/5.0 DeerFlow-Lite"})
       soup = BeautifulSoup(r.text, "html.parser")
       for s in soup(["script","style","nav","footer","aside","header"]): s.decompose()
       text = re.sub(r"\n\s*\n", "\n\n", soup.get_text("\n")).strip()
       return text[:max_chars] or "(empty page)"
   except Exception as e:
       return f"fetch error: {e}"


@tool
def file_write(path: str, content: str) -> str:
   """Write content to a sandbox path, e.g. 'workspace/notes.md' or 'outputs/x.md'."""
   p = _safe(path); p.parent.mkdir(parents=True, exist_ok=True)
   p.write_text(content)
   return f"wrote {len(content)} chars → {path}"


@tool
def file_read(path: str) -> str:
   """Read a sandbox file (first 8 KB)."""
   p = _safe(path)
   return p.read_text()[:8000] if p.exists() else f"not found: {path}"


@tool
def file_list(path: str = "") -> str:
   """List files under a sandbox dir."""
   base = _safe(path) if path else SANDBOX
   if not base.exists(): return "not found"
   items = []
   for c in sorted(base.rglob("*")):
       if "memory" in c.relative_to(SANDBOX).parts: continue
       items.append(f"  {'D' if c.is_dir() else 'F'}  {c.relative_to(SANDBOX)}")
   return "\n".join(items[:60]) or "(empty)"


@tool
def python_exec(code: str) -> str:
   """Run Python in the sandbox. SANDBOX_ROOT is preset."""
   g = {"__name__":"__sb__", "SANDBOX_ROOT": str(SANDBOX)}
   buf = io.StringIO()
   try:
       with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
           exec(code, g)
       return (buf.getvalue() or "(no stdout)")[:4000]
   except Exception as e:
       return f"{type(e).__name__}: {e}\n{buf.getvalue()[:1500]}"


@tool
def remember(fact: str) -> str:
   """Persist a single fact to long-term memory (survives across runs)."""
   m = _load_mem()
   m["facts"].append({"fact": fact, "ts": datetime.now(timezone.utc).isoformat()})
   _save_mem(m)
   return f"remembered ({len(m['facts'])} total)"


@tool
def recall() -> str:
   """Retrieve everything in long-term memory."""
   m = _load_mem()
   if not m["facts"]: return "(memory empty)"
   return "\n".join(f"- {f['fact']}" for f in m["facts"][-20:])

We define the main tools that the Groq-backed agent can call during execution, including listing skills, loading skill instructions, searching the web, fetching webpages, reading files, and writing files. We also provide the agent with a sandboxed Python execution environment so that it can run calculations or generate artifacts as needed. We add memory tools that allow the agent to remember important facts and recall previously stored information.
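The `python_exec` tool above relies on two standard-library pieces: `exec` with a fresh globals dict and `contextlib.redirect_stdout` to capture whatever the code prints. A stripped-down sketch of that capture pattern (the `run_snippet` name is illustrative):

```python
import io, contextlib

def run_snippet(code: str) -> str:
    """Execute Python source and capture anything it prints or raises."""
    buf = io.StringIO()
    env = {"__name__": "__sandbox__"}   # fresh globals, like python_exec's `g`
    try:
        # redirect both stdout and stderr into the buffer while the code runs
        with contextlib.redirect_stdout(buf), contextlib.redirect_stderr(buf):
            exec(code, env)
        return buf.getvalue() or "(no stdout)"
    except Exception as e:
        return f"{type(e).__name__}: {e}"

print(run_snippet("print(2 + 3)"))   # → 5
print(run_snippet("1 / 0"))          # → ZeroDivisionError: division by zero
```

Returning the error string instead of raising matters here: the agent loop sends tool results back to the model, so a failed execution becomes feedback the model can react to rather than a crash.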

@tool
def spawn_subagent(role: str, task: str,
                  allowed_tools: str = "web_search,web_fetch,file_write,file_read") -> str:
   """Spawn an isolated sub-agent with a focused role and scoped tools.
   Returns its final report string. Use for parallelizable / focused subtasks."""
   bag = {t.name: t for t in BASE_TOOLS}
   sub_tools = [bag[n.strip()] for n in allowed_tools.split(",") if n.strip() in bag]
   sub_llm = ChatOpenAI(model=MODEL_NAME, temperature=0.2).bind_tools(sub_tools)
   sys_msg = SystemMessage(content=(
       f"You are a specialized sub-agent. Role: {role}.\n"
       f"You operate in an ISOLATED context — no access to lead history.\n"
       f"Tools: {', '.join(t.name for t in sub_tools)}.\n"
       "End with a final assistant message starting 'FINAL REPORT:' "
       "containing a structured ≤700-word summary including any URLs."))
   msgs: List[BaseMessage] = [sys_msg, HumanMessage(content=task)]
   for _ in range(8):
       r = sub_llm.invoke(msgs); msgs.append(r)
       if not getattr(r, "tool_calls", None):
           return f"(sub-agent: {role})\n" + (r.content if isinstance(r.content, str) else str(r.content))
       for tc in r.tool_calls:
           t = bag.get(tc["name"])
           try:
               res = t.invoke(tc["args"]) if t else f"unknown tool {tc['name']}"
           except Exception as e:
               res = f"tool error: {e}"
           msgs.append(ToolMessage(content=str(res)[:3000], tool_call_id=tc["id"]))
   return f"(sub-agent: {role}) step-limit reached."


BASE_TOOLS = [list_skills, load_skill, web_search, web_fetch, file_write,
              file_read, file_list, python_exec, remember, recall]
ALL_TOOLS = BASE_TOOLS + [spawn_subagent]


LEAD_SYSTEM = f"""You are DeerFlow-Lite, a long-horizon super-agent harness.


Sandbox layout (relative to {SANDBOX}):
 uploads/    – user files
 workspace/  – your scratchpad
 outputs/    – final deliverables
 skills/     – capability modules (load_skill)


Principles:
 • For non-trivial tasks: list_skills → load_skill → execute.
 • Use spawn_subagent for focused subtasks (isolated context keeps lead lean).
 • Persist intermediates to workspace/, deliverables to outputs/.
 • Use remember(fact) for cross-session knowledge.
 • Finish with a short summary of what was produced and where.


Today: {datetime.now(timezone.utc).strftime('%Y-%m-%d')}."""


class AgentState(TypedDict):
   messages: Annotated[Sequence[BaseMessage], add_messages]


llm = ChatOpenAI(model=MODEL_NAME, temperature=0.3).bind_tools(ALL_TOOLS)


def call_model(state: AgentState):
   msgs = list(state["messages"])
   if not msgs or not isinstance(msgs[0], SystemMessage):
       msgs = [SystemMessage(content=LEAD_SYSTEM)] + msgs
   return {"messages": [llm.invoke(msgs)]}


def route(state: AgentState) -> Literal["tools", "__end__"]:
   last = state["messages"][-1]
   return "tools" if getattr(last, "tool_calls", None) else END


g = StateGraph(AgentState)
g.add_node("agent", call_model)
g.add_node("tools", ToolNode(ALL_TOOLS))
g.set_entry_point("agent")
g.add_conditional_edges("agent", route, {"tools":"tools", END: END})
g.add_edge("tools", "agent")
APP = g.compile()

We create a sub-agent tool that lets the lead Groq-powered agent delegate focused tasks to a separate assistant with a limited set of tools. We then collect all the available tools, define the lead system prompt, initialize the Groq-hosted chat model, and bind the tools to it. Finally, we build the LangGraph workflow so the agent can alternate between reasoning and tool execution until it reaches a final answer.
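The compiled graph simply alternates between the `agent` node and the `tools` node until an AI message arrives with no tool calls. That control flow can be simulated without LangGraph; in this illustrative sketch, `fake_model` and `fake_tool_node` are stand-ins for the LLM and `ToolNode`, not part of the tutorial code:

```python
# Stub "model": emits a tool call until a tool result exists, then a final answer.
def fake_model(messages):
    if not any(role == "tool" for role, _ in messages):
        return ("ai", {"tool_calls": [{"name": "web_search", "args": {"query": "SLMs"}}]})
    return ("ai", {"content": "final answer"})

# Stub "tool node": answers every pending tool call with a tool message.
def fake_tool_node(tool_calls):
    return [("tool", f"results for {tc['args']['query']}") for tc in tool_calls]

messages = [("human", "brief me on SLMs")]
while True:
    role, payload = fake_model(messages)        # "agent" node
    messages.append((role, payload))
    calls = payload.get("tool_calls")
    if not calls:                               # route() returns END
        break
    messages.extend(fake_tool_node(calls))      # "tools" node, then back to agent

print(payload["content"])                       # → final answer
```

The conditional edge in the real graph plays exactly the role of the `if not calls: break` check, and the `tools → agent` edge corresponds to looping back to `fake_model` with the tool results appended.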

def run(task: str, max_steps: int = 25):
   print("="*78); print(f"🦌 TASK: {task}"); print("="*78)
   state = {"messages": [HumanMessage(content=task)]}
   n = 0
   for ev in APP.stream(state, {"recursion_limit": max_steps*2}, stream_mode="updates"):
       for node, payload in ev.items():
           for m in payload.get("messages", []):
               n += 1
               if isinstance(m, AIMessage):
                   if m.tool_calls:
                       for tc in m.tool_calls:
                           args = json.dumps(tc["args"], ensure_ascii=False)
                           args = args[:140] + ("…" if len(args) > 140 else "")
                           print(f"({n:02}) 🔧 {tc['name']}({args})")
                   else:
                       txt = m.content if isinstance(m.content, str) else str(m.content)
                       print(f"({n:02}) 🦌 {txt[:800]}")
               elif isinstance(m, ToolMessage):
                   s = str(m.content).replace("\n", " ")[:220]
                   print(f"({n:02}) 📤 {s}")
   print("\n" + "="*78); print("✅ COMPLETE — sandbox state:"); print("="*78)
   print(file_list.invoke({"path": ""}))
   print("\n🧠 Long-term memory:"); print(recall.invoke({}))
   for f in sorted((SANDBOX/"outputs").rglob("*")):
       if f.is_file():
           print(f"\n--- 📄 {f.relative_to(SANDBOX)} (first 800 chars) ---")
           print(f.read_text()[:800])


run(
   "Give me a briefing on small language models (SLMs) in 2025. "
   "(1) discover skills; (2) spawn a researcher sub-agent to gather "
   "specifics on three notable SLMs from 2024-2025 with sizes, benchmarks, "
   "and use cases — sub-agent saves to workspace/slm_research.md; "
   "(3) load report-generation skill and write outputs/slm_briefing.md "
   "(~400 words) with a Sources section; (4) save the single most "
   "important takeaway to long-term memory; (5) summarize.",
   max_steps=25,
)

We define a run() function that starts the user task, streams each agent step, and prints tool calls, tool outputs, and final responses in a readable format. We also display the sandbox file structure, long-term memory, and generated output files after the workflow completes. We finish by running a demo task in which the Groq-powered agent researches small language models, prepares a briefing, saves a report, and stores a key takeaway in memory.

Finally, we have a compact but capable Groq-based agent framework that demonstrates how Groq’s OpenAI-compatible API can serve as a fast, accessible backend for advanced LLM workflows. We used LangGraph to manage the agent loop, LangChain to bind tools to the Groq-hosted model, and custom Python utilities to give the system controlled access to search, files, code execution, and memory. We also showed how isolated sub-agents can handle focused research tasks while the lead agent coordinates the overall workflow. The result is a practical Groq-powered agentic system that can be extended into research assistants, automated briefing generators, and multi-step AI applications.

