In this tutorial, we build an advanced agentic AI workflow using LlamaIndex and OpenAI models. We focus on designing a reliable retrieval-augmented generation (RAG) agent that can reason over evidence, use tools deliberately, and evaluate its own outputs for quality. By structuring the system around retrieval, answer synthesis, and self-assessment, we demonstrate how agentic patterns go beyond simple chatbots and move toward more trustworthy, controllable AI systems suitable for research and analytical use cases.
!pip -q install -U llama-index llama-index-llms-openai llama-index-embeddings-openai nest_asyncio
import os
import asyncio
import nest_asyncio
nest_asyncio.apply()
from getpass import getpass
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY: ")
We set up the environment and install all the necessary dependencies to run the agentic AI workflow. We load the OpenAI API key securely at runtime so that credentials are never hardcoded, and we apply nest_asyncio so the notebook can handle asynchronous execution smoothly.
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.2)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
texts = [
    "Reliable RAG systems separate retrieval, synthesis, and verification. Common failures include hallucination and shallow retrieval.",
    "RAG evaluation focuses on faithfulness, answer relevancy, and retrieval quality.",
    "Tool-using agents require constrained tools, validation, and self-review loops.",
    "A robust workflow follows retrieve, answer, evaluate, and revise steps.",
]
docs = [Document(text=t) for t in texts]
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=4)
We configure the OpenAI language model and embedding model and create a compact knowledge base for our agent. We transform raw text into indexed documents so that the agent can retrieve relevant evidence during reasoning.
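Under the hood, the vector index retrieves the documents whose embeddings are closest to the query embedding. The following standalone sketch illustrates that idea with toy three-dimensional vectors and cosine similarity; the vectors, document names, and the `top_k` helper are invented for illustration (the real index uses OpenAIEmbedding vectors and its own retriever).

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "embeddings" standing in for what OpenAIEmbedding would produce.
doc_vectors = {
    "retrieval doc": [0.9, 0.1, 0.0],
    "evaluation doc": [0.1, 0.9, 0.0],
    "tools doc": [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    # Rank documents by similarity to the query vector, keep the best k.
    scored = sorted(
        doc_vectors.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

print(top_k([0.8, 0.2, 0.1]))  # ['retrieval doc', 'evaluation doc']
```

Setting `similarity_top_k=4` in `as_query_engine` plays the role of `k` here: it controls how many of the closest documents are handed to the LLM as context.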
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
faith_eval = FaithfulnessEvaluator(llm=Settings.llm)
rel_eval = RelevancyEvaluator(llm=Settings.llm)
def retrieve_evidence(q: str) -> str:
    """Retrieve supporting evidence for a question from the index."""
    r = query_engine.query(q)
    out = []
    for i, n in enumerate(r.source_nodes or []):
        out.append(f"({i+1}) {n.node.get_content()[:300]}")
    return "\n".join(out)
def score_answer(q: str, a: str) -> str:
    """Score an answer for faithfulness and relevancy against retrieved context."""
    r = query_engine.query(q)
    ctx = [n.node.get_content() for n in r.source_nodes or []]
    f = faith_eval.evaluate(query=q, response=a, contexts=ctx)
    rel = rel_eval.evaluate(query=q, response=a, contexts=ctx)
    return f"Faithfulness: {f.score}\nRelevancy: {rel.score}"
We define the two tools the agent uses: evidence retrieval and answer evaluation. Automated faithfulness and relevancy scoring lets the agent assess the quality of its own responses.
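The point of surfacing numeric scores is to gate a revision step. A minimal sketch of that gating logic, assuming a hypothetical `needs_revision` helper (not part of LlamaIndex) and a score range of 0 to 1 as produced by the evaluators above:

```python
# Hypothetical helper: decide whether the agent should revise its answer
# based on the evaluator scores. The 0.8 threshold is an assumption,
# not a LlamaIndex default.
def needs_revision(faithfulness: float, relevancy: float,
                   threshold: float = 0.8) -> bool:
    # Revise if either dimension falls below the threshold.
    return faithfulness < threshold or relevancy < threshold

print(needs_revision(1.0, 1.0))  # False: both scores pass
print(needs_revision(1.0, 0.5))  # True: relevancy too low
```

In the agent below, this decision is made by the LLM itself, steered by the system prompt rather than by an explicit threshold.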
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context
agent = ReActAgent(
    tools=[retrieve_evidence, score_answer],
    llm=Settings.llm,
    system_prompt="""
Always retrieve evidence first.
Produce a structured answer.
Evaluate the answer and revise once if scores are low.
""",
    verbose=True,
)
ctx = Context(agent)
We create a ReAct agent and define its system behavior, guiding how it retrieves evidence, generates answers, and revises results. We also initialize the execution context that maintains the agent's state across the interaction. This step brings the tools and logic together into a single agentic workflow.
async def run_brief(topic: str):
    q = f"Design a reliable RAG + tool-using agent workflow and how to evaluate it. Topic: {topic}"
    handler = agent.run(q, ctx=ctx)
    async for ev in handler.stream_events():
        print(getattr(ev, "delta", ""), end="")
    res = await handler
    return str(res)
topic = "RAG agent reliability and evaluation"
loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_brief(topic))
print("\n\nFINAL OUTPUT\n")
print(result)
We execute the full agent loop by passing a topic into the system and streaming the agent's reasoning and output. We let the agent complete its retrieval, synthesis, and evaluation cycles asynchronously.
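Stripped of the LLM calls, the control flow the agent follows is simple. The sketch below stubs out every step with plain Python so the retrieve, answer, evaluate, revise-once pattern is visible on its own; all four functions are stand-ins for the real tools and evaluators, and the hard-coded scores exist only to exercise the revision branch.

```python
# API-free sketch of the agent's control flow:
# retrieve -> answer -> evaluate -> revise once if the score is low.

def retrieve(q):
    # Stub for retrieve_evidence.
    return ["evidence A", "evidence B"]

def answer(q, evidence, revised=False):
    # Stub for LLM synthesis; marks revised drafts so evaluate() can tell.
    prefix = "revised: " if revised else ""
    return prefix + f"answer to {q!r} citing {len(evidence)} sources"

def evaluate(a):
    # Stub for score_answer: pretend the first draft scores poorly
    # and the revision passes.
    return 0.9 if a.startswith("revised") else 0.4

def run(q, threshold=0.8):
    ev = retrieve(q)
    a = answer(q, ev)
    if evaluate(a) < threshold:
        a = answer(q, ev, revised=True)  # one revision pass, as in the prompt
    return a

print(run("How do we evaluate RAG agents?"))
```

The real agent interleaves these steps with LLM reasoning traces, but the single-revision guarantee comes from the same structure: one conditional branch, not an unbounded loop.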
Finally, we showed how an agent can obtain supporting evidence, generate a structured response, and assess its own faithfulness and relevance before finalizing an answer. We kept the design modular and transparent, making it easy to extend the workflow with additional tools, evaluators, or domain-specific knowledge sources. This approach demonstrates how we can use agentic AI with LlamaIndex and OpenAI models to create systems that are more reliable and self-aware in their reasoning and responses.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of Marktechpost, an Artificial Intelligence media platform known for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, reflecting its popularity with readers.
