{"id":283,"date":"2026-01-18T05:56:49","date_gmt":"2026-01-17T21:56:49","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=283"},"modified":"2026-01-18T05:56:49","modified_gmt":"2026-01-17T21:56:49","slug":"how-to-build-a-self-evaluating-agentic-ai-system-with-llamaindex-and-openai-using-retrieval-tool-use-and-automated-quality-checks","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=283","title":{"rendered":"How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks"},"content":{"rendered":"<p>In this tutorial, we build an advanced agentic AI workflow using LlamaIndex and OpenAI models. We focus on designing a reliable retrieval-augmented generation (RAG) agent that can reason over evidence, use tools deliberately, and evaluate its own outputs for quality. By structuring the system around retrieval, answer synthesis, and self-evaluation, we demonstrate how agentic patterns go beyond simple chatbots and move toward more trustworthy, controllable AI systems suitable for research and analytical use cases.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">!pip -q install -U llama-index llama-index-llms-openai llama-index-embeddings-openai nest_asyncio\n\n\nimport os\nimport asyncio\nimport nest_asyncio\nnest_asyncio.apply()  # allow nested event loops inside the notebook\n\n\nfrom getpass import getpass\n\n\nif not 
os.environ.get(\"OPENAI_API_KEY\"):\n   os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OPENAI_API_KEY: \")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We install the required dependencies and prepare the environment for the agentic workflow. We load the OpenAI API key at runtime with getpass, so the credential is never hardcoded in the notebook. We also apply nest_asyncio, which allows the agent\u2019s asynchronous calls to run inside the notebook\u2019s already-running event loop.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">from llama_index.core import Document, VectorStoreIndex, Settings\nfrom llama_index.llms.openai import OpenAI\nfrom llama_index.embeddings.openai import OpenAIEmbedding\n\n\n# Global model defaults picked up by the index and query engine\nSettings.llm = OpenAI(model=\"gpt-4o-mini\", temperature=0.2)\nSettings.embed_model = OpenAIEmbedding(model=\"text-embedding-3-small\")\n\n\n# A tiny in-memory knowledge base\ntexts = [\n   \"Reliable RAG systems separate retrieval, synthesis, and verification. 
Common failures include hallucination and shallow retrieval.\",\n   \"RAG evaluation focuses on faithfulness, answer relevancy, and retrieval quality.\",\n   \"Tool-using agents require constrained tools, validation, and self-review loops.\",\n   \"A robust workflow follows retrieve, answer, evaluate, and revise steps.\"\n]\n\n\ndocs = [Document(text=t) for t in texts]\nindex = VectorStoreIndex.from_documents(docs)\nquery_engine = index.as_query_engine(similarity_top_k=4)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We configure gpt-4o-mini as the language model and text-embedding-3-small as the embedding model, then build a compact knowledge base for the agent. We wrap each text snippet in a Document, embed the documents into a VectorStoreIndex, and create a query engine that returns the four most similar passages (with this four-document corpus, all of them) as evidence for the agent\u2019s reasoning.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator\n\n\nfaith_eval = FaithfulnessEvaluator(llm=Settings.llm)  # is the answer supported by the context?\nrel_eval = RelevancyEvaluator(llm=Settings.llm)  # do the answer and context address the query?\n\n\ndef retrieve_evidence(q: str) -&gt; str:\n   \"\"\"Retrieve passages relevant to question q, numbered for citation.\"\"\"\n   r = query_engine.query(q)\n   out = []\n   for i, n in enumerate(r.source_nodes or []):\n       out.append(f\"[{i+1}] {n.node.get_content()[:300]}\")  # keep the first 300 characters\n   return \"\\n\".join(out)\n\n\ndef score_answer(q: str, a: str) -&gt; str:\n   \"\"\"Score answer a to question q for faithfulness and relevancy against retrieved context.\"\"\"\n   # Re-query the index to collect the context passages used for scoring.\n   res = query_engine.query(q)\n   ctx = [n.node.get_content() for n in res.source_nodes or []]\n   f = faith_eval.evaluate(query=q, response=a, 
contexts=ctx)\n   r = rel_eval.evaluate(query=q, response=a, contexts=ctx)\n   return f\"Faithfulness: {f.score}\\nRelevancy: {r.score}\"<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define the agent\u2019s two core tools: evidence retrieval and answer scoring. The scoring tool uses the LLM as a judge: FaithfulnessEvaluator checks whether a draft answer is supported by the retrieved context, and RelevancyEvaluator checks whether the answer and context actually address the question. Together, these scores let the agent judge the quality of its own responses.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">from llama_index.core.agent.workflow import ReActAgent\nfrom llama_index.core.workflow import Context\n\n\nagent = ReActAgent(\n   tools=[retrieve_evidence, score_answer],  # plain functions are wrapped as tools automatically\n   llm=Settings.llm,\n   system_prompt=\"\"\"\nAlways retrieve evidence first.\nProduce a structured answer.\nEvaluate the answer and revise once if scores are low.\n\"\"\",\n   verbose=True\n)\n\n\nctx = Context(agent)  # holds the agent's state across runs<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We create the ReAct-based agent and use its system prompt to instruct it to retrieve evidence first, write a structured answer, and then evaluate that answer and revise it once if the scores are low. This loop is guided by the prompt rather than enforced in code. We also initialize the execution context that maintains the agent\u2019s state across interactions. 
This step brings the tools and the reasoning loop together into a single agentic workflow.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">async def run_brief(topic: str):\n   q = f\"Design a reliable RAG + tool-using agent workflow and how to evaluate it. Topic: {topic}\"\n   handler = agent.run(q, ctx=ctx)\n   # Stream incremental output; events without a delta attribute print nothing.\n   async for ev in handler.stream_events():\n       print(getattr(ev, \"delta\", \"\"), end=\"\")\n   res = await handler  # wait for the final agent response\n   return str(res)\n\n\ntopic = \"RAG agent reliability and evaluation\"\n# nest_asyncio.apply() (run earlier) lets asyncio.run() work inside the notebook's event loop.\nresult = asyncio.run(run_brief(topic))\n\n\nprint(\"\\n\\nFINAL OUTPUT\\n\")\nprint(result)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We run the full agent loop on one topic, streaming the agent\u2019s reasoning and output as they are generated, and then await the handler to collect the final response once the retrieval, generation, and evaluation cycle is complete.<\/p>\n<p>In conclusion, we showcased how an agent can retrieve supporting evidence, generate a structured response, and assess its own faithfulness and relevancy before finalizing an answer. We kept the design modular and transparent, making it easy to extend the workflow with additional tools, evaluators, or domain-specific knowledge sources. 
This approach illustrates how we can use agentic AI with LlamaIndex and OpenAI models to build systems that are more capable and more reliable in their reasoning and responses.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Codes\/agentic_llamaindex_rag_self_evaluation_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/01\/17\/how-to-build-a-self-evaluating-agentic-ai-system-with-llamaindex-and-openai-using-retrieval-tool-use-and-automated-quality-checks\/\">How to Build a Self-Evaluating Agentic AI System with LlamaIndex and OpenAI Using Retrieval, Tool Use, and Automated Quality Checks<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an &hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-283","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/283","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=283"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/283\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=283"}],"wp:term":[{"taxonom
y":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=283"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=283"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}