{"id":402,"date":"2026-02-12T01:49:33","date_gmt":"2026-02-11T17:49:33","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=402"},"modified":"2026-02-12T01:49:33","modified_gmt":"2026-02-11T17:49:33","slug":"how-to-build-an-atomic-agents-rag-pipeline-with-typed-schemas-dynamic-context-injection-and-agent-chaining","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=402","title":{"rendered":"How to Build an Atomic-Agents RAG Pipeline with Typed Schemas, Dynamic Context Injection, and Agent Chaining"},"content":{"rendered":"<p>In this tutorial, we build an advanced, end-to-end learning pipeline around<strong> <\/strong><a href=\"https:\/\/github.com\/BrainBlend-AI\/atomic-agents\"><strong>Atomic-Agents<\/strong><\/a> by wiring together typed agent interfaces, structured prompting, and a compact retrieval layer that grounds outputs in real project documentation. Also, we demonstrate how to plan retrieval, retrieve relevant context, inject it dynamically into an answering agent, and run an interactive loop that turns the setup into a reusable research assistant for any new Atomic Agents question. 
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Codes\/atomic_agents_advanced_rag_pipeline_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a>.<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">import os, sys, textwrap, time, json, re\nfrom typing import List, Optional, Dict, Tuple\nfrom dataclasses import dataclass\nimport subprocess\nsubprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-q\",\n                      \"atomic-agents\", \"instructor\", \"openai\", \"pydantic\",\n                      \"requests\", \"beautifulsoup4\", \"scikit-learn\"])\nfrom getpass import getpass\nif not os.environ.get(\"OPENAI_API_KEY\"):\n   os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OPENAI_API_KEY (input hidden): \").strip()\nMODEL = os.environ.get(\"OPENAI_MODEL\", \"gpt-4o-mini\")\nfrom pydantic import Field\nfrom openai import OpenAI\nimport instructor\nfrom atomic_agents import AtomicAgent, AgentConfig, BaseIOSchema\nfrom atomic_agents.context import SystemPromptGenerator, ChatHistory, BaseDynamicContextProvider\nimport requests\nfrom bs4 import BeautifulSoup<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We install all required packages, import the core Atomic-Agents primitives, and set up Colab-compatible dependencies in one place. 
We securely capture the OpenAI API key from the keyboard and store it in the environment so downstream code never hardcodes secrets. We also lock in a default model name while keeping it configurable via an environment variable.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">def fetch_url_text(url: str, timeout: int = 20) -&gt; str:\n   r = requests.get(url, timeout=timeout, headers={\"User-Agent\": \"Mozilla\/5.0\"})\n   r.raise_for_status()\n   soup = BeautifulSoup(r.text, \"html.parser\")\n   for tag in soup([\"script\", \"style\", \"nav\", \"header\", \"footer\", \"noscript\"]):\n       tag.decompose()\n   text = soup.get_text(\"\\n\")\n   text = re.sub(r\"[ \\t]+\", \" \", text)\n   text = re.sub(r\"\\n{3,}\", \"\\n\\n\", text).strip()\n   return text\n\n\ndef chunk_text(text: str, max_chars: int = 1400, overlap: int = 200) -&gt; List[str]:\n   if not text:\n       return []\n   chunks = []\n   i = 0\n   while i &lt; len(text):\n       chunk = text[i:i+max_chars].strip()\n       if chunk:\n           chunks.append(chunk)\n       i += max_chars - overlap\n   return chunks\n\n\ndef clamp(s: str, n: int = 800) -&gt; str:\n   s = (s or \"\").strip()\n   return s if len(s) &lt;= n else s[:n].rstrip() + \"\u2026\"<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We fetch web pages from the Atomic Agents repo and docs, then clean them into plain text so retrieval becomes reliable. 
We chunk long documents into overlapping segments, preserving context while keeping each chunk small enough for ranking and citation. We also add a small helper to clamp long snippets so our injected context stays readable.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">from sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.metrics.pairwise import cosine_similarity\n\n\n@dataclass\nclass Snippet:\n   doc_id: str\n   url: str\n   chunk_id: int\n   text: str\n   score: float\n\n\nclass MiniCorpusRetriever:\n   def __init__(self, docs: Dict[str, Tuple[str, str]]):\n       self.items: List[Tuple[str, str, int, str]] = []\n       for doc_id, (url, raw) in docs.items():\n           for idx, ch in enumerate(chunk_text(raw)):\n               self.items.append((doc_id, url, idx, ch))\n       if not self.items:\n           raise RuntimeError(\"No documents were fetched; cannot build TF-IDF index.\")\n       self.vectorizer = TfidfVectorizer(stop_words=\"english\", max_features=50000)\n       self.matrix = self.vectorizer.fit_transform([it[3] for it in self.items])\n\n\n   def search(self, query: str, k: int = 6) -&gt; List[Snippet]:\n       qv = self.vectorizer.transform([query])\n       sims = cosine_similarity(qv, self.matrix).ravel()\n       top = sims.argsort()[::-1][:k]\n       out = []\n       for j in top:\n           doc_id, url, chunk_id, txt = self.items[j]\n           out.append(Snippet(doc_id=doc_id, url=url, chunk_id=chunk_id, text=txt, score=float(sims[j])))\n       return out\n\n\nclass RetrievedContextProvider(BaseDynamicContextProvider):\n   def __init__(self, title: str, snippets: List[Snippet]):\n       super().__init__(title=title)\n       self.snippets = snippets\n\n\n   def get_info(self) -&gt; str:\n       blocks = []\n       for s in self.snippets:\n           blocks.append(\n               f\"[{s.doc_id}#{s.chunk_id}] (score={s.score:.3f}) {s.url}\\n{clamp(s.text, 900)}\"\n           )\n       return \"\\n\\n\".join(blocks)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We build a mini retrieval system using TF-IDF and cosine similarity over the chunked documentation corpus. We wrap each retrieved chunk in a structured Snippet object to track doc IDs, chunk IDs, and citation scores. We then inject top-ranked chunks into the agent\u2019s runtime via a dynamic context provider, keeping the answering agent grounded. Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Codes\/atomic_agents_advanced_rag_pipeline_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a>.<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">class PlanInput(BaseIOSchema):\n   \"\"\"Input schema for the planner agent: describes the user's task and how 
many retrieval queries to draft.\"\"\"\n   task: str = Field(...)\n   num_queries: int = Field(4)\n\n\nclass PlanOutput(BaseIOSchema):\n   \"\"\"Output schema from the planner agent: retrieval queries, coverage checklist, and safety checks.\"\"\"\n   queries: List[str]\n   must_cover: List[str]\n   safety_checks: List[str]\n\n\nclass AnswerInput(BaseIOSchema):\n   \"\"\"Input schema for the answering agent: user question plus style constraints.\"\"\"\n   question: str\n   style: str = \"concise but advanced\"\n\n\nclass AnswerOutput(BaseIOSchema):\n   \"\"\"Output schema for the answering agent: grounded answer, next steps, and which citations were used.\"\"\"\n   answer: str\n   next_steps: List[str]\n   used_citations: List[str]\n\n\nclient = instructor.from_openai(OpenAI(api_key=os.environ[\"OPENAI_API_KEY\"]))\n\n\nplanner_prompt = SystemPromptGenerator(\n   background=[\n       \"You are a rigorous research planner for a small RAG system.\",\n       \"You propose retrieval queries that are diverse (lexical + semantic) and designed to find authoritative info.\",\n       \"You do NOT answer the task; you only plan retrieval.\"\n   ],\n   steps=[\n       \"Read the task.\",\n       \"Propose diverse retrieval queries (not too long).\",\n       \"List must-cover aspects and safety checks.\"\n   ],\n   output_instructions=[\n       \"Return strictly the PlanOutput schema.\",\n       \"Queries must be directly usable as search strings.\",\n       \"Must-cover should be 4\u20138 bullets.\"\n   ]\n)\n\n\nplanner = AtomicAgent[PlanInput, PlanOutput](\n   config=AgentConfig(\n       client=client,\n       model=MODEL,\n       system_prompt_generator=planner_prompt,\n       history=ChatHistory(),\n   )\n)\n\n\nanswerer_prompt = SystemPromptGenerator(\n   background=[\n       \"You are an expert technical tutor for Atomic Agents (atomic-agents).\",\n       \"You are given retrieved context snippets with IDs like [doc#chunk].\",\n       \"You must ground claims in the 
provided snippets and cite them inline.\"\n   ],\n   steps=[\n       \"Read the question and the provided context.\",\n       \"Synthesize an accurate answer using only supported facts.\",\n       \"Cite claims inline using the provided snippet IDs.\"\n   ],\n   output_instructions=[\n       \"Use inline citations like [readme#12] or [docs_home#3].\",\n       \"If the context does not support something, say so briefly and suggest what to retrieve next.\",\n       \"Return strictly the AnswerOutput schema.\"\n   ]\n)\n\n\nanswerer = AtomicAgent[AnswerInput, AnswerOutput](\n   config=AgentConfig(\n       client=client,\n       model=MODEL,\n       system_prompt_generator=answerer_prompt,\n       history=ChatHistory(),\n   )\n)\n<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define strictly typed schemas for planner and answerer inputs and outputs, and include docstrings to satisfy Atomic Agents\u2019 schema requirements. We create an Instructor-wrapped OpenAI client and configure two Atomic Agents with explicit system prompts and chat history. 
We enforce structured outputs so the planner produces queries and the answerer produces a cited response with clear next steps.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-python\">SOURCES = {\n   \"readme\": \"https:\/\/github.com\/BrainBlend-AI\/atomic-agents\",\n   \"docs_home\": \"https:\/\/brainblend-ai.github.io\/atomic-agents\/\",\n   \"examples_index\": \"https:\/\/brainblend-ai.github.io\/atomic-agents\/examples\/index.html\",\n}\n\n\nraw_docs: Dict[str, Tuple[str, str]] = {}\nfor doc_id, url in SOURCES.items():\n   try:\n       raw_docs[doc_id] = (url, fetch_url_text(url))\n   except Exception:\n       raw_docs[doc_id] = (url, \"\")\n\n\nnon_empty = [d for d in raw_docs.values() if d[1].strip()]\nif not non_empty:\n   raise RuntimeError(\"All source fetches failed or were empty. Check network access in Colab and retry.\")\n\n\nretriever = MiniCorpusRetriever(raw_docs)\n\n\ndef run_atomic_rag(question: str, k: int = 7, verbose: bool = True) -&gt; AnswerOutput:\n   t0 = time.time()\n   plan = planner.run(PlanInput(task=question, num_queries=4))\n   all_snips: List[Snippet] = []\n   for q in plan.queries:\n       all_snips.extend(retriever.search(q, k=max(2, k \/\/ 2)))\n   best: Dict[Tuple[str, int], Snippet] = {}\n   for s in all_snips:\n       key = (s.doc_id, s.chunk_id)\n       if (key not in best) or (s.score &gt; best[key].score):\n           best[key] = s\n   snips = sorted(best.values(), key=lambda x: x.score, reverse=True)[:k]\n   ctx = RetrievedContextProvider(title=\"Retrieved Atomic Agents Context\", snippets=snips)\n   answerer.register_context_provider(\"retrieved_context\", ctx)\n   out = answerer.run(AnswerInput(question=question, style=\"concise, advanced, practical\"))\n   if verbose:\n       print(out.answer)\n   return out\n\n\ndemo_q = \"Teach me Atomic Agents at an advanced level: explain the core building blocks and show how to chain agents with typed schemas and dynamic context.\"\nrun_atomic_rag(demo_q, k=7, verbose=True)\n\n\nwhile True:\n   user_q = input(\"\\nYour question&gt; \").strip()\n   if not user_q or user_q.lower() in {\"exit\", \"quit\"}:\n       break\n   run_atomic_rag(user_q, k=7, verbose=True)<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We fetch a small set of authoritative Atomic Agents sources and build a local retrieval index from them. We implement a full pipeline function that plans queries, retrieves relevant context, injects it, and produces a grounded final answer. We finish by running a demo query and launching an interactive loop so we can keep asking questions and getting cited answers.<\/p>\n<p>In conclusion, we completed the Atomic-Agents workflow in Colab, cleanly separating planning, retrieval, and answering while enforcing strong typing throughout. 
We kept the system grounded by injecting only the highest-signal documentation chunks as dynamic context, and we enforced a citation discipline that makes outputs auditable. From here, we can scale this pattern by adding more sources, swapping in stronger retrievers or rerankers, introducing tool-use agents, and turning the pipeline into a production-grade research assistant that remains both fast and trustworthy.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Codes\/atomic_agents_advanced_rag_pipeline_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/02\/11\/how-to-build-an-atomic-agents-rag-pipeline-with-typed-schemas-dynamic-context-injection-and-agent-chaining\/\">How to Build an Atomic-Agents RAG Pipeline with Typed Schemas, Dynamic Context Injection, and Agent Chaining<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we build an &hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-402","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/402","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=402"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/402\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=402"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:
\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=402"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=402"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}