{"id":98,"date":"2025-12-10T06:50:06","date_gmt":"2025-12-09T22:50:06","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=98"},"modified":"2025-12-10T06:50:06","modified_gmt":"2025-12-09T22:50:06","slug":"a-coding-guide-to-build-a-procedural-memory-agent-that-learns-stores-retrieves-and-reuses-skills-as-neural-modules-over-time","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=98","title":{"rendered":"A Coding Guide to Build a Procedural Memory Agent That Learns, Stores, Retrieves, and Reuses Skills as Neural Modules Over Time"},"content":{"rendered":"<p>In this tutorial, we explore how an intelligent agent can gradually form procedural memory by learning reusable skills directly from its interactions with an environment. We design a minimal yet powerful framework in which skills behave like neural modules: they store action sequences, carry contextual embeddings, and are retrieved by similarity when a new situation resembles an experience. As we run our agent through multiple episodes, we observe how its behaviour becomes more efficient, moving from primitive exploration to leveraging a library of skills that it has learned on its own. 
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Memory\/procedural_memory_agent_skill_learning_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">import numpy as np\nimport matplotlib.pyplot as plt\nfrom collections import defaultdict\n\n\nclass Skill:\n   def __init__(self, name, preconditions, action_sequence, embedding, success_count=0):\n       self.name = name\n       self.preconditions = preconditions\n       self.action_sequence = action_sequence\n       self.embedding = embedding\n       self.success_count = success_count\n       self.times_used = 0\n  \n   def is_applicable(self, state):\n       for key, value in self.preconditions.items():\n           if state.get(key) != value:\n               return False\n       return True\n  \n   def __repr__(self):\n       return f\"Skill({self.name}, used={self.times_used}, success={self.success_count})\"\n\n\nclass SkillLibrary:\n   def __init__(self, embedding_dim=8):\n       self.skills = []\n       self.embedding_dim = embedding_dim\n       self.skill_stats = defaultdict(lambda: {\"attempts\": 0, \"successes\": 0})\n  \n   def add_skill(self, skill):\n       for existing_skill in self.skills:\n           if self._similarity(skill.embedding, 
existing_skill.embedding) &gt; 0.9:\n               existing_skill.success_count += 1\n               return existing_skill\n       self.skills.append(skill)\n       return skill\n  \n   def retrieve_skills(self, state, query_embedding=None, top_k=3):\n       applicable = [s for s in self.skills if s.is_applicable(state)]\n       if query_embedding is not None and applicable:\n           similarities = [self._similarity(query_embedding, s.embedding) for s in applicable]\n           sorted_skills = [s for _, s in sorted(zip(similarities, applicable), reverse=True)]\n           return sorted_skills[:top_k]\n       return sorted(applicable, key=lambda s: s.success_count \/ max(s.times_used, 1), reverse=True)[:top_k]\n  \n   def _similarity(self, emb1, emb2):\n       return np.dot(emb1, emb2) \/ (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-8)\n  \n   def get_stats(self):\n       return {\n           \"total_skills\": len(self.skills),\n           \"total_uses\": sum(s.times_used for s in self.skills),\n           \"avg_success_rate\": np.mean([s.success_count \/ max(s.times_used, 1) for s in self.skills]) if self.skills else 0\n       }<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define how skills are represented and stored in a memory structure. We implement similarity-based retrieval so that the agent can match a new state with past skills using cosine similarity. As we work through this layer, we see how skill reuse becomes possible once skills acquire metadata, embeddings, and usage statistics. 
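To make the retrieval step concrete, the ranking logic can be sketched as a standalone toy (illustrative only; it mirrors the '_similarity' and 'retrieve_skills' methods rather than reusing the class):<\/p>

```python
import numpy as np

# Illustrative toy of the cosine-similarity retrieval used by SkillLibrary.
def cosine_similarity(a, b):
    # The small epsilon guards against division by zero for all-zero vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def retrieve_top_k(query, embeddings, k=2):
    # Rank stored skill embeddings by similarity to the query, best first.
    order = sorted(range(len(embeddings)),
                   key=lambda i: cosine_similarity(query, embeddings[i]),
                   reverse=True)
    return order[:k]

query = np.array([1.0, 0.0, 0.0])
stored = [np.array([1.0, 0.1, 0.0]),   # nearly identical context
          np.array([0.0, 1.0, 0.0]),   # orthogonal context
          np.array([0.9, 0.5, 0.0])]   # partially overlapping context
print(retrieve_top_k(query, stored))   # indices of the two closest skills
```

<p>The stored embeddings closest to the query come back first, which is exactly how applicable skills are ordered before execution. 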
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Memory\/procedural_memory_agent_skill_learning_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class GridWorld:\n   def __init__(self, size=5):\n       self.size = size\n       self.reset()\n  \n   def reset(self):\n       self.agent_pos = [0, 0]\n       self.goal_pos = [self.size-1, self.size-1]\n       self.objects = {\"key\": [2, 2], \"door\": [3, 3], \"box\": [1, 3]}\n       self.inventory = []\n       self.door_open = False\n       return self.get_state()\n  \n   def get_state(self):\n       return {\n           \"agent_pos\": tuple(self.agent_pos),\n           \"has_key\": \"key\" in self.inventory,\n           \"door_open\": self.door_open,\n           \"at_goal\": self.agent_pos == self.goal_pos,\n           \"objects\": {k: tuple(v) for k, v in self.objects.items()}\n       }\n  \n   def step(self, action):\n       reward = -0.1\n       if action == \"move_up\":\n           self.agent_pos[1] = min(self.agent_pos[1] + 1, self.size - 1)\n       elif action == \"move_down\":\n           self.agent_pos[1] = max(self.agent_pos[1] - 1, 0)\n       elif action == \"move_left\":\n           self.agent_pos[0] = max(self.agent_pos[0] - 1, 0)\n       elif 
action == \"move_right\":\n           self.agent_pos[0] = min(self.agent_pos[0] + 1, self.size - 1)\n       elif action == \"pickup_key\":\n           if self.agent_pos == self.objects[\"key\"] and \"key\" not in self.inventory:\n               self.inventory.append(\"key\")\n               reward = 1.0\n       elif action == \"open_door\":\n           if self.agent_pos == self.objects[\"door\"] and \"key\" in self.inventory:\n               self.door_open = True\n               reward = 2.0\n       done = self.agent_pos == self.goal_pos and self.door_open\n       if done:\n           reward = 10.0\n       return self.get_state(), reward, done<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We construct a simple environment in which the agent learns tasks such as picking up a key, opening a door, and reaching a goal. We use this environment as a playground for our procedural memory system, allowing us to observe how primitive actions evolve into more complex, reusable skills. The environment\u2019s structure helps us observe clear, interpretable improvements in behaviour across episodes. 
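Before moving on, the reward shaping above can be replayed by hand (a toy sketch assuming the GridWorld rules: each ordinary step costs 0.1, picking up the key pays 1.0, opening the door pays 2.0, and the final move onto the goal pays 10.0; the 'goal' entry below stands in for that final move):<\/p>

```python
# Toy replay of the shaped rewards, assuming the GridWorld rules above.
# Each action yields exactly one reward, matching how step() replaces the
# default -0.1 penalty when a pickup, door opening, or goal arrival occurs.
rewards = {'move': -0.1, 'pickup_key': 1.0, 'open_door': 2.0, 'goal': 10.0}

def episode_return(actions):
    # Sum the shaped rewards for a hand-written action trace.
    return round(sum(rewards[a] for a in actions), 1)

trace = ['move', 'move', 'pickup_key', 'move', 'open_door', 'goal']
print(episode_return(trace))   # three step penalties plus the three bonuses
```

<p>Tracing totals like this makes it easy to sanity-check that successful episodes dominate aimless wandering. 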
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Memory\/procedural_memory_agent_skill_learning_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">class ProceduralMemoryAgent:\n   def __init__(self, env, embedding_dim=8):\n       self.env = env\n       self.skill_library = SkillLibrary(embedding_dim)\n       self.embedding_dim = embedding_dim\n       self.episode_history = []\n       self.primitive_actions = [\"move_up\", \"move_down\", \"move_left\", \"move_right\", \"pickup_key\", \"open_door\"]\n  \n   def create_embedding(self, state, action_seq):\n       state_vec = np.zeros(self.embedding_dim)\n       state_vec[0] = hash(str(state[\"agent_pos\"])) % 1000 \/ 1000\n       state_vec[1] = 1.0 if state.get(\"has_key\") else 0.0\n       state_vec[2] = 1.0 if state.get(\"door_open\") else 0.0\n       for i, action in enumerate(action_seq[:self.embedding_dim-3]):\n           state_vec[3+i] = hash(action) % 1000 \/ 1000\n       return state_vec \/ (np.linalg.norm(state_vec) + 1e-8)\n  \n   def extract_skill(self, trajectory):\n       if len(trajectory) &lt; 2:\n           return None\n       start_state = trajectory[0][0]\n       actions = [a for _, a, _ in trajectory]\n       preconditions = {\"has_key\": 
start_state.get(\"has_key\", False), \"door_open\": start_state.get(\"door_open\", False)}\n       end_state = self.env.get_state()\n       if end_state.get(\"has_key\") and not start_state.get(\"has_key\"):\n           name = \"acquire_key\"\n       elif end_state.get(\"door_open\") and not start_state.get(\"door_open\"):\n           name = \"open_door_sequence\"\n       else:\n           name = f\"navigate_{len(actions)}_steps\"\n       embedding = self.create_embedding(start_state, actions)\n       return Skill(name, preconditions, actions, embedding, success_count=1)\n  \n   def execute_skill(self, skill):\n       skill.times_used += 1\n       trajectory = []\n       total_reward = 0\n       for action in skill.action_sequence:\n           state = self.env.get_state()\n           next_state, reward, done = self.env.step(action)\n           trajectory.append((state, action, reward))\n           total_reward += reward\n           if done:\n               skill.success_count += 1\n               return trajectory, total_reward, True\n       return trajectory, total_reward, False\n  \n   def explore(self, max_steps=20):\n       trajectory = []\n       state = self.env.get_state()\n       for _ in range(max_steps):\n           action = self._choose_exploration_action(state)\n           next_state, reward, done = self.env.step(action)\n           trajectory.append((state, action, reward))\n           state = next_state\n           if done:\n               return trajectory, True\n       return trajectory, False<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We focus on building embeddings that encode the context of a state-action sequence, enabling us to meaningfully compare skills. We also extract skills from successful trajectories, transforming raw experience into reusable behaviours. As we run this code, we observe how simple exploration gradually yields structured knowledge that the agent can apply later. 
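This context-embedding idea can also be sketched on its own (a toy mirroring 'create_embedding' under the same assumptions: hashed state features, a hashed action prefix, and unit normalisation):<\/p>

```python
import numpy as np

EMBEDDING_DIM = 8  # same dimensionality as the tutorial agent

# Toy version of create_embedding: pack coarse state features and the
# first few actions into a fixed-size vector, then unit-normalise it
# so cosine similarity between contexts stays well behaved.
def toy_embedding(agent_pos, has_key, door_open, action_seq):
    vec = np.zeros(EMBEDDING_DIM)
    vec[0] = hash(str(agent_pos)) % 1000 / 1000   # hashed position feature
    vec[1] = 1.0 if has_key else 0.0              # inventory flag
    vec[2] = 1.0 if door_open else 0.0            # door flag
    for i, action in enumerate(action_seq[:EMBEDDING_DIM - 3]):
        vec[3 + i] = hash(action) % 1000 / 1000   # hashed action prefix
    return vec / (np.linalg.norm(vec) + 1e-8)

emb = toy_embedding((0, 0), True, False, ['move_right', 'pickup_key'])
print(emb.shape)   # (8,): a fixed-size, unit-length context vector
```

<p>Because every embedding is unit length, cosine similarity between two contexts reduces to a dot product, which keeps retrieval cheap. 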
Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Memory\/procedural_memory_agent_skill_learning_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">   def _choose_exploration_action(self, state):\n       agent_pos = state[\"agent_pos\"]\n       if not state.get(\"has_key\"):\n           key_pos = state[\"objects\"][\"key\"]\n           if agent_pos == key_pos:\n               return \"pickup_key\"\n           if agent_pos[0] &lt; key_pos[0]:\n               return \"move_right\"\n           if agent_pos[0] &gt; key_pos[0]:\n               return \"move_left\"\n           if agent_pos[1] &lt; key_pos[1]:\n               return \"move_up\"\n           return \"move_down\"\n       if state.get(\"has_key\") and not state.get(\"door_open\"):\n           door_pos = state[\"objects\"][\"door\"]\n           if agent_pos == door_pos:\n               return \"open_door\"\n           if agent_pos[0] &lt; door_pos[0]:\n               return \"move_right\"\n           if agent_pos[0] &gt; door_pos[0]:\n               return \"move_left\"\n           if agent_pos[1] &lt; door_pos[1]:\n               return \"move_up\"\n           return \"move_down\"\n       goal_pos = (4, 4)\n       if agent_pos[0] &lt; goal_pos[0]:\n     
      return \"move_right\"\n       if agent_pos[1] &lt; goal_pos[1]:\n           return \"move_up\"\n       return np.random.choice(self.primitive_actions)\n  \n   def run_episode(self, use_skills=True):\n       self.env.reset()\n       total_reward = 0\n       steps = 0\n       trajectory = []\n       while steps &lt; 50:\n           state = self.env.get_state()\n           if use_skills and self.skill_library.skills:\n               query_emb = self.create_embedding(state, [])\n               skills = self.skill_library.retrieve_skills(state, query_emb, top_k=1)\n               if skills:\n                   skill_traj, skill_reward, success = self.execute_skill(skills[0])\n                   trajectory.extend(skill_traj)\n                   total_reward += skill_reward\n                   steps += len(skill_traj)\n                   if success:\n                       return trajectory, total_reward, steps, True\n                   continue\n           action = self._choose_exploration_action(state)\n           next_state, reward, done = self.env.step(action)\n           trajectory.append((state, action, reward))\n           total_reward += reward\n           steps += 1\n           if done:\n               return trajectory, total_reward, steps, True\n       return trajectory, total_reward, steps, False\n  \n   def train(self, episodes=10):\n       stats = {\"rewards\": [], \"steps\": [], \"skills_learned\": [], \"skill_uses\": []}\n       for ep in range(episodes):\n           trajectory, reward, steps, success = self.run_episode(use_skills=True)\n           if success and len(trajectory) &gt;= 3:\n               segment = trajectory[-min(5, len(trajectory)):]\n               skill = self.extract_skill(segment)\n               if skill:\n                   self.skill_library.add_skill(skill)\n           stats[\"rewards\"].append(reward)\n           stats[\"steps\"].append(steps)\n           stats[\"skills_learned\"].append(len(self.skill_library.skills))\n     
      stats[\"skill_uses\"].append(self.skill_library.get_stats()[\"total_uses\"])\n           print(f\"Episode {ep+1}: Reward={reward:.1f}, Steps={steps}, Skills={len(self.skill_library.skills)}, Success={success}\")\n       return stats<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We define how the agent chooses between using known skills and exploring with primitive actions. We train the agent across several episodes and record the evolution of learned skills, usage counts, and success rates. As we examine this part, we observe that skill reuse reduces episode length and improves overall rewards. Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Memory\/procedural_memory_agent_skill_learning_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.<\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<div class=\"dm-buttons-right\"><a><span class=\"dm-copy-text\">Copy Code<\/span><span class=\"dm-copy-confirmed\">Copied<\/span><span class=\"dm-error-message\">Use a different Browser<\/span><\/a><\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">def visualize_training(stats):\n   fig, axes = plt.subplots(2, 2, figsize=(12, 8))\n   axes[0, 0].plot(stats[\"rewards\"])\n   axes[0, 0].set_title(\"Episode Rewards\")\n   axes[0, 1].plot(stats[\"steps\"])\n   axes[0, 1].set_title(\"Steps per Episode\")\n   axes[1, 0].plot(stats[\"skills_learned\"])\n   axes[1, 0].set_title(\"Skills in Library\")\n   axes[1, 1].plot(stats[\"skill_uses\"])\n   axes[1, 1].set_title(\"Cumulative Skill Uses\")\n   plt.tight_layout()\n   
plt.savefig(\"skill_learning_stats.png\", dpi=150, bbox_inches='tight')\n   plt.show()\n\n\nif __name__ == \"__main__\":\n   print(\"=== Procedural Memory Agent Demo ===\\n\")\n   env = GridWorld(size=5)\n   agent = ProceduralMemoryAgent(env)\n   print(\"Training agent to learn reusable skills...\\n\")\n   stats = agent.train(episodes=15)\n   print(\"\\n=== Learned Skills ===\")\n   for skill in agent.skill_library.skills:\n       print(f\"{skill.name}: {len(skill.action_sequence)} actions, used {skill.times_used} times, {skill.success_count} successes\")\n   lib_stats = agent.skill_library.get_stats()\n   print(\"\\n=== Library Statistics ===\")\n   print(f\"Total skills: {lib_stats['total_skills']}\")\n   print(f\"Total skill uses: {lib_stats['total_uses']}\")\n   print(f\"Avg success rate: {lib_stats['avg_success_rate']:.2%}\")\n   visualize_training(stats)\n   print(\"\\n\u2713 Skill learning complete! Check the visualization above.\")<\/code><\/pre>\n<\/div>\n<\/div>\n<p>We bring everything together by running training, printing learned skills, and plotting behaviour statistics. We visualize the trend in rewards and how the skill library grows over time. By running this snippet, we complete the lifecycle of procedural memory formation and confirm that the agent learns to behave more intelligently with experience.<\/p>\n<p>In conclusion, we see how procedural memory emerges naturally when an agent learns to extract skills from its own successful trajectories. We observe how skills gain structure, metadata, embeddings, and usage patterns, allowing the agent to reuse them efficiently in future situations. 
Lastly, we appreciate how even a small environment and simple heuristics lead to meaningful learning dynamics, giving us a concrete understanding of what it means for an agent to develop reusable internal competencies over time.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\/blob\/main\/Agentic%20AI%20Memory\/procedural_memory_agent_skill_learning_Marktechpost.ipynb\" target=\"_blank\" rel=\"noreferrer noopener\">FULL CODES here<\/a><\/strong>.\u00a0Feel free to check out our\u00a0<strong><mark><a href=\"https:\/\/github.com\/Marktechpost\/AI-Tutorial-Codes-Included\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Page for Tutorials, Codes and Notebooks<\/a><\/mark><\/strong>.\u00a0Also,\u00a0feel free to follow us on\u00a0<strong><a href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">100k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! 
Are you on Telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">Now you can join us on Telegram as well.<\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2025\/12\/09\/a-coding-guide-to-build-a-procedural-memory-agent-that-learns-stores-retrieves-and-reuses-skills-as-neural-modules-over-time\/\">A Coding Guide to Build a Procedural Memory Agent That Learns, Stores, Retrieves, and Reuses Skills as Neural Modules Over Time<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we explore h&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-98","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/98","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=98"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/98\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=98"}],"wp:term":[{"taxonomy":"category",
"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=98"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=98"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}