{"id":995,"date":"2026-05-29T15:28:37","date_gmt":"2026-05-29T07:28:37","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=995"},"modified":"2026-05-29T15:28:37","modified_gmt":"2026-05-29T07:28:37","slug":"hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness-and-the-model-weights","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=995","title":{"rendered":"Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights"},"content":{"rendered":"<p class=\"wp-block-paragraph\">Most AI agents stop improving once a human stops tuning them. The model is fixed. The scaffold around it is fixed. Hexo Labs wants to move both at once. It released <a href=\"https:\/\/github.com\/hexo-ai\/sia\" target=\"_blank\" rel=\"noreferrer noopener\">SIA (Self-Improving AI)<\/a> this week as an open-source framework under an MIT license. <\/p>\n<p class=\"wp-block-paragraph\">The core claim of this research is narrow but concrete. SIA edits both the agent\u2019s scaffold and the model\u2019s weights inside one self-improving loop. <\/p>\n<h2 class=\"wp-block-heading\"><strong>What is SIA (Self-Improving AI)<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">SIA splits a task-specific agent into two parts. The first is the harness, also called the scaffold. That covers the system prompt, tool-dispatch logic, retry policy, and answer-extraction code. The second part is the model weights themselves.<\/p>\n<p class=\"wp-block-paragraph\">Three LLM components drive the loop. A Meta-Agent writes the initial scaffold from a task specification and any reference code. A Task-Specific Agent runs the task and logs every step. A Feedback-Agent then reads that full trajectory and decides what to change.<\/p>\n<p class=\"wp-block-paragraph\">That decision is the key idea. After each run, the Feedback-Agent picks one of two actions. It can rewrite the scaffold while weights stay fixed. 
Or it can trigger a weight update while the scaffold stays fixed.<\/p>\n<p class=\"wp-block-paragraph\">The base model is openai\/gpt-oss-120b. Weight updates use LoRA, a low-rank adapter, at rank 32. The Meta-Agent and Feedback-Agent both run on Claude Sonnet 4.6. Training runs on H100 GPUs through Modal, the team\u2019s RL platform.<\/p>\n<p class=\"wp-block-paragraph\">The research team labels its two operating points SIA-H and SIA-W+H. SIA-H uses harness updates only. SIA-W+H adds weight updates on top.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1382\" height=\"632\" data-attachment-id=\"80177\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/05\/29\/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness-and-the-model-weights\/screenshot-2026-05-29-at-12-13-58-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-29-at-12.13.58-AM-1.png\" data-orig-size=\"1382,632\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;,&quot;alt&quot;:&quot;&quot;}\" data-image-title=\"Screenshot 2026-05-29 at 12.13.58\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-29-at-12.13.58-AM-1-1024x468.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/05\/Screenshot-2026-05-29-at-12.13.58-AM-1.png\" alt=\"\" class=\"wp-image-80177\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2605.27276<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>The Benchmark Case<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The research team tested SIA on three deliberately different domains. The pattern held across all three. Weight updates added gains beyond what scaffold editing alone reached. \u201cInitial\u201d is the base model through the Meta-Agent\u2019s first scaffold, before any feedback.<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Task<\/th>\n<th>Initial<\/th>\n<th>Prev. SOTA<\/th>\n<th>SIA-H (harness only)<\/th>\n<th>SIA-W+H (harness + weights)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LawBench (top-1 acc)<\/td>\n<td>13.5%<\/td>\n<td>45.0%<\/td>\n<td>50.0%<\/td>\n<td>70.1%<\/td>\n<\/tr>\n<tr>\n<td>AlphaEvolve TriMul (reward)<\/td>\n<td>0.105<\/td>\n<td>1.292<\/td>\n<td>0.120<\/td>\n<td>1.475<\/td>\n<\/tr>\n<tr>\n<td>Denoising (mse_norm)<\/td>\n<td>0.048<\/td>\n<td>0.240<\/td>\n<td>0.241<\/td>\n<td>0.289<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">On LawBench, the task is 191-class Chinese criminal charge classification. Harness iteration built a TF-IDF plus LinearSVC pipeline and plateaued at 50.0%. Weight updates via PPO then pushed accuracy to 70.1%. That is a 20.1 percentage-point gain over the harness-only best.<\/p>\n<p class=\"wp-block-paragraph\">The TriMul task asks for a custom CUDA kernel on an H100 GPU. The kernel computes a core operation in AlphaFold2\u2019s Evoformer module. Scaffold edits reached a 1.14\u00d7 speedup over baseline. Weight updates then drove runtime from 12,483 to 1,017 microseconds. That is a 91.9% reduction from the harness-only peak.<\/p>\n<p class=\"wp-block-paragraph\">One honest caveat appears in the same chart. The coding agent Claude Code reached 1.50\u00d7 on TriMul unaided, beating SIA-H\u2019s 1.14\u00d7. 
SIA-W+H still led overall at 14.02\u00d7.<\/p>\n<p class=\"wp-block-paragraph\">For denoising, the agent tunes MAGIC, a single-cell RNA imputation method. Harness sweeps over its hyperparameters settled at 0.241 mse_norm. The first weight-update checkpoint added a two-line step that no scaffold produced. It rounded imputed counts to non-negative integers, lifting the score to 0.289.<\/p>\n<h2 class=\"wp-block-heading\"><strong>How the Feedback-Agent Picks Its Move<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">SIA does not run one fixed RL recipe. The Feedback-Agent selects a training algorithm based on the reward signal it observes.<\/p>\n<p class=\"wp-block-paragraph\">On LawBench, the reward was a clean outcome-based scalar, so it used PPO with GAE. On TriMul, most kernels failed to compile, so it used entropic advantage weighting. That method up-weights rare high-reward rollouts. On denoising, it used GRPO, which eliminates the value network entirely.<\/p>\n<p class=\"wp-block-paragraph\">The research team also lists REINFORCE with KL-to-base, DPO, and best-of-N behavioural cloning. 
Each maps to a different reward shape and failure risk.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Strengths <\/strong><strong>and What to Watch<\/strong><\/h2>\n<h4 class=\"wp-block-heading\"><strong>Strengths:<\/strong><\/h4>\n<ul class=\"wp-block-list\">\n<li>First system to edit both scaffold and weights in one loop, per the authors\u2019 comparison table.<\/li>\n<li>Consistent gains over prior SOTA across three unrelated domains.<\/li>\n<li>Open source under MIT, installable as sia-agent, with four bundled tasks.<\/li>\n<li>Algorithm choice is conditioned on observed rewards, not a fixed schedule.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\"><strong>What to Watch:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>The paper reports three tasks; broader algorithm-selection results are deferred.<\/li>\n<li>Both levers optimise the same fixed verifier, risking coupled Goodhart effects.<\/li>\n<li>The authors warn that the joint fixed point may be fragile under perturbation.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div>\n<div class=\"sg-bar\"><i><\/i><\/div>\n<div class=\"sg-stage\">\n<p>    <!-- 1 --><\/p>\n<section class=\"sg-slide on\">\n<div class=\"sg-eyebrow\">Hexo Labs \u00b7 Open Source (MIT)<\/div>\n<div class=\"sg-cover-title\">SIA: Self-Improving AI<\/div>\n<div class=\"sg-cover-sub\">Harness + Weight Updates<\/div>\n<p class=\"sg-lede\">A self-improving loop that edits both an agent\u2019s scaffold and its model weights, without further human tuning.<\/p>\n<div class=\"sg-chips\">\n        <span class=\"sg-chip\">gpt-oss-120b<\/span><br \/>\n        <span class=\"sg-chip\">LoRA rank 32<\/span><br \/>\n        <span class=\"sg-chip\">3 benchmarks<\/span><br \/>\n        <span class=\"sg-chip\">Claude Sonnet 4.6 agents<\/span>\n      <\/div>\n<\/section>\n<p>    <!-- 2 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">The Gap<\/div>\n<h2>Two silos, operating in 
isolation<\/h2>\n<div class=\"sg-grid\">\n<div class=\"sg-box\">\n          <span class=\"tag tag-teal\">Harness school<\/span>\n<h3>Edit the scaffold<\/h3>\n<p>A meta-agent rewrites prompts, tools, and retry logic. The model weights stay fixed.<\/p>\n<\/div>\n<div class=\"sg-box\">\n          <span class=\"tag tag-amber\">Test-time training<\/span>\n<h3>Edit the weights<\/h3>\n<p>An RL pipeline updates the model on task feedback. The harness stays fixed.<\/p>\n<\/div>\n<\/div>\n<hr class=\"sg-line\" \/>\n<p>SIA closes the gap by moving both levers inside one loop.<\/p>\n<\/section>\n<p>    <!-- 3 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">Anatomy<\/div>\n<h2>What SIA actually is<\/h2>\n<ul class=\"sg-list\">\n<li><b>Harness (scaffold):<\/b> the system prompt, tool-dispatch logic, retry policy, and answer-extraction code.<\/li>\n<li><b>Weights:<\/b> the model\u2019s own parameters, adapted with LoRA at rank 32.<\/li>\n<li><b>Three LLM components<\/b> drive the loop: a Meta-Agent, a Task-Specific Agent, and a Feedback-Agent.<\/li>\n<\/ul>\n<\/section>\n<p>    <!-- 4 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">The Loop<\/div>\n<h2>One loop, two levers<\/h2>\n<p>After each run, the Feedback-Agent reads the full trajectory and picks one action.<\/p>\n<div class=\"sg-grid\">\n<div class=\"sg-box\">\n          <span class=\"tag tag-teal\">Action A<\/span>\n<h3>Harness update<\/h3>\n<p>Rewrite the scaffold. Weights are held fixed.<\/p>\n<\/div>\n<div class=\"sg-box\">\n          <span class=\"tag tag-amber\">Action B<\/span>\n<h3>Weight update<\/h3>\n<p>Train LoRA weights. 
The scaffold is held fixed.<\/p>\n<\/div>\n<\/div>\n<hr class=\"sg-line\" \/>\n<p>The two levers interleave freely, not in locked sequential phases.<\/p>\n<\/section>\n<p>    <!-- 5 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">Evidence<\/div>\n<h2>Benchmark results<\/h2>\n<div class=\"sg-tablewrap\">\n<table>\n<thead>\n<tr>\n<th>Task<\/th>\n<th>Initial<\/th>\n<th>Prev. SOTA<\/th>\n<th>SIA-H<\/th>\n<th>SIA-W+H<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LawBench (top-1 acc)<\/td>\n<td>13.5%<\/td>\n<td>45.0%<\/td>\n<td>50.0%<\/td>\n<td class=\"best\">70.1%<\/td>\n<\/tr>\n<tr>\n<td>AlphaEvolve TriMul (reward)<\/td>\n<td>0.105<\/td>\n<td>1.292<\/td>\n<td>0.120<\/td>\n<td class=\"best\">1.475<\/td>\n<\/tr>\n<tr>\n<td>Denoising (mse_norm)<\/td>\n<td>0.048<\/td>\n<td>0.240<\/td>\n<td>0.241<\/td>\n<td class=\"best\">0.289<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<p>SIA-W+H (harness + weights) beat SIA-H (harness only) on all three tasks.<\/p>\n<\/section>\n<p>    <!-- 6 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">Mechanism<\/div>\n<h2>How the Feedback-Agent picks its move<\/h2>\n<ul class=\"sg-list\">\n<li><b>LawBench:<\/b> a clean outcome-based reward, so it used PPO with GAE. Accuracy reached 70.1%.<\/li>\n<li><b>TriMul:<\/b> most kernels fail to compile, so it used entropic advantage weighting. Runtime hit 1,017 \u00b5s.<\/li>\n<li><b>Denoising:<\/b> it used GRPO, which eliminates the value network. 
Score rose to 0.289.<\/li>\n<li><b>Also available:<\/b> REINFORCE + KL-to-base, DPO, and best-of-N behavioural cloning.<\/li>\n<\/ul>\n<\/section>\n<p>    <!-- 7 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">RQ2<\/div>\n<h2>What each lever changes<\/h2>\n<div class=\"sg-grid\">\n<div class=\"sg-box\">\n          <span class=\"tag tag-teal\">Harness<\/span>\n<h3>Externalised changes<\/h3>\n<p>Software-engineering improvements: new tools, tighter parsers, retry logic.<\/p>\n<\/div>\n<div class=\"sg-box\">\n          <span class=\"tag tag-amber\">Weights<\/span>\n<h3>Internalised knowledge<\/h3>\n<p>Domain knowledge no prompt reaches: H100 kernel patterns, an integer-rounding step.<\/p>\n<\/div>\n<\/div>\n<hr class=\"sg-line\" \/>\n<p>The harness shapes how the agent searches; weight updates change what the model knows.<\/p>\n<\/section>\n<p>    <!-- 8 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">The Honest Read<\/div>\n<h2>Limitations to keep in view<\/h2>\n<ul class=\"sg-list\">\n<li>Both levers optimise the <b>same fixed verifier<\/b>, risking a coupled co-evolutionary Goodhart effect.<\/li>\n<li>Fixed points can look strong on the verifier yet stay <b>fragile under perturbation<\/b>.<\/li>\n<li>The paper reports <b>three tasks<\/b>; broader algorithm-selection results are deferred.<\/li>\n<li>A separate <b>350\u00d7 superintelligence<\/b> claim in launch coverage does not appear in the paper.<\/li>\n<\/ul>\n<\/section>\n<p>    <!-- 9 --><\/p>\n<section class=\"sg-slide\">\n<div class=\"sg-eyebrow\">Get Started<\/div>\n<h2>Run it yourself<\/h2>\n<p>Open source under MIT at hexo-ai\/sia. 
Built on gpt-oss-120b with LoRA rank 32.<\/p>\n<pre><code><span class=\"sg-cmt\"># install the Claude backend<\/span>\npip install <span class=\"sg-tok\">'sia-agent[claude]'<\/span>\nexport ANTHROPIC_API_KEY=<span class=\"sg-tok\">\"...\"<\/span>\n\n<span class=\"sg-cmt\"># run 5 self-improvement generations on a bundled task<\/span>\nsia --task lawbench --max_gen <span class=\"sg-tok\">5<\/span> --run_id <span class=\"sg-tok\">1<\/span><\/code><\/pre>\n<p>Four bundled tasks ship in the box: gpqa, lawbench, longcot-chess, spaceship-titanic.<\/p>\n<\/section><\/div>\n<div class=\"sg-nav\">\n<div class=\"sg-dots\" role=\"tablist\" aria-label=\"Slide navigation\"><\/div>\n<div class=\"sg-btns\">\n      <button class=\"sg-btn\" data-sg=\"prev\" aria-label=\"Previous slide\">\u2190 Prev<\/button><br \/>\n      <span class=\"sg-count\">01 \/ 09<\/span><br \/>\n      <button class=\"sg-btn pri\" data-sg=\"next\" aria-label=\"Next slide\">Next \u2192<\/button>\n    <\/div>\n<\/div>\n<div class=\"sg-foot\">\n    <span>Source: Hebbar et al., <em>SIA: Self Improving AI with Harness &amp; Weight Updates<\/em> (arXiv:2605.27276)<\/span><br \/>\n    <a href=\"https:\/\/github.com\/hexo-ai\/sia\" target=\"_blank\" rel=\"noopener\">github.com\/hexo-ai\/sia<\/a>\n  <\/div>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>SIA is the first self-improving loop that edits both an agent&#8217;s scaffold and its model weights.<\/li>\n<li>A Feedback-Agent reads each run&#8217;s full trajectory, then picks a harness rewrite or weight update.<\/li>\n<li>Combining both levers beat scaffold-only on all three tasks: LawBench, TriMul kernels, scRNA-seq denoising.<\/li>\n<li>Harness edits add software-engineering hygiene; weight updates surface domain knowledge no prompt reaches.<\/li>\n<li>Open source under MIT (hexo-ai\/sia), built on gpt-oss-120b with LoRA rank 32.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">\n<hr 
class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<\/p><p class=\"wp-block-paragraph\">Check out\u00a0the\u00a0<strong><a href=\"https:\/\/github.com\/hexo-ai\/sia\" target=\"_blank\" rel=\"noreferrer noopener\">Repo<\/a>\u00a0<\/strong>and<strong>\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/2605.27276\" target=\"_blank\" rel=\"noreferrer noopener\">Research Paper<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/29\/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness-and-the-model-weights\/\">Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Most AI agents stop improving 
&hellip;<\/p>\n","protected":false},"author":1,"featured_media":996,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-995","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/995","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=995"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/995\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/996"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=995"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=995"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=995"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}