{"id":813,"date":"2026-04-29T13:45:42","date_gmt":"2026-04-29T05:45:42","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=813"},"modified":"2026-04-29T13:45:42","modified_gmt":"2026-04-29T05:45:42","slug":"poolside-ai-introduces-laguna-xs-2-and-m-1-agentic-coding-models-reaching-68-2-and-72-5-on-swe-bench-verified","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=813","title":{"rendered":"Poolside AI Introduces Laguna XS.2 and M.1: Agentic Coding Models Reaching 68.2% and 72.5% on SWE-bench Verified"},"content":{"rendered":"<p>Poolside AI released the first two models in its Laguna family: <strong>Laguna M.1<\/strong> and <strong>Laguna XS.2<\/strong>. Alongside these, the company is releasing <strong>pool<\/strong> \u2014 a lightweight terminal-based coding agent and a dual Agent Client Protocol (ACP) client-server \u2014 the same environment Poolside uses internally for agent RL training and evaluation, now available as a research preview. <\/p>\n<h3 class=\"wp-block-heading\"><strong>What are These Models, and Why Should You Care?<\/strong><\/h3>\n<p>Both Laguna M.1 and Laguna XS.2 are <strong>Mixture-of-Experts (MoE)<\/strong> models. Instead of activating all parameters for every token, MoE models route each token through only a subset of specialized sub-networks called \u2018experts.\u2019 This gives a model a large total parameter count, and the capability headroom that comes with it, while paying the compute cost of only a much smaller \u201cactivated\u201d parameter count at inference time.<\/p>\n<p><strong>Laguna M.1<\/strong> is a 225B total parameter MoE model with 23B activated parameters, trained from scratch on 30T tokens using 6,144 interconnected NVIDIA Hopper GPUs. It completed pre-training at the end of last year and serves as the foundation for the entire Laguna family. 
On benchmarks, it reaches <strong>72.5% on SWE-bench Verified<\/strong>, <strong>67.3% on SWE-bench Multilingual<\/strong>, <strong>46.9% on SWE-bench Pro<\/strong>, and <strong>40.7% on Terminal-Bench 2.0<\/strong>.<\/p>\n<p><strong>Laguna XS.2<\/strong> is the second-generation MoE and Poolside\u2019s first open-weight model, built on everything learned since training M.1. At 33B total parameters with 3B activated per token, it is designed for agentic coding and long-horizon work on a local machine \u2014 compact enough to run on a Mac with 36 GB of RAM via Ollama. It scores <strong>68.2% on SWE-bench Verified<\/strong>, <strong>62.4% on SWE-bench Multilingual<\/strong>, <strong>44.5% on SWE-bench Pro<\/strong>, and <strong>30.1% on Terminal-Bench 2.0<\/strong>. Poolside will also release <strong>Laguna XS.2-base<\/strong> soon for practitioners who want to fine-tune.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Architecture: The Efficiency Decisions in XS.2<\/strong><\/h3>\n<p>XS.2 uses <strong>sigmoid gating with per-layer rotary scales<\/strong>, enabling a mixed Sliding Window Attention (SWA) and global attention layout in a 3:1 ratio across 40 total layers \u2014 30 SWA layers and 10 global attention layers. Sliding Window Attention limits each token\u2019s attention to a local window of 512 tokens rather than the full sequence, dramatically cutting KV cache memory. The global attention layers at a 1-in-4 ratio preserve long-range dependencies without paying the full cost everywhere. 
The model also quantizes the KV cache to <strong>FP8<\/strong>, further reducing memory per token.<\/p>\n<p>Under the hood, XS.2 uses <strong>256 experts with 1 shared expert<\/strong>, supports a <strong>context window of 131,072 tokens<\/strong>, and features native reasoning support \u2014 interleaved thinking between tool calls with per-request control over enabling or disabling thinking.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1612\" height=\"920\" data-attachment-id=\"79373\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/28\/poolside-ai-introduces-laguna-xs-2-and-m-1-agentic-coding-models-reaching-68-2-and-72-5-on-swe-bench-verified\/screenshot-2026-04-28-at-10-43-19-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-28-at-10.43.19-PM-1.png\" data-orig-size=\"1612,920\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-28 at 10.43.19\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-28-at-10.43.19-PM-1-1024x584.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-28-at-10.43.19-PM-1.png\" alt=\"\" class=\"wp-image-79373\" \/><figcaption class=\"wp-element-caption\">https:\/\/poolside.ai\/blog\/laguna-a-deeper-dive<\/figcaption><\/figure>\n<\/div>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1622\" height=\"926\" data-attachment-id=\"79375\" 
data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/28\/poolside-ai-introduces-laguna-xs-2-and-m-1-agentic-coding-models-reaching-68-2-and-72-5-on-swe-bench-verified\/screenshot-2026-04-28-at-10-43-37-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-28-at-10.43.37-PM-1.png\" data-orig-size=\"1622,926\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-28 at 10.43.37\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-28-at-10.43.37-PM-1-1024x585.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-28-at-10.43.37-PM-1.png\" alt=\"\" class=\"wp-image-79375\" \/><figcaption class=\"wp-element-caption\">https:\/\/poolside.ai\/blog\/laguna-a-deeper-dive<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Training: Three Areas Poolside Pushed Hard<\/strong><\/h3>\n<p>Poolside trains all its models from scratch using its own data pipeline, its own training codebase (Titan), and its own agent RL infrastructure. Three areas saw particular investment for Laguna.<\/p>\n<p><strong>AutoMixer: Optimizing the Data Mix Automatically.<\/strong> Data curation and the mix that goes into training have a large impact on final model performance. Rather than relying on manual heuristics, Poolside developed an automixing framework that trains a swarm of approximately 60 proxy models, each on a different data mix, and measures performance across key capability groups \u2014 code, math, STEM, and common sense. 
Surrogate regressors are then fit to approximate how changes in dataset proportions affect downstream evaluations, giving a learned mapping from data mix to performance that can be directly optimized. The approach is inspired by prior work including <strong>Olmix, MDE, and RegMix<\/strong>, adapted to Poolside\u2019s setting with richer data groupings.<\/p>\n<p>On the data side, both Laguna models were trained on more than 30T tokens. Poolside\u2019s diversity-preserving data curation approach \u2014 which retains portions of mid- and lower-quality buckets alongside top-quality data to avoid STEM bias \u2014 yields approximately <strong>2\u00d7 more unique tokens<\/strong> compared to precision-focused pipelines, with the gain persisting at longer training horizons. A separate deduplication analysis also confirmed that <strong>global deduplication disproportionately removes high-quality data<\/strong>, informing how the team tuned its pipeline. Synthetic data contributes about <strong>13% of the final training mix<\/strong> in Laguna XS.2, with the Laguna series using approximately <strong>4.4T+ synthetic tokens<\/strong> in total.<\/p>\n<p><strong>Muon Optimizer.<\/strong> Rather than AdamW \u2014 the most common optimizer in large model training \u2014 Poolside used a distributed implementation of the <strong>Muon optimizer<\/strong> through all training stages of both models. In initial pre-training ablations, the research team achieved the same training loss as an AdamW baseline in approximately <strong>15% fewer steps<\/strong>, with large absolute evaluation uplifts on the final model, and achieved learning rate transfer across model scales. An additional benefit: Muon requires only one state per parameter rather than two, reducing memory requirements for both training and checkpointing. 
During pre-training of Laguna M.1, the overhead from the optimizer was less than 1% of the training step time.<\/p>\n<p>Poolside also runs <strong>periodic hash checks on model weights<\/strong> across training replicas to catch silent data corruption (SDC) from defective GPUs \u2014 specifically errors in arithmetic logic and pipeline registers, which unlike DRAM and SRAM are not covered by ECC protection.<\/p>\n<p><strong>Async On-Policy Agent RL.<\/strong> This is arguably the most complex piece of the Laguna training stack. Poolside built a fully asynchronous online RL system where actor processes pull tasks from a dataset, spin up sandboxed containers, and run the production agent binary against each task using the freshly deployed model. The resulting trajectories are scored, filtered, and written to <strong>Iceberg tables<\/strong>, while the trainer continuously consumes those records and produces the next checkpoint \u2014 inference and training running asynchronously in parallel, with throughput tuned to balance off-policy staleness.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Poolside releases its first open-weight model:<\/strong> Laguna XS.2 is a 33B total parameter MoE model with only 3B activated parameters per token, available under an Apache 2.0 license \u2014 compact enough to run locally on a Mac with 36 GB of RAM via Ollama.<\/li>\n<li><strong>Strong benchmark performance at small scale:<\/strong> Laguna XS.2 scores 68.2% on SWE-bench Verified and 44.5% on SWE-bench Pro, while the larger Laguna M.1 (225B total, 23B activated) reaches 72.5% on SWE-bench Verified and 46.9% on SWE-bench Pro \u2014 both trained from scratch on 30T tokens.<\/li>\n<li><strong>Muon optimizer beats AdamW by 15% in training efficiency:<\/strong> Poolside replaced AdamW with a distributed implementation of the Muon optimizer, achieving the same training loss in roughly 15% fewer steps, with lower 
memory requirements \u2014 only one state per parameter instead of two.<\/li>\n<li><strong>AutoMixer replaces manual data mixing with learned optimization:<\/strong> Instead of handcrafted data recipes, Poolside trains a swarm of ~60 proxy models on different data mixes and fits surrogate regressors to optimize dataset proportions \u2014 with synthetic data making up ~13% of Laguna XS.2\u2019s final training mix from a total of 4.4T+ synthetic tokens.<\/li>\n<li><strong>Fully asynchronous agent RL with GPUDirect RDMA weight transfer:<\/strong> Poolside\u2019s RL system runs inference and training in parallel, transferring hundreds of gigabytes of BF16 weights between nodes in under 5 seconds via GPUDirect RDMA, using a token-in, token-out actor design and the CISPO algorithm for off-policy training stability.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator aligncenter has-alpha-channel-opacity is-style-wide\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/huggingface.co\/poolside\/Laguna-XS.2\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a> <\/strong>and<strong> <a href=\"https:\/\/poolside.ai\/blog\/laguna-a-deeper-dive\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/28\/poolside-ai-introduces-laguna-xs-2-and-m-1-agentic-coding-models-reaching-68-2-and-72-5-on-swe-bench-verified\/\">Poolside AI Introduces Laguna XS.2 and M.1: Agentic Coding Models Reaching 68.2% and 72.5% on SWE-bench Verified<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Poolside AI released the first&hellip;<\/p>\n","protected":false},"author":1,"featured_media":814,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-813","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/813","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=813"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/813\/revisions"}],"wp:featuredmedia":
[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/814"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}