{"id":694,"date":"2026-04-10T07:06:29","date_gmt":"2026-04-09T23:06:29","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=694"},"modified":"2026-04-10T07:06:29","modified_gmt":"2026-04-09T23:06:29","slug":"meta-superintelligence-lab-releases-muse-spark-a-multimodal-reasoning-model-with-thought-compression-and-parallel-agents","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=694","title":{"rendered":"Meta Superintelligence Lab Releases Muse Spark: A Multimodal Reasoning Model With Thought Compression and Parallel Agents"},"content":{"rendered":"<p>Meta Superintelligence Labs recently made a significant move by unveiling \u2018Muse Spark\u2019 \u2014 the first model in the Muse family. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration. <\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1146\" height=\"486\" data-attachment-id=\"78891\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/meta-superintelligence-lab-releases-muse-spark-a-multimodal-reasoning-model-with-thought-compression-and-parallel-agents\/screenshot-2026-04-09-at-4-03-45-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-4.03.45-PM-1.png\" data-orig-size=\"1146,486\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-09 at 4.03.45\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-4.03.45-PM-1-300x127.png\" 
data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-4.03.45-PM-1-1024x434.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-4.03.45-PM-1.png\" alt=\"\" class=\"wp-image-78891\" \/><figcaption class=\"wp-element-caption\">https:\/\/ai.meta.com\/static-resource\/muse-spark-eval-methodology<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>What \u2018Natively Multimodal\u2019 Actually Means<\/strong><\/h3>\n<p>When Meta describes Muse Spark as \u2018natively multimodal,\u2019 it means the model was trained from the ground up to process and reason across text and visual inputs simultaneously \u2014 not a vision module bolted onto a language model after the fact. The model integrates visual information across domains and tools, achieving strong performance on visual STEM questions, entity recognition, and localization.<\/p>\n<p>This architectural choice has real consequences for tasks that combine language and vision. On the ScreenSpot Pro benchmark \u2014 which tests screenshot localization, requiring the model to identify specific UI elements in images \u2014 Muse Spark scores 72.2 (84.1 with Python tools), compared to Claude Opus 4.6 Max\u2019s 57.7 (83.1 with Python) and GPT-5.4 Xhigh\u2019s 39.0 (85.4 with Python). <\/p>\n<h3 class=\"wp-block-heading\"><strong>Three Scaling Axes: Pretraining, RL, and Test-Time Reasoning<\/strong><\/h3>\n<p>The most technically interesting part of the Muse Spark announcement is Meta\u2019s explicit framing around <strong>three scaling axes<\/strong> \u2014 the levers they\u2019re pulling to improve model capability in a predictable and measurable way. 
To support further scaling across all three, Meta is making strategic investments across the entire stack \u2014 from research and model training to infrastructure, including the Hyperion data center.<\/p>\n<p><strong>Pretraining<\/strong> is where the model learns its core world knowledge, reasoning, and coding abilities. Over the last nine months, Meta rebuilt its pretraining stack with improvements to model architecture, optimization, and data curation. The payoff is substantial efficiency gains: Meta can reach the same capabilities with over an order of magnitude less compute than its previous model, Llama 4 Maverick. In practical terms, \u2018an order of magnitude\u2019 means roughly 10x more compute-efficient \u2014 a major improvement that makes larger future models more financially and practically viable.<\/p>\n<p><strong>Reinforcement Learning (RL)<\/strong> is the second axis. After pretraining, RL is applied to amplify capabilities by training the model on outcome-based feedback rather than just token prediction. Think of it this way: pretraining teaches the model facts and patterns; RL teaches it to actually get answers right. Even though large-scale RL is notoriously prone to instability, Meta\u2019s new stack delivers smooth, predictable gains. The research team reports log-linear growth in pass@1 and pass@16 on training data, which means the model improves consistently as RL compute scales. (pass@1 means the model gets the answer right on its first try; pass@16 means at least one success across 16 attempts \u2014 a measure of reasoning diversity.)<\/p>\n<p><strong>Test-Time Reasoning<\/strong> is the third axis. This refers to the compute the model uses at inference time \u2014 the period when it\u2019s actually generating an answer for a user. Muse Spark is trained to \u2018think\u2019 before it responds, a process Meta\u2019s research team calls test-time reasoning. 
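The pass@k metrics mentioned above have a standard closed form: given n sampled attempts of which c are correct, the unbiased estimate of pass@k is 1 - C(n-c, k) / C(n, k). A minimal sketch of that estimator (the 16-attempt setup mirrors pass@16; the example counts are hypothetical, not Meta's reported data):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: the probability that at least one of
    # k samples drawn without replacement from n attempts is correct,
    # given that c of the n attempts were correct.
    if n - c < k:
        return 1.0  # too few failures to fill k slots: guaranteed hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 16 attempts per problem, 4 of them correct.
p1 = pass_at_k(16, 4, 1)    # single-try success rate: 4/16 = 0.25
p16 = pass_at_k(16, 4, 16)  # at least one success in 16 tries: 1.0
```

The gap between the two numbers is why pass@16 is read as a measure of reasoning diversity: a model can have a modest first-try rate yet still reach the right answer somewhere among its samples.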
To deliver the most intelligence per token, RL training maximizes correctness subject to a penalty on thinking time. This produces a phenomenon the research team calls <em>thought compression<\/em>: after an initial period in which the model improves by thinking longer, the length penalty pushes Muse Spark to compress its reasoning and solve problems using significantly fewer tokens. After compressing, the model then extends its solutions again to achieve stronger performance.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1300\" height=\"1446\" data-attachment-id=\"78889\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/meta-superintelligence-lab-releases-muse-spark-a-multimodal-reasoning-model-with-thought-compression-and-parallel-agents\/screenshot-2026-04-09-at-3-58-22-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-3.58.22-PM-1.png\" data-orig-size=\"1300,1446\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-09 at 3.58.22\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-3.58.22-PM-1-270x300.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-3.58.22-PM-1-921x1024.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-09-at-3.58.22-PM-1.png\" alt=\"\" class=\"wp-image-78889\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/ai.meta.com\/static-resource\/muse-spark-eval-methodology<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Contemplating Mode: Multi-Agent Orchestration at Inference<\/strong><\/h3>\n<p>Perhaps the most architecturally interesting feature is Contemplating mode. The research team describes it as a novel multi-round test-time scaling scaffold covering solution generation, iterative self-refinement, and aggregation. In plain terms: instead of one model generating one answer, multiple agents run in parallel, each producing solutions that are then refined and aggregated into a final output.<\/p>\n<p>While standard test-time scaling has a single agent think for longer, scaling Muse Spark with multi-agent thinking enables superior performance with comparable latency. This is a key engineering trade-off: latency scales with the depth of a single chain of thought, but parallel agents can add capability without proportionally adding wait time.<\/p>\n<p>In Contemplating mode, Muse Spark scores 58.4 on Humanity\u2019s Last Exam With Tools \u2014 a benchmark designed to test expert-level multidisciplinary knowledge \u2014 compared to Gemini 3.1 Deep Think\u2019s 53.4 and GPT-5.4 Pro\u2019s 58.7. On FrontierScience Research, Muse Spark Contemplating reaches 38.3, ahead of GPT-5.4 Pro\u2019s 36.7 and Gemini 3.1 Deep Think\u2019s 23.3. <\/p>\n<h3 class=\"wp-block-heading\"><strong>Where Muse Spark Leads \u2014 and Where It Trails<\/strong><\/h3>\n<p>On health benchmarks, Muse Spark posts its most decisive results. On HealthBench Hard \u2014 a subset of 1,000 open-ended health queries \u2014 Muse Spark scores 42.8, compared to Claude Opus 4.6 Max\u2019s 14.8, Gemini 3.1 Pro High\u2019s 20.6, and GPT-5.4 Xhigh\u2019s 40.1. 
These health results are not just luck: to improve Muse Spark\u2019s health reasoning capabilities, Meta\u2019s research team collaborated with over 1,000 physicians to curate training data that enables more factual and comprehensive responses.<\/p>\n<p>On coding benchmarks, the picture is more competitive. On SWE-Bench Verified, where models must resolve real GitHub issues using a bash tool and a file-operation tool in a single-attempt (pass@1) setup, with scores averaged over 15 runs per problem, Muse Spark scores 77.4 \u2014 behind Claude Opus 4.6 Max at 80.8 and Gemini 3.1 Pro High at 80.6. On GPQA Diamond, a PhD-level reasoning benchmark averaged over 4 runs to reduce variance, Muse Spark scores 89.5, behind Claude Opus 4.6 Max\u2019s 92.7 and Gemini 3.1 Pro High\u2019s 94.3.<\/p>\n<p>The sharpest gap appears on ARC AGI 2, a benchmark of abstract reasoning puzzles run on a public set of 120 prompts and reported at pass@2. Muse Spark scores 42.5 \u2014 meaningfully behind Gemini 3.1 Pro High at 76.5 and GPT-5.4 Xhigh at 76.1. 
This is the clearest current weak spot in Muse Spark\u2019s profile.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Meta\u2019s fresh start, not an iteration<\/strong>: Muse Spark is the first model from the newly formed Meta Superintelligence Labs \u2014 built on a completely rebuilt pretraining stack that is over 10x more compute-efficient than Llama 4 Maverick, signaling a deliberate ground-up reset of Meta\u2019s AI strategy.<\/li>\n<li><strong>Health is the headline benchmark win<\/strong>: Muse Spark\u2019s most decisive advantage over competitors is in health reasoning \u2014 scoring 42.8 on HealthBench Hard versus Claude Opus 4.6 Max\u2019s 14.8 and Gemini 3.1 Pro High\u2019s 20.6, backed by training data curated with over 1,000 physicians.<\/li>\n<li><strong>Contemplating mode buys capability with parallel compute, not longer waits<\/strong>: Instead of making a single model think longer \u2014 which increases response time \u2014 Muse Spark\u2019s Contemplating mode runs multiple agents in parallel that refine and aggregate answers, achieving competitive performance on hard reasoning tasks without proportionally higher latency.<\/li>\n<li><strong>Abstract reasoning is the clearest weak spot<\/strong>: On ARC AGI 2, Muse Spark scores 42.5 against Gemini 3.1 Pro High\u2019s 76.5 and GPT-5.4 Xhigh\u2019s 76.1 \u2014 the largest performance gap in the entire benchmark table.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/ai.meta.com\/blog\/introducing-muse-spark-msl\/?\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a> <\/strong>and<strong> <a href=\"https:\/\/ai.meta.com\/static-resource\/muse-spark-eval-methodology\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>.\u00a0<\/strong>Also,\u00a0feel free to follow us on\u00a0<strong><a 
href=\"https:\/\/x.com\/intent\/follow?screen_name=marktechpost\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Twitter<\/mark><\/a><\/strong>\u00a0and don\u2019t forget to join our\u00a0<strong><a href=\"https:\/\/www.reddit.com\/r\/machinelearningnews\/\" target=\"_blank\" rel=\"noreferrer noopener\">120k+ ML SubReddit<\/a><\/strong>\u00a0and Subscribe to\u00a0<strong><a href=\"https:\/\/www.aidevsignals.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">our Newsletter<\/a><\/strong>. Wait! are you on telegram?\u00a0<strong><a href=\"https:\/\/t.me\/machinelearningresearchnews\" target=\"_blank\" rel=\"noreferrer noopener\">now you can join us on telegram as well.<\/a><\/strong><\/p>\n<p>Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.?\u00a0<strong><a href=\"https:\/\/forms.gle\/MTNLpmJtsFA3VRVd9\" target=\"_blank\" rel=\"noreferrer noopener\"><mark>Connect with us<\/mark><\/a><\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/09\/meta-superintelligence-lab-releases-muse-spark-a-multimodal-reasoning-model-with-thought-compression-and-parallel-agents\/\">Meta Superintelligence Lab Releases Muse Spark: A Multimodal Reasoning Model With Thought Compression and Parallel Agents<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Meta Superintelligence Labs 
re&hellip;<\/p>\n","protected":false},"author":1,"featured_media":695,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-694","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/694","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=694"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/694\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/695"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=694"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=694"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=694"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}