{"id":703,"date":"2026-04-12T17:20:15","date_gmt":"2026-04-12T09:20:15","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=703"},"modified":"2026-04-12T17:20:15","modified_gmt":"2026-04-12T09:20:15","slug":"minimax-just-open-sourced-minimax-m2-7-a-self-evolving-agent-model-that-scores-56-22-on-swe-pro-and-57-0-on-terminal-bench-2","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=703","title":{"rendered":"MiniMax Just Open Sourced MiniMax M2.7: A Self-Evolving Agent Model that Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2"},"content":{"rendered":"<p>MiniMax has officially open-sourced MiniMax M2.7, making the model weights publicly available on Hugging Face. Originally announced on March 18, 2026, MiniMax M2.7 is MiniMax\u2019s most capable open-source model to date \u2014 and its first model to actively participate in its own development cycle, a meaningful shift in how large language models are built and iterated.<\/p>\n<h3 class=\"wp-block-heading\"><strong>What is MiniMax M2.7?<\/strong><\/h3>\n<p>MiniMax M2.7 is part of MiniMax\u2019s M2-series of Mixture-of-Experts (MoE) models. MoE is an architectural design where only a subset of the total parameters is \u2018activated\u2019 during any inference pass, which makes the model significantly faster and cheaper to serve compared to a dense model of similar output quality. <\/p>\n<p>MiniMax M2.7 is built around three core capability areas: professional software engineering, professional office work, and what MiniMax calls Agent Teams \u2014 native multi-agent collaboration. 
MiniMax M2.7 is capable of building complex agent harnesses and completing highly elaborate productivity tasks, leveraging capabilities such as Agent Teams, complex Skills, and dynamic tool search.<\/p>\n<h3 class=\"wp-block-heading\"><strong>SOTA Benchmark Performance: SWE-Pro and Terminal Bench 2<\/strong><\/h3>\n<p>On SWE-Pro, which covers multiple programming languages, MiniMax M2.7 achieved a 56.22% accuracy rate, matching GPT-5.3-Codex. SWE-Pro tasks span log analysis, bug troubleshooting, code security review, and machine learning workflow debugging \u2014 much closer to the messy reality of production systems than standard algorithmic coding tests.<\/p>\n<p>On Terminal Bench 2 (57.0%) and NL2Repo (39.8%), both of which demand a high degree of system-level comprehension, MiniMax M2.7 performs solidly. The model not only excels at code generation but also deeply understands the operational logic and collaborative dynamics of software systems.<\/p>\n<p>On the repo-level code generation benchmark VIBE-Pro, MiniMax M2.7 scored 55.6%, nearly on par with Opus 4.6 \u2014 meaning that requirements involving Web, Android, iOS, or simulation tasks can be handed directly to MiniMax M2.7 to complete. It also demonstrates a strong advantage on benchmarks closer to real-world engineering scenarios: SWE Multilingual (76.5%) and Multi SWE Bench (52.7%).<\/p>\n<h3 class=\"wp-block-heading\"><strong>Production Debugging: Under Three Minutes<\/strong><\/h3>\n<p>When faced with alerts in production, MiniMax M2.7 can correlate monitoring metrics with deployment timelines to perform causal reasoning, conduct statistical analysis on trace sampling and propose precise hypotheses, proactively connect to databases to verify root causes, pinpoint missing index migration files in the code repository, and use non-blocking index creation to stop the bleeding before submitting a merge request. 
The MiniMax team reports that on multiple occasions, this reduced recovery time for live production system incidents to under three minutes. From observability analysis and database expertise to SRE-level decision-making, this positions MiniMax M2.7 as something beyond a code-generation model.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Self-Evolution Architecture<\/strong><\/h3>\n<p>To test the boundaries of autonomous improvement, MiniMax M2.7 was tasked with optimizing a model\u2019s programming performance on an internal scaffold. It ran entirely autonomously, executing an iterative loop of \u2018analyze failure trajectories \u2192 plan changes \u2192 modify scaffold code \u2192 run evaluations \u2192 compare results \u2192 decide to keep or revert changes\u2019 for over 100 rounds. During this process, MiniMax M2.7 discovered effective optimizations on its own: systematically searching for the optimal combination of sampling parameters such as temperature, frequency penalty, and presence penalty; designing more specific workflow guidelines (such as automatically searching for the same bug pattern in other files after a fix); and adding loop detection to the scaffold\u2019s agent loop. These changes achieved a 30% performance improvement on internal evaluation sets.<\/p>\n<p>Within MiniMax\u2019s own reinforcement learning team workflows, M2.7 is now capable of handling 30%\u201350% of the workflow end-to-end, with human researchers only interacting for critical decisions and discussions.<\/p>\n<h3 class=\"wp-block-heading\"><strong>MLE Bench Lite: Testing Autonomous ML Experimentation<\/strong><\/h3>\n<p>The MiniMax team also tested MiniMax M2.7 on MLE Bench Lite, OpenAI\u2019s open-sourced suite of 22 machine learning competitions runnable on a single A30 GPU, covering virtually all stages of the ML workflow.<\/p>\n<p>For this evaluation, the MiniMax team designed a simple three-component harness: short-term memory, self-feedback, and self-optimization. 
After each iteration round, the agent generates a short-term memory markdown file, performs self-criticism on the current results, and provides optimization directions for the next round. Three trials were run, each with a 24-hour window for iterative evolution.<\/p>\n<p>The best run achieved 9 gold medals, 5 silver medals, and 1 bronze medal. The average medal rate across the three runs was 66.6%, a result behind only Opus-4.6 (75.7%) and GPT-5.4 (71.2%) and tied with Gemini-3.1 (66.6%).<\/p>\n<h3 class=\"wp-block-heading\"><strong>Professional Office Work and Finance<\/strong><\/h3>\n<p>Beyond software engineering, MiniMax M2.7 targets professional office tasks. In the GDPval-AA evaluation, which measures domain expertise and task delivery capability across 45 models, MiniMax M2.7 achieved an ELO score of 1495 \u2014 the highest among open-source models, behind only Opus 4.6, Sonnet 4.6, and GPT-5.4, and surpassing GPT-5.3.<\/p>\n<p>On Toolathon, MiniMax M2.7 achieved an accuracy of 46.3%, reaching the global top tier. In MM Claw testing \u2014 an evaluation MiniMax built based on real-world usage patterns from the OpenClaw personal agent platform \u2014 MiniMax M2.7 maintained a 97% skill compliance rate across 40 complex skills (each exceeding 2,000 tokens) and achieved an overall accuracy of 62.7%, approaching Sonnet 4.6.<\/p>\n<p>In finance, MiniMax M2.7 can autonomously read a company\u2019s annual reports and earnings call transcripts, cross-reference multiple research reports, independently design assumptions and build a revenue forecast model, and produce a PPT and Word research report based on templates \u2014 understanding, making judgments, and producing output like a junior analyst. 
<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>MiniMax M2.7 is now officially open source<\/strong>, with weights available on Hugging Face, making a frontier-grade agentic model freely accessible for developers to deploy and build on.<\/li>\n<li><strong>MiniMax M2.7 achieves SOTA performance on real-world software engineering benchmarks<\/strong>, scoring 56.22% on SWE-Pro (matching GPT-5.3-Codex) and 57.0% on Terminal Bench 2 \u2014 tests that measure production-level reasoning, not just code generation.<\/li>\n<li><strong>MiniMax M2.7 is the first model to actively participate in its own development<\/strong>, running over 100 autonomous rounds of scaffold optimization and achieving a 30% performance improvement \u2014 an early, concrete example of AI-assisted AI development in practice.<\/li>\n<li><strong>The model is built for real agentic deployments<\/strong>, maintaining 97% skill adherence across 40 complex skills (each exceeding 2,000 tokens), supporting native Agent Teams with stable role boundaries, and handling 30\u201350% of MiniMax\u2019s internal RL team workflows autonomously.<\/li>\n<li><strong>MiniMax M2.7 is the highest-ranked open-source model on GDPval-AA<\/strong> with an ELO score of 1495 across 45 models, demonstrating strong professional work capabilities spanning office document editing, financial analysis, and multi-round high-fidelity task delivery.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the\u00a0<strong><a href=\"https:\/\/www.minimax.io\/news\/minimax-m27-en\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>\u00a0and\u00a0<strong><a href=\"https:\/\/huggingface.co\/MiniMaxAI\/MiniMax-M2.7\" target=\"_blank\" rel=\"noreferrer 
noopener\">Model Weight<\/a>.\u00a0<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/12\/minimax-just-open-sourced-minimax-m2-7-a-self-evolving-agent-model-that-scores-56-22-on-swe-pro-and-57-0-on-terminal-bench-2\/\">MiniMax Just Open Sourced MiniMax M2.7: A Self-Evolving Agent Model that Scores 56.22% on SWE-Pro and 57.0% on Terminal Bench 2<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>MiniMax has officially 
open-so&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-703","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/703","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=703"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/703\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}