{"id":776,"date":"2026-04-23T11:46:24","date_gmt":"2026-04-23T03:46:24","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=776"},"modified":"2026-04-23T11:46:24","modified_gmt":"2026-04-23T03:46:24","slug":"xiaomi-releases-mimo-v2-5-pro-and-mimo-v2-5-matching-frontier-model-benchmarks-at-significantly-lower-token-cost","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=776","title":{"rendered":"Xiaomi Releases MiMo-V2.5-Pro and MiMo-V2.5: Matching Frontier Model Benchmarks at Significantly Lower Token Cost"},"content":{"rendered":"<p>The Xiaomi MiMo team has publicly released two new models: <strong>MiMo-V2.5-Pro<\/strong> and <strong>MiMo-V2.5<\/strong>. The benchmarks, combined with some genuinely striking real-world task demos, make a compelling case that open agentic AI is catching up to the frontier faster than most expected. Both models are available immediately via API and are priced competitively.<\/p>\n<h3 class=\"wp-block-heading\"><strong>What Is an Agentic Model, and Why Does It Matter?<\/strong><\/h3>\n<p>Most LLM benchmarks test a model\u2019s ability to answer a single, self-contained question. Agentic benchmarks test something much harder: whether a model can complete a <em>multi-step goal<\/em> autonomously, using tools (web search, code execution, file I\/O, API calls) over many turns, without losing track of the original objective.<\/p>\n<p>Think of it as the difference between a model that can answer \u201chow do I write a lexer?\u201d and one that can actually <em>write a complete compiler<\/em>, run tests against it, catch regressions, and fix them, all without a human in the loop. 
The latter is exactly what the Xiaomi MiMo team is demonstrating here.<\/p>\n<h3 class=\"wp-block-heading\"><strong>MiMo-V2.5-Pro: The Flagship<\/strong><\/h3>\n<p>MiMo-V2.5-Pro is Xiaomi\u2019s most capable model to date, delivering significant improvements over its predecessor, MiMo-V2-Pro, in general agentic capabilities, complex software engineering, and long-horizon tasks.<\/p>\n<p>The key benchmark numbers are competitive with top closed-source models: SWE-bench Pro 57.2, Claw-Eval 63.8, and \u03c43-Bench 72.9, placing it alongside Claude Opus 4.6 and GPT-5.4 across most evaluations. V2.5-Pro can sustain complex, long-horizon tasks spanning more than a thousand tool calls. It also shows substantial improvements in instruction following within agentic scenarios, reliably adhering to subtle requirements embedded in context and maintaining strong coherence across ultra-long contexts.<\/p>\n<p>One behavioral property that distinguishes V2.5-Pro from earlier models is what the Xiaomi MiMo team calls <strong>\u201charness awareness\u201d<\/strong>: it makes full use of the affordances of its harness environment, manages its memory, and shapes how its own context is populated toward the final objective. This means the model doesn\u2019t just execute instructions mechanically. It actively optimizes its own working environment to stay on track across very long tasks.<\/p>\n<p><strong>The three real-world task demos Xiaomi published illustrate exactly what \u201clong-horizon agentic capability\u201d means in practice.<\/strong><\/p>\n<p><strong>Demo 1 \u2014 SysY Compiler in Rust<\/strong>: Drawn from Peking University\u2019s <a href=\"https:\/\/github.com\/pku-minic\" target=\"_blank\" rel=\"noreferrer noopener\">Compiler Principles<\/a> course project, this task asks the model to implement a complete SysY compiler in Rust from scratch: lexer, parser, AST, Koopa IR codegen, RISC-V assembly backend, and performance optimization. 
The reference project typically takes a PKU computer science student several weeks. MiMo-V2.5-Pro finished in 4.3 hours across 672 tool calls, scoring a perfect 233\/233 against the course\u2019s hidden test suite.<\/p>\n<p>What\u2019s notable isn\u2019t just the final score but the architecture of execution. Rather than thrashing through trial and error, the model built the compiler layer by layer: it scaffolded the full pipeline first, then perfected Koopa IR (110\/110), then the RISC-V backend (103\/103), then performance (20\/20). The first compile alone passed 137\/233 tests, a 59% cold start that suggests the architecture was designed correctly before a single test was run. When a refactoring step later caused regressions, the model diagnosed the failures, recovered, and pushed on. This is structured, self-correcting engineering behavior, not pattern-matched code generation.<\/p>\n<p><strong>Demo 2 \u2014 Full-Featured Desktop Video Editor<\/strong>: With just a few simple prompts, MiMo-V2.5-Pro delivered a working desktop app with a multi-track timeline, clip trimming, cross-fades, audio mixing, and an export pipeline. The final build is 8,192 lines of code, produced over 1,868 tool calls across 11.5 hours of autonomous work.<\/p>\n<p><strong>Demo 3 \u2014 Analog EDA: FVF-LDO Design<\/strong>: The most technically specialized demo is a graduate-level analog-circuit EDA task requiring the design and optimization of a complete FVF-LDO (Flipped-Voltage-Follower low-dropout regulator) from scratch in the TSMC 180nm CMOS process. The model had to size the power transistor, tune the compensation network, and pick bias voltages so that six metrics land within spec simultaneously: phase margin, line regulation, load regulation, quiescent current, PSRR, and transient response. 
Wired into an ngspice simulation loop, the model spent about an hour in closed-loop iteration (calling the simulator, reading waveforms, tweaking parameters) and produced a design that meets every target metric, with four key metrics improved by an order of magnitude over its own initial attempt.<\/p>\n<p><strong>Token Efficiency<\/strong>: Intelligence at frontier level is only useful if it\u2019s cost-effective. On Claw-Eval, V2.5-Pro lands at 64% Pass^3 using only ~70K tokens per trajectory, roughly 40\u201360% fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 at comparable capability levels. For engineers building production agent pipelines, this is a material cost reduction, not just a marketing stat.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1468\" height=\"1000\" data-attachment-id=\"79232\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/22\/xiaomi-releases-mimo-v2-5-pro-and-mimo-v2-5-matching-frontier-model-benchmarks-at-significantly-lower-token-cost\/screenshot-2026-04-22-at-8-34-28-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-22-at-8.34.28-PM.png\" data-orig-size=\"1468,1000\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-22 at 8.34.28\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-22-at-8.34.28-PM-1024x698.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-22-at-8.34.28-PM.png\" alt=\"\" class=\"wp-image-79232\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/mimo.xiaomi.com\/mimo-v2-5-pro\/<\/figcaption><\/figure>\n<\/div>\n<p><strong>MiMo Coding Bench<\/strong> is Xiaomi\u2019s in-house evaluation suite designed to assess models on real-world developer tasks within agentic frameworks like Claude Code. It covers repo understanding, project building, code review, structured artifact generation, planning, software engineering (SWE), and more. V2.5-Pro leads the field on this benchmark, and Xiaomi explicitly positions it as a drop-in backend for scaffolds including Claude Code, OpenCode, and Kilo.<\/p>\n<h3 class=\"wp-block-heading\"><strong>MiMo-V2.5: Native Omnimodal at Half the Cost<\/strong><\/h3>\n<p>While V2.5-Pro targets the hardest long-horizon agentic tasks, MiMo-V2.5 is a major step forward in agentic capability and multimodal understanding. With native visual and audio understanding, MiMo-V2.5 reasons seamlessly across modalities, surpasses MiMo-V2-Pro in agentic performance, and supports up to 1 million tokens of context.<\/p>\n<p>The model is designed with perception and action unified from scratch. MiMo-V2.5 is trained from the start to see, hear, and act on what it perceives, yielding a single model that handles both perception and action. This is architecturally significant: earlier multimodal models often bolted vision on top of a text backbone, creating capability gaps at the perception-action boundary.<\/p>\n<p>On the coding side, the value proposition is clear: on MiMo Coding Bench, MiMo-V2.5 delivers strong results on everyday coding tasks, closing the gap with frontier models and matching MiMo-V2.5-Pro at half the cost. 
For teams that don\u2019t need the extreme long-horizon depth of V2.5-Pro, this is a compelling operating point.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1472\" height=\"674\" data-attachment-id=\"79233\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/22\/xiaomi-releases-mimo-v2-5-pro-and-mimo-v2-5-matching-frontier-model-benchmarks-at-significantly-lower-token-cost\/screenshot-2026-04-22-at-8-37-35-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-22-at-8.37.35-PM.png\" data-orig-size=\"1472,674\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-22 at 8.37.35\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-22-at-8.37.35-PM-1024x469.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-22-at-8.37.35-PM.png\" alt=\"\" class=\"wp-image-79233\" \/><figcaption class=\"wp-element-caption\">https:\/\/mimo.xiaomi.com\/mimo-v2-5\/<\/figcaption><\/figure>\n<\/div>\n<p>On multimodal benchmarks: MiMo-V2.5 achieves a 62.3 on the Claw-Eval general subset, placing it at the Pareto frontier of performance and efficiency. On the multimodal agentic subset, MiMo-V2.5 reaches 23.8 on Claw-Eval Multimodal, matching Claude Sonnet 4.6, leading MiMo-V2-Omni by eight points, and trailing Claude Opus 4.6 by a single point.<\/p>\n<p>On video understanding, MiMo-V2.5 scores 87.7 on Video-MME, effectively tied with Gemini 3 Pro (88.4) and well ahead of Gemini 3 Flash. 
Long-horizon video comprehension \u2014 scene tracking, temporal reasoning, visual grounding over minutes of footage \u2014 is now in frontier territory. On image understanding, MiMo-V2.5 lands at 81.0 on CharXiv RQ and 77.9 on MMMU-Pro, closing in on Gemini 3 Pro.<\/p>\n<p>Pricing is straightforward: MiMo-V2.5 runs at 1x (1 token = 1 credit), while MiMo-V2.5-Pro runs at 2x (1 token = 2 credits). Token Plans no longer charge a multiplier for the 1M-token context window \u2014 previously a common cost friction for long-context agentic workloads.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>MiMo-V2.5-Pro matches frontier closed-source models<\/strong> on key agentic benchmarks (SWE-bench Pro 57.2, Claw-Eval 63.8, \u03c43-Bench 72.9), while using 40\u201360% fewer tokens per trajectory than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4.<\/li>\n<li><strong>Long-horizon autonomy is real and measurable<\/strong> \u2014 V2.5-Pro autonomously built a complete SysY compiler in Rust (233\/233 tests, 672 tool calls, 4.3 hours) and a full-featured desktop video editor (8,192 lines of code, 1,868 tool calls, 11.5 hours).<\/li>\n<li><strong>MiMo-V2.5 is natively omnimodal<\/strong> \u2014 trained from scratch to see, hear, and act across modalities with a native 1M-token context window, matching Claude Sonnet 4.6 on Claw-Eval Multimodal and nearly tying Gemini 3 Pro on Video-MME (87.7 vs. 
88.4).<\/li>\n<li><strong>Pro-level coding performance at half the cost<\/strong> \u2014 on MiMo Coding Bench, MiMo-V2.5 matches MiMo-V2.5-Pro on everyday coding tasks at 1x token pricing, making it the practical choice for most production agent pipelines.<\/li>\n<li><strong>Both models are <\/strong>already compatible with popular agentic scaffolds like Claude Code, OpenCode, and Kilo \u2014 giving AI devs a drop-in, auditable, self-hostable path to frontier-level agentic AI.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator aligncenter has-alpha-channel-opacity is-style-wide\" \/>\n<p>Check out\u00a0the<strong>\u00a0<a href=\"https:\/\/mimo.xiaomi.com\/mimo-v2-5\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details MiMo-V2.5<\/a><\/strong>, and <strong><a href=\"https:\/\/mimo.xiaomi.com\/mimo-v2-5-pro\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details MiMo-V2.5-Pro<\/a><\/strong>.<\/p>\n
<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/22\/xiaomi-releases-mimo-v2-5-pro-and-mimo-v2-5-matching-frontier-model-benchmarks-at-significantly-lower-token-cost\/\">Xiaomi Releases MiMo-V2.5-Pro and MiMo-V2.5: Matching Frontier Model Benchmarks at Significantly Lower Token Cost<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Xiaomi MiMo team publicly rele&hellip;<\/p>\n","protected":false},"author":1,"featured_media":777,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-776","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=776"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/776\/revisions"}],"wp:featuredmed
ia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/777"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=776"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}