{"id":1022,"date":"2026-06-02T04:40:05","date_gmt":"2026-06-01T20:40:05","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=1022"},"modified":"2026-06-02T04:40:05","modified_gmt":"2026-06-01T20:40:05","slug":"minimax-releases-minimax-m3-with-msa-architecture-supporting-1m-token-context-native-multimodality-and-agentic-coding","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=1022","title":{"rendered":"MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding"},"content":{"rendered":"<p class=\"wp-block-paragraph\"><strong>MiniMax officially released MiniMax<\/strong> <strong>M3 on June 1, 2026. The model introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that gives M3 a 1M-token context window. M3 also supports image and video input and desktop computer operation natively. The API is live now.<\/strong><\/p>\n<p class=\"wp-block-paragraph\">MiniMax M3 is available today via MiniMax Code, the MiniMax Token Plan, and the MiniMax API. It is the next model in the M-series line after M2.7. MiniMax positions M3 as an open-weight model combining frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture \u2014 the first to do so, per MiniMax. The corresponding model weights and technical report are scheduled for release within 10 days of launch.<\/p>\n<h2 class=\"wp-block-heading\"><strong>MSA: MiniMax Sparse Attention<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The central architectural change in MiniMax M3 is MSA (MiniMax Sparse Attention). Standard full attention has quadratic computational complexity: as context length grows, compute cost grows as the square of the sequence length. MSA is designed to address this.<\/p>\n<p class=\"wp-block-paragraph\">Sparse attention mechanisms generally add a pre-filtering stage before computing attention, avoiding full quadratic cost. 
The MiniMax team states that, compared with approaches such as DSA and MoBA, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.<\/p>\n<p class=\"wp-block-paragraph\">At the operator level, MSA uses a \u201cKV outer gather Q\u201d approach: KV blocks form the outer loop, and each block gathers the queries that select it. As a result, each block is read only once and memory access is contiguous. The MiniMax team reports that this is more than 4\u00d7 faster than open-source implementations such as Flash-Sparse-Attention and flash-moba under MiniMax M3\u2019s head configuration.<\/p>\n<p class=\"wp-block-paragraph\">The result: at a context length of 1 million tokens, MiniMax M3\u2019s per-token compute is 1\/20th that of the previous-generation M2 models. The MiniMax team reports a speedup of more than 9\u00d7 in the prefill stage and more than 15\u00d7 in the decoding stage at 1M-token context. Across multiple ablation studies, MSA matched full attention on most of the capabilities tested.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Coding and Agentic Benchmarks<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Coding and agentic capabilities are key areas of improvement for M3. The benchmark results below are reported by the MiniMax team. Several evaluations were run on MiniMax internal infrastructure, while some comparison scores were taken from official leaderboards or external benchmark sources, as noted in MiniMax\u2019s methodology. SWE-Bench Verified was tested on internal infrastructure using Claude Code scaffolding and averaged over 4 runs. 
SWE-Bench Pro was also tested on internal infrastructure using Claude Code scaffolding, with testing logic aligned to the official evaluation.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>SWE-Bench Pro<\/strong>: 59.0% (surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7)<\/li>\n<li><strong>Terminal-Bench 2.1<\/strong>: 66.0%<\/li>\n<li><strong>SWE-fficiency<\/strong>: 34.8%<\/li>\n<li><strong>KernelBench Hard<\/strong>: 28.8% (evaluated on NVIDIA Blackwell GPUs, CUDA capability sm_120)<\/li>\n<li><strong>MCP Atlas<\/strong>: 74.2%<\/li>\n<li><strong>Claw-Eval<\/strong>: highest score among models evaluated (General Task Group, 161 tasks)<\/li>\n<li><strong>SVG-Bench<\/strong>: surpasses Opus 4.7<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">On <strong>OmniDocBench<\/strong>, a multimodal document understanding benchmark, M3 scores above Gemini 3.1 Pro. On <strong>OSWorld-Verified<\/strong> (361 samples), M3 achieves a 70.06% task completion rate for computer use (Max Steps = 200).<\/p>\n<p class=\"wp-block-paragraph\">MiniMax also built an interactive user simulator framework for training and evaluation. It simulates multi-turn developer collaboration: requirement elaboration, solution discussion, feedback-based correction, continuous task switching, and multi-round project iteration. The framework is intended to narrow the gap between single-turn benchmark performance and real-world, multi-turn developer workflows.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Native Multimodality<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">MiniMax M3 underwent mixed-modality training from step 0. Text, images, and video are trained together from the beginning rather than added in post-training. The MiniMax team reports that interleaved data \u2014 sequences where text and images are naturally intermixed \u2014 is more critical to model performance than commonly assumed. 
After rebuilding the entire data pipeline for interleaved formats, MiniMax scaled the training data to the order of 100 trillion tokens.<\/p>\n<p class=\"wp-block-paragraph\">MiniMax M3 supports image and video input and can operate a desktop computer.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Real-World Task Examples from MiniMax<\/strong><\/h2>\n<p class=\"wp-block-paragraph\"><strong>MiniMax documents three internal tasks in the release post<\/strong>:<\/p>\n<p class=\"wp-block-paragraph\"><strong>Paper reproduction<\/strong>: MiniMax gave M3 the ICLR 2025 Outstanding Paper Award winner <em>Learning Dynamics of LLM Finetuning<\/em> and asked it to reproduce the experiments independently. M3 ran autonomously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiments without human intervention. The task drew on three capabilities at once: multimodal input to read curves and formulas, long context to hold the paper and experiment logs simultaneously, and coding to carry out the reproduction across a long-running agent thread.<\/p>\n<p class=\"wp-block-paragraph\"><strong>CUDA kernel optimization<\/strong>: MiniMax asked M3 to optimize an FP8 matrix multiplication (GEMM) kernel on NVIDIA Hopper architecture GPUs. The model started with only a task description, a benchmark evaluation script, and a non-functional Triton skeleton \u2014 no reference implementation was provided. Over approximately 24 hours, MiniMax M3 made 147 benchmark submissions and 1,959 tool calls. It progressed through baseline implementation, autotune configuration generation, performance bottleneck diagnosis, CUDA Graph integration, persistent kernel rewriting, and host-side scheduling optimization. After six landmark rounds of optimization, MiniMax M3 raised Hopper FP8 hardware peak utilization from 7.6% to 71.3%, a 9.4\u00d7 speedup. The best solution appeared on the 145th submission. 
MiniMax notes that most other models stopped making new progress within the first 30 submissions; only Opus 4.7 and M3 continued beyond that point.<\/p>\n<p class=\"wp-block-paragraph\"><strong>PostTrainBench (autonomous model training)<\/strong>: MiniMax gave M3 four base models that had only been pretrained. MiniMax M3 autonomously ran the full data synthesis \u2192 training \u2192 evaluation \u2192 iteration cycle over 12 hours with no human intervention. The goal was for the base models to acquire capabilities in mathematical reasoning (AIME2025), tool calling (BFCL), scientific knowledge reasoning (GPQA Main), arithmetic reasoning (GSM8K), and code generation (HumanEval). MiniMax M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of the other models tested.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div>\n<div class=\"mtp-track\">\n<p>    <!-- SLIDE 1: Overview --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">Overview<\/span>\n<h2 class=\"mtp-title\">MiniMax M3: Frontier Coding, 1M-Token Context, Native Multimodality<\/h2>\n<div class=\"mtp-body\">\n<p>MiniMax officially released M3 on <span class=\"mtp-highlight\">June 1, 2026<\/span>. The API is live now. The model weights and technical report are scheduled to be open-sourced within 10 days.<\/p>\n<p>M3 is the next model in the M-series line after M2.7. 
MiniMax positions it as the first open-weight model to combine frontier-level coding performance, a 1M-token context window, and native multimodal input in a single architecture.<\/p>\n<\/div>\n<div class=\"mtp-stat-grid\">\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">1M<\/span><br \/>\n          <span class=\"mtp-stat-label\">Token Context Window<\/span>\n        <\/div>\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">59.0%<\/span><br \/>\n          <span class=\"mtp-stat-label\">SWE-Bench Pro Score<\/span>\n        <\/div>\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">MSA<\/span><br \/>\n          <span class=\"mtp-stat-label\">Sparse Attention Architecture<\/span>\n        <\/div>\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">70.06%<\/span><br \/>\n          <span class=\"mtp-stat-label\">OSWorld-Verified (Computer Use)<\/span>\n        <\/div>\n<\/div>\n<\/div>\n<p>    <!-- SLIDE 2: MSA Architecture --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">Architecture<\/span>\n<h2 class=\"mtp-title\">MSA: MiniMax Sparse Attention<\/h2>\n<div class=\"mtp-body\">\n<p>Standard full attention has quadratic computational complexity. As context length grows, compute cost grows as the square of the sequence length. 
MSA is designed to reduce this cost, with optimizations down to the operator level.<\/p>\n<p>Compared with approaches such as <strong>DSA<\/strong> and <strong>MoBA<\/strong>, MSA partitions the KV cache into blocks more precisely, achieving higher effective context coverage.<\/p>\n<p>MSA uses a <strong>\u201cKV outer gather Q\u201d<\/strong> approach \u2014 each KV block is read only once, memory access is contiguous, and arithmetic intensity is significantly better than common methods.<\/p>\n<\/div>\n<div class=\"mtp-stat-grid\">\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">&gt;9\u00d7<\/span><br \/>\n          <span class=\"mtp-stat-label\">Prefill Speedup at 1M ctx<\/span>\n        <\/div>\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">&gt;15\u00d7<\/span><br \/>\n          <span class=\"mtp-stat-label\">Decoding Speedup at 1M ctx<\/span>\n        <\/div>\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">1\/20<\/span><br \/>\n          <span class=\"mtp-stat-label\">Per-token compute vs M2 at 1M<\/span>\n        <\/div>\n<div class=\"mtp-stat\">\n          <span class=\"mtp-stat-val\">&gt;4\u00d7<\/span><br \/>\n          <span class=\"mtp-stat-label\">Faster than Flash-Sparse-Attn<\/span>\n        <\/div>\n<\/div>\n<\/div>\n<p>    <!-- SLIDE 3: Benchmarks --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">Benchmarks<\/span>\n<h2 class=\"mtp-title\">Coding and Agentic Performance<\/h2>\n<div class=\"mtp-body\">\n<p>Results reported by MiniMax. SWE-Bench Verified used Claude Code scaffolding, averaged over 4 runs. 
SWE-Bench Pro used Claude Code scaffolding, aligned to official evaluation.<\/p>\n<\/div>\n<ul class=\"mtp-list\">\n<li><strong>SWE-Bench Pro: 59.0%<\/strong> \u2014 surpasses GPT-5.5 and Gemini 3.1 Pro; approaches Opus 4.7<\/li>\n<li><strong>Terminal-Bench 2.1: 66.0%<\/strong><\/li>\n<li><strong>SWE-fficiency: 34.8%<\/strong><\/li>\n<li><strong>KernelBench Hard: 28.8%<\/strong> \u2014 evaluated on NVIDIA Blackwell GPUs (sm_120)<\/li>\n<li><strong>MCP Atlas: 74.2%<\/strong><\/li>\n<li><strong>Claw-Eval:<\/strong> Highest score among models evaluated (161 tasks)<\/li>\n<li><strong>SVG-Bench:<\/strong> Surpasses Opus 4.7<\/li>\n<li><strong>OmniDocBench:<\/strong> Above Gemini 3.1 Pro<\/li>\n<li><strong>OSWorld-Verified: 70.06%<\/strong> \u2014 361 samples, Max Steps = 200<\/li>\n<\/ul><\/div>\n<p>    <!-- SLIDE 4: Multimodality --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">Multimodality<\/span>\n<h2 class=\"mtp-title\">Native Multimodal Training from Step 0<\/h2>\n<div class=\"mtp-body\">\n<p>M3 underwent mixed-modality training from step 0. 
Text, images, and video are trained together from the start \u2014 not added as a post-training capability.<\/p>\n<p>MiniMax reports that <strong>interleaved data<\/strong> \u2014 sequences where text and images are naturally intermixed \u2014 is more critical to model performance than commonly assumed.<\/p>\n<p>After rebuilding the entire data pipeline for interleaved formats, MiniMax scaled the training data to the order of <span class=\"mtp-highlight\">100 trillion tokens<\/span>.<\/p>\n<\/div>\n<div class=\"mtp-divider\"><\/div>\n<div class=\"mtp-body\">\n<p><strong>M3 supports:<\/strong><\/p>\n<\/div>\n<ul class=\"mtp-list\">\n<li>Image input<\/li>\n<li>Video input<\/li>\n<li>Desktop computer operation (computer use)<\/li>\n<\/ul><\/div>\n<p>    <!-- SLIDE 5: Real-World Tasks --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">Real-World Tasks<\/span>\n<h2 class=\"mtp-title\">Three Internal Tasks Documented by MiniMax<\/h2>\n<ul class=\"mtp-list\">\n<li><strong>Paper Reproduction<\/strong> \u2014 M3 reproduced the core experiments of the ICLR 2025 paper <em>Learning Dynamics of LLM Finetuning<\/em> autonomously over ~12 hours, producing 18 commits and 23 experimental figures with no human intervention.<\/li>\n<li><strong>CUDA Kernel Optimization<\/strong> \u2014 M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over ~24 hours: 147 benchmark submissions, 1,959 tool calls, 6 landmark optimization rounds. Improved Hopper FP8 peak utilization from <strong>7.6% \u2192 71.3%<\/strong> (9.4\u00d7 speedup). Best solution appeared on submission 145.<\/li>\n<li><strong>PostTrainBench<\/strong> \u2014 M3 autonomously ran data synthesis \u2192 training \u2192 evaluation \u2192 iteration for 4 base models over 12 hours. Scored <strong>0.37<\/strong>, below Opus 4.7 (0.42) and GPT-5.5 (0.39), but ahead of other evaluated models. 
Targets: AIME2025, BFCL, GPQA Main, GSM8K, HumanEval.<\/li>\n<\/ul><\/div>\n<p>    <!-- SLIDE 6: MiniMax Code --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">MiniMax Code<\/span>\n<h2 class=\"mtp-title\">MiniMax Code: Agent Product Built and Trained with M3<\/h2>\n<div class=\"mtp-body\">\n<p>MiniMax Code is an agent product built and trained together with M3. Available at <span class=\"mtp-highlight\">agent.minimaxi.com\/download<\/span>. Works with MiniMax Token Plans.<\/p>\n<\/div>\n<ul class=\"mtp-list\">\n<li><strong>Agent Teams<\/strong> \u2014 multiple agents run concurrent, multi-stage, dynamically adjustable workflows<\/li>\n<li><strong>Producer + Verifier loop<\/strong> \u2014 adversarial harness enables continuous self-correction during execution<\/li>\n<li><strong>Computer use<\/strong> \u2014 M3\u2019s native multimodal capability enables cross-application desktop automation<\/li>\n<li><strong>Built on OpenCode and Pi<\/strong> \u2014 MiniMax states it plans to open-source MiniMax Code in the future<\/li>\n<\/ul>\n<div class=\"mtp-code-block\">\/\/ Example use case<br \/>\nUser (on phone): \u201cOpen the local ERP client<br \/>\nand batch-enter invoice data from this Excel file.\u201d<br \/>\n\u2192 MiniMax Code handles operations across<br \/>\n  applications, files, and systems on desktop.<\/div>\n<\/div>\n<p>    <!-- SLIDE 7: API &amp; Pricing --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">API &amp; Pricing<\/span>\n<h2 class=\"mtp-title\">API Details and Token Plan Tiers<\/h2>\n<div class=\"mtp-body\">\n<p>The M3 API is live at <span class=\"mtp-highlight\">platform.minimax.io<\/span>.<\/p>\n<p><strong>Pricing by input length:<\/strong> Calls \u2264512K tokens \u2192 standard rate. Calls &gt;512K \u2192 higher long-context rate.<\/p>\n<p><strong>Thinking mode:<\/strong> Toggle on\/off at request time. 
Both modes share the same pricing.<\/p>\n<p><strong>Service tiers:<\/strong> <code>standard<\/code> (default) and <code>priority<\/code> (service_tier=priority) \u2014 priority available via sales, opening to all users soon.<\/p>\n<\/div>\n<div class=\"mtp-divider\"><\/div>\n<div class=\"mtp-tier-row\">\n        <span class=\"mtp-tier-name\">Plus<\/span><br \/>\n        <span class=\"mtp-tier-tokens\">~1.7B tokens\/mo<\/span><br \/>\n        <span class=\"mtp-tier-price\">$20\/mo<\/span>\n      <\/div>\n<div class=\"mtp-tier-row\">\n        <span class=\"mtp-tier-name\">Max<\/span><br \/>\n        <span class=\"mtp-tier-tokens\">~5.1B tokens\/mo<\/span><br \/>\n        <span class=\"mtp-tier-price\">$50\/mo<\/span>\n      <\/div>\n<div class=\"mtp-tier-row\">\n        <span class=\"mtp-tier-name\">Ultra<\/span><br \/>\n        <span class=\"mtp-tier-tokens\">~9.8B tokens\/mo<\/span><br \/>\n        <span class=\"mtp-tier-price\">$120\/mo<\/span>\n      <\/div>\n<div class=\"mtp-body\">\n<p>Text, image, speech, and music usage all draw from the same token pool.<\/p>\n<\/div>\n<\/div>\n<p>    <!-- SLIDE 8: Key Takeaways --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-label\">Key Takeaways<\/span>\n<h2 class=\"mtp-title\">What Engineers and Researchers Need to Know<\/h2>\n<ul class=\"mtp-list\">\n<li>MiniMax M3 launched <strong>June 1, 2026<\/strong>. API is live. Open model weights and technical report committed within 10 days.<\/li>\n<li>MSA delivers <strong>&gt;9\u00d7 prefill<\/strong> and <strong>&gt;15\u00d7 decoding<\/strong> speedup at 1M-token context vs M2, at 1\/20th the per-token compute.<\/li>\n<li>M3 scores <strong>59.0% on SWE-Bench Pro<\/strong>, surpassing GPT-5.5 and Gemini 3.1 Pro.<\/li>\n<li>Natively multimodal from step 0 \u2014 supports image, video input, and <strong>70.06% on OSWorld-Verified<\/strong> for computer use.<\/li>\n<li>Thinking mode toggleable at request time. 
Token Plan starts at <strong>$20\/month<\/strong> (~1.7B M3 tokens).<\/li>\n<\/ul><\/div>\n<\/div>\n<p><!-- end track --><\/p>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>MiniMax M3 launched June 1, 2026; API is live now. MiniMax has committed to releasing open model weights and a technical report within 10 days.<\/li>\n<li>MSA (MiniMax Sparse Attention) delivers more than 9\u00d7 prefill and more than 15\u00d7 decoding speedup at 1M-token context versus M2, at 1\/20th the per-token compute.<\/li>\n<li>M3 scores 59.0% on SWE-Bench Pro, surpassing GPT-5.5 and Gemini 3.1 Pro.<\/li>\n<li>M3 is natively multimodal from step 0, supporting image and video input, and achieves 70.06% on OSWorld-Verified for computer use.<\/li>\n<\/ul>\n<figure class=\"wp-block-embed is-type-rich is-provider-x wp-block-embed-x\">\n<div class=\"wp-block-embed__wrapper\">\n<div class=\"embed-x\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities<\/p>\n<p>&#8211; Coding &amp; Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas<br \/>&#8211; MiniMax Sparse Attention scales context to 1M<br \/>-\u2026 <a 
href=\"https:\/\/t.co\/TF891iJukF\">pic.twitter.com\/TF891iJukF<\/a><\/p>\n<p>\u2014 MiniMax (official) (@MiniMax_AI) <a href=\"https:\/\/x.com\/MiniMax_AI\/status\/2061266317815296322?ref_src=twsrc%5Etfw\">June 1, 2026<\/a><\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/figure>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p class=\"wp-block-paragraph\">Check out\u00a0the\u00a0<strong><a href=\"https:\/\/platform.minimax.io\/docs\/guides\/models-intro\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/06\/01\/minimax-releases-minimax-m3-with-msa-architecture-supporting-1m-token-context-native-multimodality-and-agentic-coding\/\">MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>MiniMax officially released Mi&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1022","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/1022","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1022"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2
\/posts\/1022\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1022"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}