{"id":763,"date":"2026-04-21T09:58:50","date_gmt":"2026-04-21T01:58:50","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=763"},"modified":"2026-04-21T09:58:50","modified_gmt":"2026-04-21T01:58:50","slug":"moonshot-ai-releases-kimi-k2-6-with-long-horizon-coding-agent-swarm-scaling-to-300-sub-agents-and-4000-coordinated-steps","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=763","title":{"rendered":"Moonshot AI Releases Kimi K2.6 with Long-Horizon Coding, Agent Swarm Scaling to 300 Sub-Agents and 4,000 Coordinated Steps"},"content":{"rendered":"<p>Moonshot AI, the Chinese AI lab behind the Kimi assistant, today open-sourced <strong>Kimi K2.6<\/strong> \u2014 a native multimodal agentic model that pushes the boundaries of what an AI system can do when left to run autonomously on hard software engineering problems. The release targets practical deployment scenarios: long-running coding agents, front-end generation from natural language, massively parallel agent swarms coordinating hundreds of specialized sub-agents simultaneously, and a new open ecosystem where humans and agents from any device collaborate on the same task. The model is available now on Kimi.com, the Kimi App, the API, and Kimi Code CLI. Weights are published on Hugging Face under a Modified MIT License.<\/p>\n<h3 class=\"wp-block-heading\"><strong>What Kind of Model is This, Technically?<\/strong><\/h3>\n<p>Kimi K2.6 is a <strong>Mixture-of-Experts (MoE)<\/strong> model \u2014 an architecture that\u2019s become increasingly dominant at frontier scale. Instead of activating all of a model\u2019s parameters for every token it processes, a MoE model routes each token to a small subset of specialized \u2018experts.\u2019 This allows you to build a very large model while keeping inference compute tractable.<\/p>\n<p>Kimi K2.6 has 1 trillion total parameters, but only 32 billion are activated per token. 
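The routing idea can be made concrete in a few lines. This is a toy sketch of top-k MoE routing, not Moonshot's implementation: the 384-routed / top-8 / 1-shared-expert figures mirror the numbers above, while the tiny hidden size, the single-matrix "experts," and the softmax-over-selected-experts weighting are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Figures from the release: 384 routed experts, 8 chosen per token,
# plus 1 shared expert that always runs. HIDDEN is shrunk for the toy.
NUM_EXPERTS, TOP_K, HIDDEN = 384, 8, 16

def route_token(x, router_w):
    """Score every expert for one token and keep the top-k, softmax-weighted."""
    logits = x @ router_w                      # one score per expert
    top_idx = np.argsort(logits)[-TOP_K:]      # indices of the 8 best experts
    w = np.exp(logits[top_idx] - logits[top_idx].max())
    return top_idx, w / w.sum()                # normalized mixing weights

# Tiny stand-in experts: one weight matrix each (real experts are MLP blocks).
experts = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.02
shared = rng.standard_normal((HIDDEN, HIDDEN)) * 0.02
router_w = rng.standard_normal((HIDDEN, NUM_EXPERTS))

x = rng.standard_normal(HIDDEN)
idx, w = route_token(x, router_w)

# Only the selected 8 experts (plus the shared one) do any work for this
# token -- which is why active parameters (~32B) sit far below total (~1T).
y = x @ shared + sum(wi * (x @ experts[i]) for wi, i in zip(w, idx))
```

Production MoE layers add load-balancing losses and batched expert dispatch on top of this, but the active-vs-total parameter arithmetic works the same way.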
It has 384 experts in total, with 8 selected per token, plus 1 shared expert that is always active. The model has 61 layers (including one dense layer), uses an attention hidden dimension of 7,168, a MoE hidden dimension of 2,048 per expert, and 64 attention heads.<\/p>\n<p>Beyond text, K2.6 is a <strong>native multimodal<\/strong> model \u2014 meaning vision is baked in architecturally, not bolted on. It uses a <strong>MoonViT<\/strong> vision encoder with 400M parameters and supports image and video input natively. Other architectural details: it uses <strong>Multi-head Latent Attention (MLA)<\/strong> as its attention mechanism, <strong>SwiGLU<\/strong> as the activation function, a vocabulary size of 160K tokens, and a context length of 256K tokens.<\/p>\n<p>For deployment, K2.6 is recommended to run on <strong>vLLM<\/strong>, <strong>SGLang<\/strong>, or <strong>KTransformers<\/strong>. It shares the same architecture as Kimi K2.5, so existing deployment configurations can be reused directly. The required <code>transformers<\/code> version is <code>&gt;=4.57.1, &lt;5.0.0<\/code>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Long-Horizon Coding Headline Numbers<\/strong><\/h3>\n<p>The metric that will likely get the most attention from dev teams is <strong>SWE-Bench Pro<\/strong> \u2014 a benchmark testing whether a model can resolve real-world GitHub issues in professional software repositories.<\/p>\n<p>Kimi K2.6 scores 58.6 on SWE-Bench Pro, compared to 57.7 for GPT-5.4 (xhigh), 53.4 for Claude Opus 4.6 (max effort), 54.2 for Gemini 3.1 Pro (thinking high), and 50.7 for Kimi K2.5. On SWE-Bench Verified it scores 80.2, sitting within a tight band of top-tier models.<\/p>\n<p>On <strong>Terminal-Bench 2.0<\/strong> using the Terminus-2 agent framework, K2.6 achieves 66.7, compared to 65.4 for both GPT-5.4 and Claude Opus 4.6, and 68.5 for Gemini 3.1 Pro. On <strong>LiveCodeBench (v6)<\/strong>, it scores 89.6 vs. 
Claude Opus 4.6\u2019s 88.8.<\/p>\n<p>Perhaps the most striking number for agentic workloads is <strong>Humanity\u2019s Last Exam (HLE-Full) with tools<\/strong>: K2.6 scores 54.0 \u2014 leading every model in the comparison, including GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4). HLE is widely considered one of the hardest knowledge benchmarks, and the with-tools variant specifically tests how well a model can leverage external resources autonomously. Internally, Moonshot evaluates long-horizon coding gains using their <strong>Kimi Code Bench<\/strong>, an internal benchmark covering diverse, complicated end-to-end tasks across languages and domains, where K2.6 demonstrates significant improvements over K2.5.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1958\" height=\"1224\" data-attachment-id=\"79180\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/20\/moonshot-ai-releases-kimi-k2-6-with-long-horizon-coding-agent-swarm-scaling-to-300-sub-agents-and-4000-coordinated-steps\/screenshot-2026-04-20-at-6-58-20-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-20-at-6.58.20-PM-1.png\" data-orig-size=\"1958,1224\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-20 at 6.58.20\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-20-at-6.58.20-PM-1-1024x640.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-20-at-6.58.20-PM-1.png\" alt=\"\" class=\"wp-image-79180\" \/><figcaption 
class=\"wp-element-caption\">https:\/\/www.kimi.com\/blog\/kimi-k2-6<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>What 13 Hours of Autonomous Coding Actually Looks Like<\/strong><\/h3>\n<p>Two engineering case studies in the release document what \u2018long-horizon coding\u2019 means in practice.<\/p>\n<p>In the first, Kimi K2.6 successfully downloaded and deployed the <strong>Qwen3.5-0.8B<\/strong> model locally on a Mac, then implemented and optimized model inference in <strong>Zig<\/strong> \u2014 a highly niche programming language \u2014 demonstrating exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, K2.6 improved throughput from approximately 15 to approximately 193 tokens\/sec, ultimately achieving speeds approximately 20% faster than LM Studio.<\/p>\n<p>In the second, Kimi K2.6 autonomously overhauled <strong>exchange-core<\/strong>, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code. 
Acting as an expert systems architect, K2.6 analyzed CPU and allocation flame graphs to pinpoint hidden bottlenecks and reconfigured the core thread topology from 4ME+2RE to 2ME+1RE \u2014 extracting a 185% median throughput leap (from 0.43 to 1.24 MT\/s) and a 133% performance throughput gain (from 1.23 to 2.86 MT\/s).<\/p>\n<h3 class=\"wp-block-heading\"><strong>Agent Swarms: Scaling Horizontally, Not Just Vertically<\/strong><\/h3>\n<p>One of K2.6\u2019s most architecturally interesting capabilities is its <strong>Agent Swarm<\/strong> \u2014 an approach to parallelizing complex tasks across many specialized sub-agents, rather than relying on a single, deeper reasoning chain.<\/p>\n<p>The architecture scales horizontally to 300 sub-agents executing across 4,000 coordinated steps simultaneously, a substantial expansion from K2.5\u2019s 100 sub-agents and 1,500 steps. The swarm dynamically decomposes tasks into heterogeneous subtasks \u2014 combining broad web search with deep research, large-scale document analysis with long-form writing, and multi-format content generation in parallel \u2014 then delivers consolidated outputs including documents, websites, slides, and spreadsheets within a single autonomous run. The swarm also introduces a concrete <strong>Skills<\/strong> capability: it can convert any high-quality PDF, spreadsheet, slide, or Word document into a reusable Skill. 
K2.6 captures and maintains the document\u2019s structural and stylistic DNA, allowing it to reproduce the same quality and format in future tasks \u2014 think of it as teaching the swarm by example rather than prompt.<\/p>\n<p>Concrete demonstrations include: a 100-sub-agent run that matched a single uploaded CV against 100 relevant roles in California and delivered 100 fully customized resumes; another that identified 30 retail stores in Los Angeles without websites from Google Maps and generated landing pages for each; and one that turned an astrophysics paper into a reusable academic skill and then produced a 40-page, 7,000-word research paper alongside a structured dataset with 20,000+ entries and 14 astronomy-grade charts.<\/p>\n<p>On the <strong>BrowseComp<\/strong> benchmark in Agent Swarm mode, K2.6 scores 86.3 compared to 78.4 for Kimi K2.5. On <strong>DeepSearchQA<\/strong> (f1-score), K2.6 scores 92.5 against 78.6 for GPT-5.4.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Bring Your Own Agents: Claw Groups<\/strong><\/h3>\n<p>Beyond Moonshot\u2019s own swarm infrastructure, K2.6 introduces <strong>Claw Groups<\/strong> as a research preview \u2014 a new feature that opens the agent swarm architecture to an external, heterogeneous ecosystem.<\/p>\n<p>The key design principle: multiple agents and humans operate as genuine collaborators in a shared operational space. Users can onboard agents from any device, running any model, each carrying their own specialized toolkits, skills, and persistent memory contexts \u2014 whether deployed on local laptops, mobile devices, or cloud instances. 
At the center of this swarm, K2.6 serves as an adaptive coordinator: it dynamically matches tasks to agents based on their specific skill profiles and available tools, detects when an agent encounters failure or stalls, automatically reassigns the task or regenerates subtasks, and manages the full lifecycle of deliverables from initiation through validation to completion.<\/p>\n<p>Moonshot has been using Claw Groups internally to run their own content production and launch campaigns, with specialized agents including Demo Makers, Benchmark Makers, Social Media Agents, and Video Makers working in parallel \u2014 with K2.6 coordinating the process. For devs thinking about multi-agent orchestration architectures, this is worth looking into: it represents a shift from \u2018AI does tasks for you\u2019 to \u2018AI coordinates a team of heterogeneous agents, some of which you built, on your behalf.\u2019<\/p>\n<h3 class=\"wp-block-heading\"><strong>Proactive Agents: 5 Days of Autonomous Operation<\/strong><\/h3>\n<p>K2.6 demonstrates strong performance in persistent, proactive agents such as <strong>OpenClaw<\/strong> and <strong>Hermes<\/strong>, which operate across multiple applications with continuous, 24\/7 execution. These workflows require AI to proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.<\/p>\n<p>Moonshot\u2019s own RL infrastructure team used a K2.6-backed agent that operated autonomously for 5 days, managing monitoring, incident response, and system operations \u2014 demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution.<\/p>\n<p>Performance in this regime is measured by an internal <strong>Claw Bench<\/strong>, an evaluation suite spanning five domains: Coding Tasks, IM Ecosystem Integration, Information Research &amp; Analysis, Scheduled Task Management, and Memory Utilization. 
Across all five, K2.6 significantly outperforms K2.5 in task completion rates and tool invocation accuracy \u2014 particularly in workflows requiring sustained autonomous operation without human oversight.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Two Operational Modes: Thinking and Instant<\/strong><\/h3>\n<p>For devs integrating via API, K2.6 exposes<strong> two inference modes<\/strong> that matter for latency\/quality tradeoffs:<\/p>\n<p><strong>Thinking mode<\/strong> activates full chain-of-thought reasoning \u2014 the model reasons through a problem before producing a final answer. This is recommended for complex coding and agentic tasks, with a recommended temperature of 1.0. There is also a <strong>preserve thinking<\/strong> mode, which retains full reasoning content across multi-turn interactions and enhances performance in coding agent scenarios \u2014 disabled by default, but worth enabling when building agents that need to maintain coherent reasoning state across many steps.<\/p>\n<p><strong>Instant mode<\/strong> disables extended reasoning for lower-latency responses. To use Instant mode via the official API, pass <code>{'thinking': {'type': 'disabled'}}<\/code> in <code>extra_body<\/code>. 
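In practice the mode toggles reduce to a few request fields. The helper below collects them for splatting into an OpenAI-compatible chat-completion call; only the two payloads quoted in this section come from the release, while the function name and backend labels are this sketch's own.

```python
def instant_mode_kwargs(backend: str) -> dict:
    """Return the request fields that disable extended reasoning (Instant mode).

    'official': the Kimi API takes {'thinking': {'type': 'disabled'}} via extra_body.
    'vllm' / 'sglang': self-hosted servers take chat_template_kwargs instead,
    with temperature 0.6 and top-p 0.95 recommended for this mode.
    """
    if backend == "official":
        return {"extra_body": {"thinking": {"type": "disabled"}}}
    if backend in ("vllm", "sglang"):
        return {
            "extra_body": {"chat_template_kwargs": {"thinking": False}},
            "temperature": 0.6,
            "top_p": 0.95,
        }
    raise ValueError(f"unknown backend: {backend}")
```

With any OpenAI-compatible client you would then call something like `client.chat.completions.create(model=..., messages=..., **instant_mode_kwargs("official"))`; omitting these fields leaves the model in its default Thinking behavior.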
For vLLM or SGLang deployments, pass <code>{'chat_template_kwargs': {\"thinking\": False}}<\/code> instead, with a recommended temperature of 0.6 and top-p of 0.95.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li>Kimi K2.6 is a 1-trillion-parameter, native multimodal MoE model with only 32B parameters activated per token, released fully open-source under a Modified MIT License.<\/li>\n<li>K2.6 leads all frontier models on HLE-Full with tools (54.0), outperforming GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4) on one of AI\u2019s hardest agentic benchmarks.<\/li>\n<li>In real-world tests, K2.6 autonomously overhauled an 8-year-old financial matching engine over 13 hours, delivering a 185% median throughput leap and a 133% performance throughput gain.<\/li>\n<li>The Agent Swarm architecture scales to 300 sub-agents executing 4,000 coordinated steps simultaneously, and can convert any PDF, spreadsheet, or slide into a reusable Skill that preserves structural and stylistic DNA.<\/li>\n<li>Claw Groups, introduced as a research preview, lets humans and agents from any device running any model collaborate in a shared swarm, with K2.6 serving as an adaptive coordinator that dynamically assigns tasks, detects failures, and manages full delivery lifecycles.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out\u00a0the<strong>\u00a0<a href=\"https:\/\/huggingface.co\/moonshotai\/Kimi-K2.6\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a>, <a href=\"https:\/\/platform.moonshot.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">API Access<\/a> <\/strong>and<strong> <a href=\"https:\/\/www.kimi.com\/blog\/kimi-k2-6\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/20\/moonshot-ai-releases-kimi-k2-6-with-long-horizon-coding-agent-swarm-scaling-to-300-sub-agents-and-4000-coordinated-steps\/\">Moonshot AI Releases Kimi K2.6 with Long-Horizon Coding, Agent Swarm Scaling to 300 Sub-Agents and 4,000 Coordinated Steps<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Moonshot AI, the Chinese AI 
la&hellip;<\/p>\n","protected":false},"author":1,"featured_media":764,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-763","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/763","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=763"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/763\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/764"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=763"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=763"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=763"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}