{"id":425,"date":"2026-02-17T02:53:19","date_gmt":"2026-02-16T18:53:19","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=425"},"modified":"2026-02-17T02:53:19","modified_gmt":"2026-02-16T18:53:19","slug":"alibaba-qwen-team-releases-qwen3-5-397b-moe-model-with-17b-active-parameters-and-1m-token-context-for-ai-agents","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=425","title":{"rendered":"Alibaba Qwen Team Releases Qwen3.5-397B MoE Model with 17B Active Parameters and 1M Token Context for AI agents"},"content":{"rendered":"<p>Alibaba Cloud just updated the open-source landscape. Today, the Qwen team released <strong>Qwen3.5<\/strong>, the newest generation of their large language model (LLM) family. The most powerful version is <strong>Qwen3.5-397B-A17B<\/strong>. This model is a sparse Mixture-of-Experts (MoE) system. It combines massive reasoning power with high efficiency.<\/p>\n<p>Qwen3.5 is a native vision-language model. It is designed specifically for AI agents. It can see, code, and reason across <strong>201<\/strong> languages.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1712\" height=\"1052\" data-attachment-id=\"77924\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/02\/16\/alibaba-qwen-team-releases-qwen3-5-397b-moe-model-with-17b-active-parameters-and-1m-token-context-for-ai-agents\/screenshot-2026-02-16-at-10-47-42-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-16-at-10.47.42-AM-1.png\" data-orig-size=\"1712,1052\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-02-16 at 10.47.42\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-16-at-10.47.42-AM-1-300x184.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-16-at-10.47.42-AM-1-1024x629.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-16-at-10.47.42-AM-1.png\" alt=\"\" class=\"wp-image-77924\" \/><figcaption class=\"wp-element-caption\">https:\/\/qwen.ai\/blog?id=qwen3.5<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Core Architecture: 397B Total, 17B Active<\/strong><\/h3>\n<p>The technical specifications of <strong>Qwen3.5-397B-A17B<\/strong> are impressive. The model contains <strong>397B<\/strong> total parameters. However, it uses a sparse MoE design. This means it only activates <strong>17B<\/strong> parameters during any single forward pass.<\/p>\n<p>This <strong>17B<\/strong> activation count is the most important number for devs. It allows the model to provide the intelligence of a <strong>400B<\/strong> model. But it runs with the speed of a much smaller model. The Qwen team reports a <strong>8.6x<\/strong> to <strong>19.0x<\/strong> increase in decoding throughput compared to previous generations. 
<figure class="aligncenter"><img src="https://www.marktechpost.com/wp-content/uploads/2026/02/Screenshot-2026-02-16-at-10.46.46-AM-1.png" alt="Qwen3.5 throughput comparison" /><figcaption class="wp-element-caption">https://qwen.ai/blog?id=qwen3.5</figcaption></figure>

<h3 class="wp-block-heading"><strong>Efficient Hybrid Architecture: Gated Delta Networks</strong></h3>

<p>Qwen3.5 does not use a standard Transformer design. It uses an &#8216;Efficient Hybrid Architecture.&#8217; Most LLMs rely solely on attention mechanisms, which slow down on long text. Qwen3.5 combines <strong>Gated Delta Networks</strong> (linear attention) with <strong>Mixture-of-Experts (MoE)</strong> layers.</p>

<p>The model consists of <strong>60</strong> layers with a hidden dimension of <strong>4,096</strong>. The layers follow a specific hybrid layout that groups them into sets of <strong>4</strong>:</p>

<ul class="wp-block-list">
<li><strong>3</strong> blocks use Gated DeltaNet plus MoE.</li>
<li><strong>1</strong> block uses Gated Attention plus MoE.</li>
<li>This pattern repeats <strong>15</strong> times to reach <strong>60</strong> layers.</li>
</ul>

<p><strong>Technical details include:</strong></p>

<ul class="wp-block-list">
<li><strong>Gated DeltaNet:</strong> <strong>64</strong> linear attention heads for Values (V) and <strong>16</strong> heads for Queries and Keys (QK).</li>
<li><strong>MoE structure:</strong> <strong>512</strong> total experts. Each token activates <strong>10</strong> routed experts plus <strong>1</strong> shared expert, for <strong>11</strong> active experts per token.</li>
<li><strong>Vocabulary:</strong> a padded vocabulary of <strong>248,320</strong> tokens.</li>
</ul>

<p>The sketch after this list spells out the layer layout and routing numbers in code.</p>
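<p>Here is a short Python sketch of that configuration. It simply encodes the published numbers above; it is not code from the release:</p>

<pre><code># Illustrative sketch of the published 60-layer hybrid layout:
# 15 repeats of [DeltaNet, DeltaNet, DeltaNet, Attention].
NUM_LAYERS = 60
PATTERN = ["gated_deltanet_moe"] * 3 + ["gated_attention_moe"]  # the 3:1 ratio

layout = [PATTERN[i % len(PATTERN)] for i in range(NUM_LAYERS)]
assert layout.count("gated_deltanet_moe") == 45   # 3 per group of 4
assert layout.count("gated_attention_moe") == 15  # 1 per group of 4

# Expert-routing figures from the release notes:
TOTAL_EXPERTS = 512
ACTIVE_EXPERTS = 10 + 1  # 10 routed experts + 1 shared expert per token
print(f"{ACTIVE_EXPERTS} of {TOTAL_EXPERTS} experts active per token")
</code></pre>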
<h3 class="wp-block-heading"><strong>Native Multimodal Training: Early Fusion</strong></h3>

<p>Qwen3.5 is a <strong>native vision-language model</strong>. Where many other models bolt vision on after the fact, Qwen3.5 used &#8216;Early Fusion&#8217; training: the model learned from images and text at the same time, over trillions of multimodal tokens.</p>

<p>This makes Qwen3.5 better at visual reasoning than the previous <strong>Qwen3-VL</strong> versions and highly capable at &#8216;agentic&#8217; tasks. For example, it can look at a UI screenshot and generate the matching HTML and CSS, and it can analyze long videos with second-level accuracy.</p>

<p>The model supports the <strong>Model Context Protocol (MCP)</strong> and handles complex function calling. These features are vital for building agents that control apps or browse the web. On the <strong>IFBench</strong> instruction-following benchmark, it scored <strong>76.5</strong>, beating many proprietary models.</p>

<figure class="aligncenter"><img src="https://www.marktechpost.com/wp-content/uploads/2026/02/Screenshot-2026-02-16-at-10.48.07-AM-1.png" alt="Qwen3.5 benchmark results" /><figcaption class="wp-element-caption">https://qwen.ai/blog?id=qwen3.5</figcaption></figure>

<h3 class="wp-block-heading"><strong>Solving the Memory Wall: 1M Token Context</strong></h3>

<p>Long-form data processing is a core feature of Qwen3.5. The base model has a native context window of <strong>262,144</strong> (256K) tokens, and the hosted <strong>Qwen3.5-Plus</strong> version goes further, supporting <strong>1M</strong> tokens.</p>

<p>The Alibaba Qwen team used a new asynchronous Reinforcement Learning (RL) framework to achieve this, keeping the model accurate even at the end of a <strong>1M</strong>-token document. For developers, this means you can feed an entire codebase into one prompt, as the sketch below illustrates, instead of always building a complex Retrieval-Augmented Generation (RAG) system.</p>
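<p>A rough illustration of the &#8216;whole codebase in one prompt&#8217; workflow: the sketch below packs source files into a single request to an OpenAI-compatible endpoint. The base URL and the model name <code>qwen3.5-plus</code> are placeholders, not confirmed values from the release:</p>

<pre><code># Hedged sketch: packing a small codebase into one long-context prompt.
from pathlib import Path
from openai import OpenAI

def pack_codebase(root: str, suffixes=(".py", ".md")) -> str:
    """Concatenate source files into one prompt section with path headers."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            parts.append(f"### FILE: {path}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(parts)

# Placeholder endpoint and model name; substitute your provider's values.
client = OpenAI(base_url="https://example.com/v1", api_key="YOUR_KEY")
response = client.chat.completions.create(
    model="qwen3.5-plus",  # hypothetical served-model identifier
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": pack_codebase("./my_project")
                                    + "\n\nSummarize the architecture."},
    ],
)
print(response.choices[0].message.content)
</code></pre>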
<h3 class="wp-block-heading"><strong>Performance and Benchmarks</strong></h3>

<p>The model excels in technical fields. It achieved high scores on <strong>Humanity&#8217;s Last Exam (HLE-Verified)</strong>, a difficult benchmark of expert-level knowledge.</p>

<ul class="wp-block-list">
<li><strong>Coding:</strong> It shows parity with top-tier closed-source models.</li>
<li><strong>Math:</strong> The model uses &#8216;Adaptive Tool Use.&#8217; It can write Python code to solve a math problem, then run that code to verify the answer; a sketch of this verify-by-execution loop follows this list.</li>
<li><strong>Languages:</strong> It supports <strong>201</strong> languages and dialects, a big jump from the <strong>119</strong> supported by the previous version.</li>
</ul>
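<p>One way to approximate the &#8216;Adaptive Tool Use&#8217; pattern on the client side is to execute a model-emitted verification script and compare its output with the stated answer. A minimal sketch, not the model&#8217;s internal mechanism:</p>

<pre><code># Run model-generated Python in a subprocess and capture its stdout.
import subprocess
import sys
import tempfile

def run_generated_python(code: str) -> str:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=30)
    return result.stdout.strip()

# Example: the model claims the sum of squares 1..100 is 338350 and emits
# a one-line check; we execute the check and compare against the claim.
verification = "print(sum(i * i for i in range(1, 101)))"
assert run_generated_python(verification) == "338350"
</code></pre>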
<h3 class="wp-block-heading"><strong>Key Takeaways</strong></h3>

<ul class="wp-block-list">
<li><strong>Hybrid efficiency (MoE + Gated Delta Networks):</strong> Qwen3.5 uses a <strong>3:1</strong> ratio of <strong>Gated Delta Network</strong> (linear attention) blocks to standard <strong>Gated Attention</strong> blocks across <strong>60</strong> layers. This hybrid design delivers an <strong>8.6x</strong> to <strong>19.0x</strong> increase in decoding throughput over previous generations.</li>
<li><strong>Massive scale, low footprint:</strong> <strong>Qwen3.5-397B-A17B</strong> has <strong>397B</strong> total parameters but activates only <strong>17B</strong> per token, giving <strong>400B-class</strong> intelligence with the inference speed and memory requirements of a much smaller model.</li>
<li><strong>Native multimodal foundation:</strong> Unlike &#8216;bolted-on&#8217; vision models, Qwen3.5 was trained via <strong>Early Fusion</strong> on trillions of text and image tokens simultaneously, and it scores <strong>76.5</strong> on <strong>IFBench</strong> for following complex instructions in visual contexts.</li>
<li><strong>1M token context:</strong> The base model supports a native <strong>256K</strong>-token context, while the hosted <strong>Qwen3.5-Plus</strong> handles up to <strong>1M</strong> tokens, enough to process entire codebases or 2-hour videos without complex RAG pipelines.</li>
</ul>

<hr class="wp-block-separator has-alpha-channel-opacity" />

<p>Check out the <strong><a href="https://qwen.ai/blog?id=qwen3.5">technical details</a></strong>, <strong><a href="https://huggingface.co/collections/Qwen/qwen35">model weights</a></strong>, and <strong><a href="https://github.com/QwenLM/Qwen3.5">GitHub repo</a></strong>.</p>