{"id":1019,"date":"2026-06-02T16:00:32","date_gmt":"2026-06-02T08:00:32","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=1019"},"modified":"2026-06-02T16:00:32","modified_gmt":"2026-06-02T08:00:32","slug":"jetbrains-releases-mellum2-a-12b-moe-model-for-fast-specialized-tasks-in-multi-model-ai-pipelines","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=1019","title":{"rendered":"JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines"},"content":{"rendered":"<p class=\"wp-block-paragraph\">JetBrains has released Mellum2 and open-sourced its weights under the Apache 2.0 license. The first version of Mellum was a completion-focused 4B dense model. Mellum2 is its successor: a general-purpose model specialized for software engineering. It covers code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and conversational programming assistance.<\/p>\n<p class=\"wp-block-paragraph\">The JetBrains team positions Mellum2 as a \u201cfocal model\u201d \u2014 a fast, specialized component inside larger AI systems, not a standalone replacement for frontier models.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Architecture<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Mellum2 uses a Mixture-of-Experts (MoE) architecture with 12B total parameters and 2.5B active parameters per token. In MoE models, only a subset of parameters runs on each token. Here, the model has 64 experts and activates 8 per token. 
This keeps per-token compute equivalent to a 2.5B dense model, while the total parameter count provides higher capacity for specialization.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Key architectural details:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Layers:<\/strong> 28<\/li>\n<li><strong>Hidden size:<\/strong> 2304<\/li>\n<li><strong>MoE experts:<\/strong> 64 total, 8 activated per token<\/li>\n<li><strong>Attention:<\/strong> Grouped-Query Attention (GQA) with 32 query heads and 4 KV heads<\/li>\n<li><strong>Sliding Window Attention (SWA):<\/strong> Applied to three of every four layers, with a window size of 1,024. Full attention runs on the remaining layer.<\/li>\n<li><strong>Context length:<\/strong> 131,072 tokens<\/li>\n<li><strong>Multi-Token Prediction (MTP) head:<\/strong> Serves as an auxiliary pre-training objective and as a built-in draft model for speculative decoding<\/li>\n<li><strong>Precision:<\/strong> bfloat16<\/li>\n<li><strong>Vocabulary size:<\/strong> 98,304<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">The model handles natural language and code. It is not multimodal \u2014 there is no image or video input.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Pre-Training<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Pre-training spans approximately 10.6 trillion tokens through a three-phase curriculum. 
The data mixture progressively shifts from diverse web content toward curated code and mathematical content across the three phases.<\/p>\n<p class=\"wp-block-paragraph\">Training used the Muon optimizer under FP8 hybrid precision, with a Warmup-Hold-Decay learning rate schedule that decays linearly to zero.<\/p>\n<p class=\"wp-block-paragraph\">After pre-training, the base model\u2019s context window was extended to 128K (131,072) tokens using a layer-selective YaRN method before post-training began.<\/p>\n<h2 class=\"wp-block-heading\"><strong>The Model Family<\/strong><\/h2>\n<p class=\"wp-block-paragraph\"><strong>The JetBrains team released six checkpoints covering the full training pipeline:<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Checkpoint<\/th>\n<th>Description<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Mellum2-12B-A2.5B-Base-Pretrain<\/td>\n<td>Base checkpoint before long-context extension<\/td>\n<\/tr>\n<tr>\n<td>Mellum2-12B-A2.5B-Base<\/td>\n<td>Final base model after context extension<\/td>\n<\/tr>\n<tr>\n<td>Mellum2-12B-A2.5B-Instruct-SFT<\/td>\n<td>Supervised fine-tuned instruction checkpoint<\/td>\n<\/tr>\n<tr>\n<td>Mellum2-12B-A2.5B-Thinking-SFT<\/td>\n<td>Supervised fine-tuned thinking checkpoint<\/td>\n<\/tr>\n<tr>\n<td>Mellum2-12B-A2.5B-Instruct<\/td>\n<td>RL-tuned instruction model<\/td>\n<\/tr>\n<tr>\n<td>Mellum2-12B-A2.5B-Thinking<\/td>\n<td>RL-tuned thinking model<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">Post-training follows two stages: supervised fine-tuning (SFT), then reinforcement learning with verifiable rewards (RLVR) on math, executable coding, tool use, instruction following, reasoning, and knowledge tasks.<\/p>\n<p class=\"wp-block-paragraph\">The <strong>Instruct<\/strong> variant answers directly, without an externalized chain of thought. 
Use it for low-latency tasks: direct answers, tool use, and instruction following.<\/p>\n<p class=\"wp-block-paragraph\">The <strong>Thinking<\/strong> variant emits an explicit reasoning trace before its final answer. Use it for complex debugging, multi-step planning, or agentic flows where step-by-step reasoning matters.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Benchmark Results<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">All numbers below are self-reported by JetBrains. The comparison set is open-weight models in the 4B\u201314B range.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Coding:<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Mellum2 Instruct<\/th>\n<th>Qwen3.5 (4B)<\/th>\n<th>Qwen3.5 (9B)<\/th>\n<th>Ministral 3 (14B)<\/th>\n<th>OLMo-3 (7B)<\/th>\n<th>Seed-Coder (8B)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LiveCodeBench v6<\/td>\n<td>37.2<\/td>\n<td>51.0<\/td>\n<td>63.7<\/td>\n<td>42.4<\/td>\n<td>28.2<\/td>\n<td>28.1<\/td>\n<\/tr>\n<tr>\n<td>EvalPlus<\/td>\n<td>78.4<\/td>\n<td>69.4<\/td>\n<td>71.8<\/td>\n<td>74.1<\/td>\n<td>67.3<\/td>\n<td>73.8<\/td>\n<\/tr>\n<tr>\n<td>MultiPL-E<\/td>\n<td>67.1<\/td>\n<td>51.0<\/td>\n<td>67.1<\/td>\n<td>71.5<\/td>\n<td>36.1<\/td>\n<td>77.0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\"><strong>Tool Use:<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Mellum2 Instruct<\/th>\n<th>Qwen3.5 (4B)<\/th>\n<th>Qwen3.5 (9B)<\/th>\n<th>Ministral 3 (14B)<\/th>\n<th>OLMo-3 (7B)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>BFCL v3<\/td>\n<td>66.3<\/td>\n<td>64.1<\/td>\n<td>70.5<\/td>\n<td>52.7<\/td>\n<td>41.9<\/td>\n<\/tr>\n<tr>\n<td>BFCL v4<\/td>\n<td>44.2<\/td>\n<td>52.0<\/td>\n<td>60.6<\/td>\n<td>38.8<\/td>\n<td>19.8<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p 
class=\"wp-block-paragraph\"><strong>Math:<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Mellum2 Instruct<\/th>\n<th>Qwen3.5 (4B)<\/th>\n<th>Qwen3.5 (9B)<\/th>\n<th>Ministral 3 (14B)<\/th>\n<th>OLMo-3 (7B)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>AIME 2025+2026<\/td>\n<td>41.7<\/td>\n<td>38.3<\/td>\n<td>58.3<\/td>\n<td>33.3<\/td>\n<td>40.0<\/td>\n<\/tr>\n<tr>\n<td>GSM-Plus<\/td>\n<td>80.5<\/td>\n<td>85.2<\/td>\n<td>87.9<\/td>\n<td>86.6<\/td>\n<td>85.8<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\"><strong>Knowledge and Conversational:<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Mellum2 Instruct<\/th>\n<th>Qwen3.5 (4B)<\/th>\n<th>Qwen3.5 (9B)<\/th>\n<th>Ministral 3 (14B)<\/th>\n<th>OLMo-3 (7B)<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>MMLU-Redux<\/td>\n<td>78.1<\/td>\n<td>87.5<\/td>\n<td>91.1<\/td>\n<td>85.9<\/td>\n<td>71.8<\/td>\n<\/tr>\n<tr>\n<td>GPQA Diamond<\/td>\n<td>40.9<\/td>\n<td>76.8<\/td>\n<td>79.8<\/td>\n<td>58.6<\/td>\n<td>40.9<\/td>\n<\/tr>\n<tr>\n<td>IFEval<\/td>\n<td>75.8<\/td>\n<td>82.1<\/td>\n<td>83.9<\/td>\n<td>67.3<\/td>\n<td>83.2<\/td>\n<\/tr>\n<tr>\n<td>MixEval<\/td>\n<td>62.2<\/td>\n<td>65.9<\/td>\n<td>71.1<\/td>\n<td>71.2<\/td>\n<td>59.4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\"><strong>Benchmark notes:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>EvalPlus is the mean of HumanEval+ and MBPP+<\/li>\n<li>AIME is the mean of AIME 2025 and AIME 2026 (30 questions each)<\/li>\n<li>BFCL v4 is the macro-average of five subtasks: v1, v2, v3, web search, memory<\/li>\n<li>Seed-Coder (8B) does not support native tool calling; BFCL scores are not listed for it<\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" 
width=\"1024\" height=\"886\" data-attachment-id=\"80249\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/06\/02\/jetbrains-releases-mellum2-a-12b-moe-model-for-fast-specialized-tasks-in-multi-model-ai-pipelines\/screenshot-2026-06-02-at-1-02-55-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/06\/Screenshot-2026-06-02-at-1.02.55-AM-1.png\" data-orig-size=\"1682,1456\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;,&quot;alt&quot;:&quot;&quot;}\" data-image-title=\"Screenshot 2026-06-02 at 1.02.55\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/06\/Screenshot-2026-06-02-at-1.02.55-AM-1-1024x886.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/06\/Screenshot-2026-06-02-at-1.02.55-AM-1-1024x886.png\" alt=\"\" class=\"wp-image-80249\" \/><figcaption class=\"wp-element-caption\">https:\/\/blog.jetbrains.com\/ai\/2026\/06\/mellum2-goes-open-source-a-fast-model-for-ai-workflows\/<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Use Cases<\/strong><\/h2>\n<p class=\"wp-block-paragraph\"><strong>JetBrains identifies four production scenarios where Mellum2\u2019s latency and efficiency profile is relevant:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Routing and orchestration<\/strong>: In a multi-model system, a router analyzes incoming prompts and selects the appropriate model or tool for each task. 
Mellum2\u2019s low per-token compute makes it suitable for this high-frequency classification step.<\/li>\n<li><strong>Low-latency RAG pipelines<\/strong>: Retrieval-Augmented Generation (RAG) systems retrieve relevant context, summarize it, and generate a response. Mellum2 can handle the summarization step at lower latency than larger dense models.<\/li>\n<li><strong>Sub-agents in complex workflows<\/strong>: Agent pipelines break tasks into steps: context gathering, planning, validation, and execution. Mellum2 can handle repetitive or latency-sensitive steps instead of routing every step through a single large frontier model.<\/li>\n<li><strong>Private and local deployment<\/strong>: The Apache 2.0 license permits self-hosting, commercial use, and modification, subject to its attribution and notice requirements. Engineers can run Mellum2 on their own infrastructure, keeping code and data under their control.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>Strengths and Limitations<\/strong><\/h2>\n<h4 class=\"wp-block-heading\"><strong>Strengths:<\/strong><\/h4>\n<ul class=\"wp-block-list\">\n<li>MoE design activates only 2.5B of 12B parameters per token \u2014 per-token compute equivalent to a 2.5B dense model<\/li>\n<li>MTP head enables speculative decoding without a separate draft model<\/li>\n<li>131,072-token context window<\/li>\n<li>Full checkpoint set released: base pretrain, base, SFT, and RL-tuned variants for both Instruct and Thinking<\/li>\n<li>Apache 2.0 license \u2014 permits commercial use, self-hosting, and fine-tuning<\/li>\n<li>EvalPlus (78.4) is the highest score in the 4B\u201314B comparison set; BFCL v3 (66.3) is second only to Qwen3.5 9B (70.5)<\/li>\n<li>vLLM support, including optional tool-calling via <code>--tool-call-parser hermes<\/code><\/li>\n<\/ul>\n<h4 class=\"wp-block-heading\"><strong>Limitations:<\/strong><\/h4>\n<ul class=\"wp-block-list\">\n<li>Text and code only \u2014 no image or multimodal input<\/li>\n<li>LiveCodeBench v6 (37.2) trails Qwen3.5 4B (51.0), Qwen3.5 9B (63.7), and Ministral 3 14B (42.4)<\/li>\n<li>GPQA Diamond (40.9) and 
MMLU-Redux (78.1) are below most models in the comparison set<\/li>\n<li>GSM-Plus (80.5) is below all comparable models listed<\/li>\n<li>Not designed for frontier-level tasks \u2014 JetBrains explicitly positions Mellum2 as a component model<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div>\n<div class=\"mtp-slides\">\n<p>    <!-- Slide 1: Overview --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge blue\">Overview<\/span>\n<h2>JetBrains Open-Sources Mellum2<\/h2>\n<p class=\"mtp-sub\">A 12B Mixture-of-Experts model released under Apache 2.0 on June 2, 2026. Trained from scratch on ~10.6 trillion tokens for software engineering tasks.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<div class=\"mtp-kv\">\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Total Params<\/div>\n<div class=\"kv-val\">12B<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Active \/ Token<\/div>\n<div class=\"kv-val\">2.5B<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">License<\/div>\n<div class=\"kv-val\">Apache 2.0<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Context<\/div>\n<div class=\"kv-val\">131,072 tok<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Architecture<\/div>\n<div class=\"kv-val\">MoE<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Pre-train Data<\/div>\n<div class=\"kv-val\">~10.6T tok<\/div>\n<\/div><\/div>\n<\/div>\n<p>    <!-- Slide 2: Architecture --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge purple\">Architecture<\/span>\n<h2>How Mellum2 Is Built<\/h2>\n<p class=\"mtp-sub\">MoE activates 8 of 64 experts per token \u2014 per-token compute stays equivalent to a 2.5B dense model. 
An MTP head enables speculative decoding without a separate draft model.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<div class=\"mtp-kv\">\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Layers<\/div>\n<div class=\"kv-val\">28<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Hidden Size<\/div>\n<div class=\"kv-val\">2304<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Experts (total \/ active)<\/div>\n<div class=\"kv-val\">64 \/ 8<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">GQA Heads (Q \/ KV)<\/div>\n<div class=\"kv-val\">32 \/ 4<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">SWA Window<\/div>\n<div class=\"kv-val\">1,024 (\u00be layers)<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Vocabulary<\/div>\n<div class=\"kv-val\">98,304<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Precision<\/div>\n<div class=\"kv-val\">bfloat16<\/div>\n<\/div>\n<div class=\"mtp-kv-item\">\n<div class=\"kv-label\">Modality<\/div>\n<div class=\"kv-val\">Text + Code<\/div>\n<\/div><\/div>\n<\/div>\n<p>    <!-- Slide 3: Pre-Training --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge amber\">Pre-Training<\/span>\n<h2>Training Pipeline<\/h2>\n<p class=\"mtp-sub\">Three-phase curriculum progressively shifts from diverse web data toward curated code and math. 
Context extended to 128K via layer-selective YaRN before post-training.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<ul class=\"mtp-list\">\n<li><span class=\"dot blue\"><\/span><strong>Data:<\/strong>\u00a0~10.6 trillion tokens across three curriculum phases<\/li>\n<li><span class=\"dot blue\"><\/span><strong>Optimizer:<\/strong>\u00a0Muon under FP8 hybrid precision<\/li>\n<li><span class=\"dot blue\"><\/span><strong>LR Schedule:<\/strong>\u00a0Warmup-Hold-Decay with linear decay to zero<\/li>\n<li><span class=\"dot blue\"><\/span><strong>Context Extension:<\/strong>\u00a0Layer-selective YaRN to 128K tokens<\/li>\n<li><span class=\"dot blue\"><\/span><strong>Post-Training:<\/strong>\u00a0SFT \u2192 RLVR on coding, math, tool use, reasoning, knowledge<\/li>\n<li><span class=\"dot blue\"><\/span><strong>Design Constraint:<\/strong>\u00a0Inference efficiency on commodity GPUs validated by ablation<\/li>\n<\/ul><\/div>\n<p>    <!-- Slide 4: Model Family --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge teal\">Model Family<\/span>\n<h2>Six Checkpoints Released<\/h2>\n<p class=\"mtp-sub\">Full pipeline from base pretrain through RL-tuned variants. Use Instruct for direct low-latency answers. 
Use Thinking for explicit step-by-step reasoning traces.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<div class=\"mtp-family-row\">\n<div class=\"mtp-frow\"><span class=\"ftag base\">BASE<\/span><span class=\"fname\">Mellum2-12B-A2.5B-Base-Pretrain<\/span><span class=\"fdesc\">Before context extension<\/span><\/div>\n<div class=\"mtp-frow\"><span class=\"ftag base\">BASE<\/span><span class=\"fname\">Mellum2-12B-A2.5B-Base<\/span><span class=\"fdesc\">After YaRN extension<\/span><\/div>\n<div class=\"mtp-frow\"><span class=\"ftag sft\">SFT<\/span><span class=\"fname\">Mellum2-12B-A2.5B-Instruct-SFT<\/span><span class=\"fdesc\">Supervised instruction<\/span><\/div>\n<div class=\"mtp-frow\"><span class=\"ftag sft\">SFT<\/span><span class=\"fname\">Mellum2-12B-A2.5B-Thinking-SFT<\/span><span class=\"fdesc\">Supervised thinking<\/span><\/div>\n<div class=\"mtp-frow\"><span class=\"ftag rl\">RLVR<\/span><span class=\"fname\">Mellum2-12B-A2.5B-Instruct<\/span><span class=\"fdesc\">RL-tuned, no CoT<\/span><\/div>\n<div class=\"mtp-frow\"><span class=\"ftag rl\">RLVR<\/span><span class=\"fname\">Mellum2-12B-A2.5B-Thinking<\/span><span class=\"fdesc\">RL-tuned, explicit CoT<\/span><\/div>\n<\/div>\n<\/div>\n<p>    <!-- Slide 5: Benchmarks --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge blue\">Benchmarks<\/span>\n<h2>Evaluation Results (Instruct Variant)<\/h2>\n<p class=\"mtp-sub\">All numbers self-reported by JetBrains. 
Comparison set: open-weight models in the 4B\u201314B range.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<div class=\"mtp-table-wrap\">\n<table class=\"mtp-table\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Mellum2<\/th>\n<th>Qwen3.5 9B<\/th>\n<th>Ministral 3 14B<\/th>\n<th>OLMo-3 7B<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LiveCodeBench v6<\/td>\n<td>37.2<\/td>\n<td>63.7<\/td>\n<td>42.4<\/td>\n<td>28.2<\/td>\n<\/tr>\n<tr>\n<td>EvalPlus<\/td>\n<td class=\"mtp-hi\">78.4<\/td>\n<td>71.8<\/td>\n<td>74.1<\/td>\n<td>67.3<\/td>\n<\/tr>\n<tr>\n<td>MultiPL-E<\/td>\n<td class=\"mtp-hi\">67.1<\/td>\n<td>67.1<\/td>\n<td>71.5<\/td>\n<td>36.1<\/td>\n<\/tr>\n<tr>\n<td>BFCL v3<\/td>\n<td class=\"mtp-hi\">66.3<\/td>\n<td>70.5<\/td>\n<td>52.7<\/td>\n<td>41.9<\/td>\n<\/tr>\n<tr>\n<td>AIME 2025+2026<\/td>\n<td class=\"mtp-hi\">41.7<\/td>\n<td>58.3<\/td>\n<td>33.3<\/td>\n<td>40.0<\/td>\n<\/tr>\n<tr>\n<td>IFEval<\/td>\n<td>75.8<\/td>\n<td>83.9<\/td>\n<td>67.3<\/td>\n<td>83.2<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<\/div>\n<p>    <!-- Slide 6: Use Cases --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge green\">Use Cases<\/span>\n<h2>Where Mellum2 Fits in Production<\/h2>\n<p class=\"mtp-sub\">JetBrains positions Mellum2 as a \u201cfocal model\u201d \u2014 handling high-frequency, latency-sensitive steps inside larger AI pipelines.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<ul class=\"mtp-list\">\n<li><span class=\"dot green\"><\/span><strong>Routing &amp; Orchestration<\/strong> \u2014 Analyze prompts and select the right model or tool per task<\/li>\n<li><span class=\"dot green\"><\/span><strong>RAG Pipelines<\/strong> \u2014 Summarize retrieved context at low latency before response generation<\/li>\n<li><span class=\"dot green\"><\/span><strong>Sub-Agents<\/strong> \u2014 Handle repetitive steps in agent pipelines (context gathering, validation, planning)<\/li>\n<li><span class=\"dot green\"><\/span><strong>Private Deployment<\/strong> \u2014 Apache 2.0 permits 
full self-hosting with no external API calls required<\/li>\n<\/ul><\/div>\n<p>    <!-- Slide 7: Strengths &amp; Limitations --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge gray\">Strengths &amp; Limitations<\/span>\n<h2>What Works and What Doesn\u2019t<\/h2>\n<p class=\"mtp-sub\">Mellum2 is designed for efficiency in component roles, not frontier-level capability across all benchmarks.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<div class=\"mtp-two-col\">\n<div class=\"mtp-col-box\">\n<h4 class=\"green\">\u2713 Strengths<\/h4>\n<ul class=\"mtp-list\">\n<li><span class=\"dot green\"><\/span>2.5B active params \u2014 compute of a dense 2.5B model<\/li>\n<li><span class=\"dot green\"><\/span>MTP head enables built-in speculative decoding<\/li>\n<li><span class=\"dot green\"><\/span>131K token context window<\/li>\n<li><span class=\"dot green\"><\/span>Strong EvalPlus (78.4) and BFCL v3 (66.3)<\/li>\n<li><span class=\"dot green\"><\/span>Apache 2.0 \u2014 commercial use, fine-tuning, self-hosting<\/li>\n<li><span class=\"dot green\"><\/span>vLLM support with tool-calling<\/li>\n<\/ul><\/div>\n<div class=\"mtp-col-box\">\n<h4 class=\"red\">\u2717 Limitations<\/h4>\n<ul class=\"mtp-list\">\n<li><span class=\"dot red\"><\/span>Text and code only \u2014 no multimodal input<\/li>\n<li><span class=\"dot red\"><\/span>LiveCodeBench v6 (37.2) below Qwen3.5 9B (63.7)<\/li>\n<li><span class=\"dot red\"><\/span>GPQA Diamond (40.9) below most comparisons<\/li>\n<li><span class=\"dot red\"><\/span>GSM-Plus (80.5) trails all models listed<\/li>\n<li><span class=\"dot red\"><\/span>Not a frontier replacement \u2014 component role only<\/li>\n<\/ul><\/div>\n<\/div>\n<\/div>\n<p>    <!-- Slide 8: Quick Start --><\/p>\n<div class=\"mtp-slide\">\n      <span class=\"mtp-badge teal\">Quick Start<\/span>\n<h2>Deploy with vLLM<\/h2>\n<p class=\"mtp-sub\">Install vLLM and serve the Instruct variant. 
Enable tool-calling with the hermes parser for function-calling workflows.<\/p>\n<div class=\"mtp-divider\"><\/div>\n<pre class=\"mtp-code\">pip install vllm\n\n# Basic serve\nvllm serve JetBrains\/Mellum2-12B-A2.5B-Instruct \\\n  --max-model-len 131072\n\n# With tool calling\nvllm serve JetBrains\/Mellum2-12B-A2.5B-Instruct \\\n  --max-model-len 131072 \\\n  --enable-auto-tool-choice \\\n  --tool-call-parser hermes<\/pre>\n<p>Model weights: <a href=\"https:\/\/huggingface.co\/collections\/JetBrains\/mellum-2\" target=\"_blank\" rel=\"noopener\">huggingface.co\/collections\/JetBrains\/mellum-2<\/a> \u00a0\u00b7\u00a0 Technical report: <a href=\"https:\/\/arxiv.org\/abs\/2605.31268\" target=\"_blank\" rel=\"noopener\">arXiv:2605.31268<\/a><\/p>\n<\/div>\n<\/div>\n<p><!-- \/mtp-slides --><\/p>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Getting Started<\/strong><\/h2>\n<p class=\"wp-block-paragraph\"><strong>Serve Mellum2 with vLLM:<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" 
no-wrap language-php\">pip install vllm\nvllm serve JetBrains\/Mellum2-12B-A2.5B-Instruct --max-model-len 131072<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\"><strong>With tool calling enabled:<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">vllm serve JetBrains\/Mellum2-12B-A2.5B-Instruct \\\n  --max-model-len 131072 \\\n  --enable-auto-tool-choice \\\n  --tool-call-parser hermes<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\"><strong>Using the Hugging Face Transformers library:<\/strong><\/p>\n<div class=\"dm-code-snippet dark dm-normal-version default no-background-mobile\">\n<div class=\"control-language\">\n<div class=\"dm-buttons\">\n<div class=\"dm-buttons-left\">\n<div class=\"dm-button-snippet red-button\"><\/div>\n<div class=\"dm-button-snippet orange-button\"><\/div>\n<div class=\"dm-button-snippet green-button\"><\/div>\n<\/div>\n<\/div>\n<pre class=\" no-line-numbers\"><code class=\" no-wrap language-php\">from transformers import AutoTokenizer, AutoModelForCausalLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"JetBrains\/Mellum2-12B-A2.5B-Instruct\")\nmodel = 
AutoModelForCausalLM.from_pretrained(\"JetBrains\/Mellum2-12B-A2.5B-Instruct\")\n\nmessages = [{\"role\": \"user\", \"content\": \"Write a Python function to reverse a string.\"}]\ninputs = tokenizer.apply_chat_template(\n    messages,\n    add_generation_prompt=True,\n    tokenize=True,\n    return_dict=True,\n    return_tensors=\"pt\",\n).to(model.device)\n\noutputs = model.generate(**inputs, max_new_tokens=512)\nprint(tokenizer.decode(outputs[0][inputs[\"input_ids\"].shape[-1]:]))\n<\/code><\/pre>\n<\/div>\n<\/div>\n<p class=\"wp-block-paragraph\">\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<\/p><p class=\"wp-block-paragraph\">Check out\u00a0the <strong><a href=\"https:\/\/huggingface.co\/collections\/JetBrains\/mellum-2\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a><\/strong> and\u00a0<strong><a href=\"https:\/\/blog.jetbrains.com\/ai\/2026\/06\/mellum2-goes-open-source-a-fast-model-for-ai-workflows\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical details<\/a>.<\/strong><\/p>\n
<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/06\/02\/jetbrains-releases-mellum2-a-12b-moe-model-for-fast-specialized-tasks-in-multi-model-ai-pipelines\/\">JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>JetBrains released Mellum2, op&hellip;<\/p>\n","protected":false},"author":1,"featured_media":1020,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1019","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/1019","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1019"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/1019\/revisions"}],"wp:featur
edmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/1020"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1019"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1019"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1019"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}