{"id":1007,"date":"2026-05-30T05:25:05","date_gmt":"2026-05-29T21:25:05","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=1007"},"modified":"2026-05-30T05:25:05","modified_gmt":"2026-05-29T21:25:05","slug":"stepfun-releases-step-3-7-flash-a-198b-moe-vision-language-model-for-coding-agents-and-search-workflows","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=1007","title":{"rendered":"StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows"},"content":{"rendered":"<p class=\"wp-block-paragraph\">StepFun today released <a href=\"https:\/\/github.com\/stepfun-ai\/Step-3.7-Flash\" target=\"_blank\" rel=\"noreferrer noopener\">Step 3.7 Flash<\/a>, a multimodal Mixture-of-Experts model targeting agentic use cases. It adds native vision input and improved tool-use reliability over Step 3.5 Flash.<\/p>\n<h2 class=\"wp-block-heading\"><strong>What is Step 3.7 Flash?<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">Step 3.7 Flash is a <strong>198B-parameter sparse Mixture-of-Experts (MoE) vision-language model<\/strong>. It pairs a <strong>196B-parameter language backbone<\/strong> with a <strong>1.8B-parameter vision encoder (ViT)<\/strong> for native image understanding.<\/p>\n<p class=\"wp-block-paragraph\">The model activates approximately <strong>11B parameters per token<\/strong> during inference. In MoE architectures, only a subset of \u201cexpert\u201d sub-networks fires per forward pass \u2014 not the full network. 
This keeps inference compute closer to an 11B dense model while maintaining a 198B total parameter budget.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Key specs:<\/strong><\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Spec<\/th>\n<th>Value<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Total parameters<\/td>\n<td>198B (196B language + 1.8B ViT)<\/td>\n<\/tr>\n<tr>\n<td>Active parameters per token<\/td>\n<td>~11B<\/td>\n<\/tr>\n<tr>\n<td>Context window<\/td>\n<td>256k tokens<\/td>\n<\/tr>\n<tr>\n<td>Throughput<\/td>\n<td>Up to 400 tokens\/sec<\/td>\n<\/tr>\n<tr>\n<td>Reasoning levels<\/td>\n<td>Low, medium, high<\/td>\n<\/tr>\n<tr>\n<td>License<\/td>\n<td>Apache 2.0<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<h2 class=\"wp-block-heading\"><strong>Architecture Notes<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">The vision encoder runs as a separate 1.8B ViT module. It injects image representations into the language backbone\u2019s context. Step 3.5 Flash had no multimodal support; this is a new addition in 3.7.<\/p>\n<p class=\"wp-block-paragraph\">Three selectable reasoning depths \u2014 low, medium, and high \u2014 let developers trade latency for reasoning depth. Low is faster and cheaper; high applies more computation per response.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Agentic Coding Performance<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">On <strong>SWE-Bench Pro<\/strong>, Step 3.7 Flash scores <strong>56.26%<\/strong>, up from Step 3.5 Flash\u2019s 51.3% \u2014 a gain of roughly 5 percentage points. 
On <strong>Terminal-Bench 2.1<\/strong>, it scores <strong>59.55%<\/strong>, up from 53.37%.<\/p>\n<p class=\"wp-block-paragraph\">On <strong>SWE-MTLG<\/strong> (a multi-task long-generation coding benchmark), it scores <strong>72.42%<\/strong>.<\/p>\n<p class=\"wp-block-paragraph\">Cross-harness consistency on StepFun\u2019s internal <strong>Step-SWE-Bench<\/strong>:<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Scaffold<\/th>\n<th>Step 3.7 Flash<\/th>\n<th>Step 3.5 Flash<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Hermes Agent<\/td>\n<td>67.5%<\/td>\n<td>60.0%<\/td>\n<\/tr>\n<tr>\n<td>OpenClaw<\/td>\n<td>67.0%<\/td>\n<td>47.0%<\/td>\n<\/tr>\n<tr>\n<td>KiloCode<\/td>\n<td>67.5%<\/td>\n<td>59.0%<\/td>\n<\/tr>\n<tr>\n<td>RooCode<\/td>\n<td>64.5%<\/td>\n<td>43.0%<\/td>\n<\/tr>\n<tr>\n<td>Claude Code<\/td>\n<td>71.5%<\/td>\n<td>73.0%<\/td>\n<\/tr>\n<tr>\n<td>OpenCode<\/td>\n<td>64.5%<\/td>\n<td>57.0%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">Step 3.5 Flash ranged from 43% to 73% across harnesses. Step 3.7 Flash ranges from 64.5% to 71.5%. In production, coding agents often run inside heterogeneous scaffolds \u2014 each with its own prompting conventions and tool schemas. Narrower per-harness variance means more predictable behavior across different setups.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Advisor Mode<\/strong><\/h3>\n<p class=\"wp-block-paragraph\">Step 3.7 Flash supports <strong>Advisor Mode<\/strong>, StepFun\u2019s implementation of the advisor strategy described by Anthropic. The model runs the agentic loop end-to-end \u2014 calling tools, reading results, iterating \u2014 and escalates to a larger advisor model only at specific inflection points, such as planning or recovering from repeated failures. 
Most of the run stays at executor cost.<\/p>\n<p class=\"wp-block-paragraph\">With Advisor Mode enabled on SWE-Bench Verified, StepFun reports Step 3.7 Flash reaches <strong>97% of Claude Opus 4.6\u2019s coding performance at roughly one-ninth the per-task cost<\/strong> ($0.19 vs. $1.76 per task). These are StepFun\u2019s internal figures.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Multimodal Capabilities<\/strong><\/h2>\n<p class=\"wp-block-paragraph\"><strong>Step 3.7 Flash supports two visual tool pathways:<\/strong><\/p>\n<p class=\"wp-block-paragraph\"><strong>Visual Search Tool<\/strong> \u2014 For recognition tasks where the model\u2019s parametric knowledge is insufficient (long-tail entities, recently emerged concepts), it invokes a visual search tool to retrieve and verify. On <strong>SimpleVQA (with Search)<\/strong>, it scores <strong>79.16%<\/strong>, comparable to GPT 5.5 (79.11%) and above Kimi K2.6 (78.24%) and GLM 5V Turbo (78.20%).<\/p>\n<p class=\"wp-block-paragraph\"><strong>Python Tool<\/strong> \u2014 For fine-grained visual tasks (high-resolution images, visual probing, bounding-box analysis), it uses a code interface to crop, zoom, and draw pixels or bounding boxes. On <strong>V*<\/strong> (a self-tested score with Python), it scores <strong>95.29%<\/strong>. On <strong>HR-Bench 4K<\/strong> and <strong>HR-Bench 8K<\/strong>, it scores <strong>89.13%<\/strong> and <strong>86.34%<\/strong> respectively.<\/p>\n<p class=\"wp-block-paragraph\">StepFun notes an observed behavior during testing: the model combined visual tools with non-visual tools without being explicitly trained to do so. For example, after generating frontend code, it used the GUI to render and inspect the result before iterating. 
StepFun describes this as emergent compositional tool use.<\/p>\n<p class=\"wp-block-paragraph\">On <strong>Android Daily<\/strong> (long-horizon phone UI task completion), Step 3.7 Flash scores <strong>61.87%<\/strong>, ahead of Kimi K2.6 (53.36%) and GLM 5V Turbo (51.68%). Gemini 3 Flash (63.21%) leads this benchmark.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Search and Research Benchmarks<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">StepFun focused this model\u2019s search design on planning, evidence filtering, and synthesis \u2014 integrating search as part of the reasoning loop rather than a separate add-on.<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Step 3.7 Flash<\/th>\n<th>Notable comparison<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>HLE with Tools (acc)<\/td>\n<td>47.20%<\/td>\n<td>DeepSeek V4 Flash: 45.10%<\/td>\n<\/tr>\n<tr>\n<td>BrowseComp (acc)<\/td>\n<td>75.82%<\/td>\n<td>Claude Opus 4.7: 79.30%<\/td>\n<\/tr>\n<tr>\n<td>DeepSearchQA (F1)<\/td>\n<td>92.82%<\/td>\n<td>Kimi K2.6: 92.50%<\/td>\n<\/tr>\n<tr>\n<td>ResearchRubrics (score)<\/td>\n<td>71.68%<\/td>\n<td>GPT 5.5: 61.50%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">Note: The HLE with Tools score of 47.20% compares to Step 3.5 Flash\u2019s text-only score of 35.68%. 
Step 3.5 Flash did not support tool-augmented evaluation on HLE.<\/p>\n<h2 class=\"wp-block-heading\"><strong>General Agent Benchmarks<\/strong><\/h2>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Benchmark<\/th>\n<th>Step 3.7 Flash<\/th>\n<th>Description<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Toolathlon<\/td>\n<td>49.51%<\/td>\n<td>Multi-tool coordination<\/td>\n<\/tr>\n<tr>\n<td>ClawEval-1.1<\/td>\n<td>67.07%<\/td>\n<td>Daily autonomous task execution in realistic environments<\/td>\n<\/tr>\n<tr>\n<td>GDPval (44 occupations)<\/td>\n<td>45.8%<\/td>\n<td>General professional task execution<\/td>\n<\/tr>\n<tr>\n<td>Tau2-bench Telecom<\/td>\n<td>&gt;98%<\/td>\n<td>Across different reasoning difficulty tiers<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">On ClawEval-1.1, Step 3.7 Flash (67.07%) leads DeepSeek V4 Flash (57.80%) and DeepSeek V4 Pro (59.80%) among the compared models.<\/p>\n<h2 class=\"wp-block-heading\"><strong>Long-Context Performance<\/strong><\/h2>\n<p class=\"wp-block-paragraph\">On <strong>AA-LCR<\/strong> (a long-context retrieval benchmark, avg@16\/acc), Step 3.7 Flash scores <strong>63.94%<\/strong>. 
This is comparable to DeepSeek V4 Flash (63.70%) and DeepSeek V4 Pro (66.30%).<\/p>\n<h2 class=\"wp-block-heading\"><strong>Pricing<\/strong><\/h2>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Token Type<\/th>\n<th>Price<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Input (cache miss)<\/td>\n<td>$0.20 \/ M tokens<\/td>\n<\/tr>\n<tr>\n<td>Input (cache hit)<\/td>\n<td>$0.04 \/ M tokens<\/td>\n<\/tr>\n<tr>\n<td>Output<\/td>\n<td>$1.15 \/ M tokens<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h2 class=\"wp-block-heading\"><strong>Marktechpost\u2019s Visual Explainer<\/strong><\/h2>\n<div>\n<p>  <!-- Header --><\/p>\n<div class=\"sf-header\">\n<div class=\"sf-tag\">Model Release<\/div>\n<div class=\"sf-title\">Step 3.7 Flash \u2014 A 198B MoE Vision-Language Model<\/div>\n<div class=\"sf-sub\">StepFun \u00b7 Released May 29, 2026 \u00b7 Apache 2.0<\/div>\n<\/div>\n<p>  <!-- Progress bar --><\/p>\n<div class=\"sf-progress\">\n<div class=\"sf-progress-bar\"><\/div>\n<\/div>\n<p>  <!-- Slides --><\/p>\n<div class=\"sf-slides\">\n<p>    <!-- Slide 1: What Is It --><\/p>\n<div class=\"sf-slide active\" data-slide=\"0\">\n<div class=\"sf-slide-label\">Slide 1 of 8 \u2014 Overview<\/div>\n<h3>What Is Step 3.7 Flash?<\/h3>\n<p>Step 3.7 Flash is a sparse <strong>Mixture-of-Experts (MoE)<\/strong> vision-language model from StepFun. It combines a 196B-parameter language backbone with a 1.8B-parameter Vision Transformer (ViT) encoder for native image understanding.<\/p>\n<p>In a MoE model, only a subset of \u201cexpert\u201d sub-networks activates per token \u2014 not the full network. 
This keeps inference compute close to an 11B dense model while maintaining 198B total parameters.<\/p>\n<div class=\"sf-kv\">\n<div class=\"sf-kv-item\">\n<div class=\"k\">Total Params<\/div>\n<div class=\"v\">198B<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">Active \/ Token<\/div>\n<div class=\"v\">~11B<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">Context Window<\/div>\n<div class=\"v\">256k tokens<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">Throughput<\/div>\n<div class=\"v\">400 tok\/sec<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">Reasoning Levels<\/div>\n<div class=\"v\">Low \/ Med \/ High<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">License<\/div>\n<div class=\"v\">Apache 2.0<\/div>\n<\/div><\/div>\n<\/div>\n<p>    <!-- Slide 2: Architecture --><\/p>\n<div class=\"sf-slide\" data-slide=\"1\">\n<div class=\"sf-slide-label\">Slide 2 of 8 \u2014 Architecture<\/div>\n<h3>Architecture Notes<\/h3>\n<p>The 1.8B ViT encoder runs as a <strong>separate module<\/strong> and injects image representations into the language backbone\u2019s context. Step 3.5 Flash was text-only; native multimodal support is new in 3.7.<\/p>\n<div class=\"sf-step-line\"><\/div>\n<p><strong>Three selectable reasoning depths<\/strong> let developers balance speed and cost:<\/p>\n<ul class=\"sf-bullet\">\n<li><strong>Low<\/strong> \u2014 Fastest, cheapest. Suitable for simple completions.<\/li>\n<li><strong>Medium<\/strong> \u2014 Balanced cost and reasoning depth.<\/li>\n<li><strong>High<\/strong> \u2014 More compute per response. Best for complex agent tasks.<\/li>\n<\/ul>\n<div class=\"sf-note\">MoE routing means you pay for ~11B active params at inference, not 198B. 
This is the core efficiency trade-off in Flash-tier models.<\/div>\n<\/div>\n<p>    <!-- Slide 3: Agentic Coding --><\/p>\n<div class=\"sf-slide\" data-slide=\"2\">\n<div class=\"sf-slide-label\">Slide 3 of 8 \u2014 Agentic Coding<\/div>\n<h3>Agentic Coding Performance<\/h3>\n<p>Step 3.7 Flash scores <strong>56.26% on SWE-Bench Pro<\/strong> (up from 51.3% in 3.5 Flash) and <strong>59.55% on Terminal-Bench 2.1<\/strong> (up from 53.37%). On SWE-MTLG it scores <strong>72.42%<\/strong>.<\/p>\n<p>Per-harness scores on StepFun\u2019s internal Step-SWE-Bench:<\/p>\n<table class=\"sf-table\">\n<tr>\n<th>Scaffold<\/th>\n<th>3.7 Flash<\/th>\n<th>3.5 Flash<\/th>\n<\/tr>\n<tr>\n<td>Hermes Agent<\/td>\n<td class=\"hi\">67.5%<\/td>\n<td>60.0%<\/td>\n<\/tr>\n<tr>\n<td>OpenClaw<\/td>\n<td class=\"hi\">67.0%<\/td>\n<td>47.0%<\/td>\n<\/tr>\n<tr>\n<td>KiloCode<\/td>\n<td class=\"hi\">67.5%<\/td>\n<td>59.0%<\/td>\n<\/tr>\n<tr>\n<td>RooCode<\/td>\n<td class=\"hi\">64.5%<\/td>\n<td>43.0%<\/td>\n<\/tr>\n<tr>\n<td>Claude Code<\/td>\n<td class=\"hi\">71.5%<\/td>\n<td>73.0%<\/td>\n<\/tr>\n<tr>\n<td>OpenCode<\/td>\n<td class=\"hi\">64.5%<\/td>\n<td>57.0%<\/td>\n<\/tr>\n<\/table>\n<div class=\"sf-note\">3.5 Flash ranged 43\u201373% across harnesses. 3.7 Flash narrows that to 64.5\u201371.5% \u2014 more predictable across heterogeneous scaffolds.<\/div>\n<\/div>\n<p>    <!-- Slide 4: Advisor Mode --><\/p>\n<div class=\"sf-slide\" data-slide=\"3\">\n<div class=\"sf-slide-label\">Slide 4 of 8 \u2014 Advisor Mode<\/div>\n<h3>Advisor Mode<\/h3>\n<p>Step 3.7 Flash supports <strong>Advisor Mode<\/strong>, StepFun\u2019s implementation of the advisor strategy described by Anthropic. 
The model runs the full agentic loop \u2014 calling tools, reading results, iterating \u2014 and escalates to a larger advisor model only at specific inflection points.<\/p>\n<ul class=\"sf-bullet\">\n<li>Escalates during <strong>planning<\/strong> or recovery from repeated failures<\/li>\n<li>Most of the run stays at executor (Flash) cost<\/li>\n<li>Large advisor model is consulted sparingly<\/li>\n<\/ul>\n<div class=\"sf-step-line\"><\/div>\n<p><strong>SWE-Bench Verified results with Advisor Mode (StepFun internal figures):<\/strong><\/p>\n<div class=\"sf-kv\">\n<div class=\"sf-kv-item\">\n<div class=\"k\">Step 3.7 Flash + Advisor<\/div>\n<div class=\"v\">76.3% score<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">Per-task cost<\/div>\n<div class=\"v\">$0.19<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">Claude Opus 4.6<\/div>\n<div class=\"v\">78.7% score<\/div>\n<\/div>\n<div class=\"sf-kv-item\">\n<div class=\"k\">Claude Opus 4.6 cost<\/div>\n<div class=\"v\">$1.76<\/div>\n<\/div><\/div>\n<\/div>\n<p>    <!-- Slide 5: Multimodal --><\/p>\n<div class=\"sf-slide\" data-slide=\"4\">\n<div class=\"sf-slide-label\">Slide 5 of 8 \u2014 Multimodal<\/div>\n<h3>Multimodal Capabilities<\/h3>\n<p>Step 3.7 Flash supports two visual tool pathways:<\/p>\n<ul class=\"sf-bullet\">\n<li><strong>Visual Search Tool<\/strong> \u2014 Invoked for long-tail entity recognition or recently emerged concepts where parametric knowledge is insufficient. SimpleVQA (Search): <strong>79.16%<\/strong><\/li>\n<li><strong>Python Tool<\/strong> \u2014 Code interface for cropping, zooming, pixel\/bounding-box operations on high-resolution images. 
V* (Python): <strong>95.29%<\/strong> | HR-Bench 4K: <strong>89.13%<\/strong> | HR-Bench 8K: <strong>86.34%<\/strong><\/li>\n<\/ul>\n<div class=\"sf-step-line\"><\/div>\n<p><strong>Android Daily<\/strong> (long-horizon phone UI tasks): Step 3.7 Flash scores <strong>61.87%<\/strong>, ahead of Kimi K2.6 (53.36%) and GLM 5V Turbo (51.68%). Gemini 3 Flash leads at 63.21%.<\/p>\n<div class=\"sf-note\">StepFun reports emergent compositional tool use during testing \u2014 the model combined visual and non-visual tools without explicit training to do so.<\/div>\n<\/div>\n<p>    <!-- Slide 6: Search --><\/p>\n<div class=\"sf-slide\" data-slide=\"5\">\n<div class=\"sf-slide-label\">Slide 6 of 8 \u2014 Search &amp; Research<\/div>\n<h3>Search and Research Benchmarks<\/h3>\n<p>Search is integrated into the model\u2019s reasoning loop rather than treated as an external add-on. StepFun focused training on search planning, evidence filtering, and synthesis.<\/p>\n<table class=\"sf-table\">\n<tr>\n<th>Benchmark<\/th>\n<th>3.7 Flash<\/th>\n<th>Comparison<\/th>\n<\/tr>\n<tr>\n<td>HLE w. Tools (acc)<\/td>\n<td class=\"hi\">47.20%<\/td>\n<td>DeepSeek V4 Flash: 45.10%<\/td>\n<\/tr>\n<tr>\n<td>BrowseComp (acc)<\/td>\n<td class=\"hi\">75.82%<\/td>\n<td>Claude Opus 4.7: 79.30%<\/td>\n<\/tr>\n<tr>\n<td>DeepSearchQA (F1)<\/td>\n<td class=\"hi\">92.82%<\/td>\n<td>Kimi K2.6: 92.50%<\/td>\n<\/tr>\n<tr>\n<td>ResearchRubrics<\/td>\n<td class=\"hi\">71.68%<\/td>\n<td>GPT 5.5: 61.50%<\/td>\n<\/tr>\n<\/table>\n<div class=\"sf-note\">HLE comparison: Step 3.5 Flash scored 35.68% text-only. 
Step 3.7 Flash scores 47.20% with tool access \u2014 these are not apples-to-apples.<\/div>\n<\/div>\n<p>    <!-- Slide 7: Deployment --><\/p>\n<div class=\"sf-slide\" data-slide=\"6\">\n<div class=\"sf-slide-label\">Slide 7 of 8 \u2014 Deployment<\/div>\n<h3>Pricing, Deployment &amp; Ecosystem<\/h3>\n<table class=\"sf-table\">\n<tr>\n<th>Token Type<\/th>\n<th>Price<\/th>\n<\/tr>\n<tr>\n<td>Input (cache miss)<\/td>\n<td class=\"hi\">$0.20 \/ M tokens<\/td>\n<\/tr>\n<tr>\n<td>Input (cache hit)<\/td>\n<td class=\"hi\">$0.04 \/ M tokens<\/td>\n<\/tr>\n<tr>\n<td>Output<\/td>\n<td class=\"hi\">$1.15 \/ M tokens<\/td>\n<\/tr>\n<\/table>\n<div class=\"sf-step-line\"><\/div>\n<p><strong>Available on:<\/strong><\/p>\n<div class=\"sf-chip-row\">\n        <span class=\"sf-chip blue\">StepFun Platform<\/span><br \/>\n        <span class=\"sf-chip blue\">OpenRouter<\/span><br \/>\n        <span class=\"sf-chip blue\">NVIDIA NIM<\/span><br \/>\n        <span class=\"sf-chip\">DeepInfra (soon)<\/span><br \/>\n        <span class=\"sf-chip\">Fireworks AI (soon)<\/span><br \/>\n        <span class=\"sf-chip\">Modal (soon)<\/span>\n      <\/div>\n<div class=\"sf-step-line\"><\/div>\n<p><strong>Inference backends:<\/strong> vLLM, SGLang, Hugging Face Transformers (requires v5.0+), llama.cpp<\/p>\n<p><strong>Quantization formats:<\/strong> BF16, FP8, NVFP4, GGUF<\/p>\n<p><strong>Local minimum:<\/strong> 120 GB unified memory\/VRAM<\/p>\n<\/div>\n<p>    <!-- Slide 8: Key Takeaways --><\/p>\n<div class=\"sf-slide\" data-slide=\"7\">\n<div class=\"sf-slide-label\">Slide 8 of 8 \u2014 Key Takeaways<\/div>\n<h3>Key Takeaways<\/h3>\n<ul class=\"sf-bullet\">\n<li>198B sparse MoE model with ~11B active params per token and a 256k context window<\/li>\n<li>Native multimodal support (images, GUIs, documents) \u2014 Step 3.5 Flash was text-only<\/li>\n<li>Advisor Mode scores 76.3% on SWE-Bench Verified at $0.19\/task vs. 
Claude Opus 4.6 at $1.76<\/li>\n<li>Cross-harness coding variance narrowed from 43\u201373% (3.5) to 64.5\u201371.5% (3.7)<\/li>\n<li>Released Apache 2.0 with BF16, FP8, NVFP4, and GGUF weights on Hugging Face<\/li>\n<\/ul>\n<div class=\"sf-step-line\"><\/div>\n<p><strong>Compatible harnesses:<\/strong><\/p>\n<div class=\"sf-chip-row\">\n        <span class=\"sf-chip blue\">Claude Code<\/span><br \/>\n        <span class=\"sf-chip blue\">KiloCode<\/span><br \/>\n        <span class=\"sf-chip blue\">Hermes Agent<\/span><br \/>\n        <span class=\"sf-chip blue\">OpenClaw<\/span>\n      <\/div>\n<\/div>\n<\/div>\n<p>  <!-- Nav --><\/p>\n<div class=\"sf-nav\">\n    <button class=\"sf-btn-ghost\" disabled>\u2190 Prev<\/button>\n<div>\n<div class=\"sf-nav-dots\"><\/div>\n<div class=\"sf-counter\">1 \/ 8<\/div>\n<\/div>\n<p>    <button class=\"sf-btn\">Next \u2192<\/button>\n  <\/p><\/div>\n<\/div>\n<h2 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h2>\n<ul class=\"wp-block-list\">\n<li>Step 3.7 Flash is a 198B sparse MoE model with 11B active params and a 256k context window.<\/li>\n<li>Native multimodal support (images, GUIs, documents) is new \u2014 Step 3.5 Flash was text-only.<\/li>\n<li>Advisor Mode reaches 97% of Claude Opus 4.6&#8217;s SWE-Bench Verified performance at $0.19 per task vs. 
$1.76.<\/li>\n<li>Cross-harness coding scores narrowed from a 43\u201373% range (3.5 Flash) to 64.5\u201371.5% (3.7 Flash).<\/li>\n<li>Released under Apache 2.0 with BF16, FP8, NVFP4, and GGUF weights on Hugging Face.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\"><strong>Where to Run Step 3.7 Flash<\/strong><\/h2>\n<div>\n<div class=\"si-head\">\n<div class=\"si-tag\">Where to Run It<\/div>\n<div class=\"si-title\">Step 3.7 Flash \u2014 Inference Providers &amp; Access<\/div>\n<div class=\"si-sub\">StepFun\u2019s 198B MoE vision-language model across hosted APIs and open weights.<\/div>\n<\/div>\n<div class=\"si-section-label\">Hosted API \u00b7 Live Now<\/div>\n<div class=\"si-grid\">\n<p>    <a class=\"si-card\" href=\"https:\/\/platform.stepfun.ai\/\" target=\"_blank\" rel=\"noopener\"><\/a><\/p>\n<div class=\"si-card-top\">\n        <span class=\"si-card-name\">StepFun Platform (Global)<\/span><br \/>\n        <span class=\"si-badge live\">Live<\/span>\n      <\/div>\n<div class=\"si-card-desc\">Official API. Base URL: api.stepfun.ai\/v1. Model ID: step-3.7-flash.<\/div>\n<div class=\"si-card-url\">platform.stepfun.ai <span class=\"si-arrow\">\u2192<\/span><\/div>\n<p>    <\/p>\n<p>    <a class=\"si-card\" href=\"https:\/\/platform.stepfun.com\/\" target=\"_blank\" rel=\"noopener\"><\/a><\/p>\n<div class=\"si-card-top\">\n        <span class=\"si-card-name\">StepFun Platform (China)<\/span><br \/>\n        <span class=\"si-badge live\">Live<\/span>\n      <\/div>\n<div class=\"si-card-desc\">China region API. Base URL: api.stepfun.com\/v1. 
Requires +86 verification.<\/div>\n<div class=\"si-card-url\">platform.stepfun.com <span class=\"si-arrow\">\u2192<\/span><\/div>\n<p>    <\/p>\n<p>    <a class=\"si-card\" href=\"https:\/\/openrouter.ai\/stepfun\/step-3.7-flash\" target=\"_blank\" rel=\"noopener\"><\/a><\/p>\n<div class=\"si-card-top\">\n        <span class=\"si-card-name\">OpenRouter<\/span><br \/>\n        <span class=\"si-badge live\">Live<\/span>\n      <\/div>\n<div class=\"si-card-desc\">Unified API. $0.20\/M input, $1.15\/M output. Reasoning parameter supported.<\/div>\n<div class=\"si-card-url\">openrouter.ai\/stepfun\/step-3.7-flash <span class=\"si-arrow\">\u2192<\/span><\/div>\n<p>    <\/p>\n<p>    <a class=\"si-card\" href=\"https:\/\/build.nvidia.com\/\" target=\"_blank\" rel=\"noopener\"><\/a><\/p>\n<div class=\"si-card-top\">\n        <span class=\"si-card-name\">NVIDIA NIM<\/span><br \/>\n        <span class=\"si-badge live\">Day-0<\/span>\n      <\/div>\n<div class=\"si-card-desc\">GPU-accelerated endpoints. 
Containerized microservice for on-prem, cloud, or hybrid.<\/div>\n<div class=\"si-card-url\">build.nvidia.com <span class=\"si-arrow\">\u2192<\/span><\/div>\n<p>    <\/p><\/div>\n<div class=\"si-section-label\">Open Weights \u00b7 Apache 2.0<\/div>\n<div class=\"si-grid\">\n<p>    <a class=\"si-card\" href=\"https:\/\/huggingface.co\/stepfun-ai\/Step-3.7-Flash\" target=\"_blank\" rel=\"noopener\"><\/a><\/p>\n<div class=\"si-card-top\">\n        <span class=\"si-card-name\">Hugging Face<\/span><br \/>\n        <span class=\"si-badge self\">Weights<\/span>\n      <\/div>\n<div class=\"si-card-desc\">Download BF16, FP8, NVFP4, and GGUF checkpoints for self-hosting.<\/div>\n<div class=\"si-card-url\">huggingface.co\/stepfun-ai\/Step-3.7-Flash <span class=\"si-arrow\">\u2192<\/span><\/div>\n<p>    <\/p>\n<p>    <a class=\"si-card\" href=\"https:\/\/github.com\/stepfun-ai\/Step-3.7-Flash\" target=\"_blank\" rel=\"noopener\"><\/a><\/p>\n<div class=\"si-card-top\">\n        <span class=\"si-card-name\">GitHub<\/span><br \/>\n        <span class=\"si-badge self\">Repo<\/span>\n      <\/div>\n<div class=\"si-card-desc\">Model code and deployment guides for vLLM, SGLang, and llama.cpp.<\/div>\n<div class=\"si-card-url\">github.com\/stepfun-ai\/Step-3.7-Flash <span class=\"si-arrow\">\u2192<\/span><\/div>\n<p>    <\/p><\/div>\n<div class=\"si-foot\">\n    Sources: StepFun model page, Hugging Face, GitHub, OpenRouter, NVIDIA Technical Blog. 
Accurate as of May 29, 2026.\n  <\/div>\n<\/div>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p class=\"wp-block-paragraph\">Check out the <strong><a href=\"https:\/\/huggingface.co\/stepfun-ai\/Step-3.7-Flash\" target=\"_blank\" rel=\"noreferrer noopener\">Model Weights<\/a><\/strong>, <strong><a href=\"https:\/\/github.com\/stepfun-ai\/Step-3.7-Flash\" target=\"_blank\" rel=\"noreferrer noopener\">Repo<\/a><\/strong>, and <strong><a href=\"https:\/\/static.stepfun.com\/blog\/step-3.7-flash\/\" target=\"_blank\" rel=\"noreferrer noopener\">Technical Details<\/a><\/strong>.<\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/05\/29\/stepfun-releases-step-3-7-flash-a-198b-moe-vision-language-model-for-coding-agents-and-search-workflows\/\">StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>StepFun today released Step 3.&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1007","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/1007","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1007"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/1007\/revisions"}],"
wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1007"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1007"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1007"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}