{"id":439,"date":"2026-02-20T14:51:30","date_gmt":"2026-02-20T06:51:30","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=439"},"modified":"2026-02-20T14:51:30","modified_gmt":"2026-02-20T06:51:30","slug":"nvidia-releases-dynamo-v0-9-0-a-massive-infrastructure-overhaul-featuring-flashindexer-multi-modal-support-and-removed-nats-and-etcd","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=439","title":{"rendered":"NVIDIA Releases Dynamo v0.9.0: A Massive Infrastructure Overhaul Featuring FlashIndexer, Multi-Modal Support, and Removed NATS and ETCD"},"content":{"rendered":"<p>NVIDIA has just released <strong>Dynamo v0.9.0<\/strong>. This is the most significant infrastructure upgrade for the distributed inference framework to date. This update simplifies how large-scale models are deployed and managed. The release focuses on removing heavy dependencies and improving how GPUs handle multi-modal data.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Great Simplification: Removing NATS and etcd<\/strong><\/h3>\n<p>The biggest change in v0.9.0 is the removal of <strong>NATS<\/strong> and <strong>ETCD<\/strong>. In previous versions, these tools handled service discovery and messaging. However, they added \u2018operational tax\u2019 by requiring developers to manage extra clusters.<\/p>\n<p>NVIDIA replaced these with a new <strong>Event Plane<\/strong> and a <strong>Discovery Plane<\/strong>. The system now uses <strong>ZMQ (ZeroMQ)<\/strong> for high-performance transport and <strong>MessagePack<\/strong> for data serialization. For teams using Kubernetes, Dynamo now supports <strong>Kubernetes-native service discovery<\/strong>. 
This change makes the infrastructure leaner and easier to maintain in production environments.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Multi-Modal Support and the E\/P\/D Split<\/strong><\/h3>\n<p>Dynamo v0.9.0 expands multi-modal support across three main backends: <strong>vLLM<\/strong>, <strong>SGLang<\/strong>, and <strong>TensorRT-LLM<\/strong>. This allows models to process text, images, and video more efficiently.<\/p>\n<p>A key feature in this update is the <strong>E\/P\/D (Encode\/Prefill\/Decode) split<\/strong>. In standard setups, a single GPU often handles all three stages, which can cause bottlenecks during heavy video or image processing. v0.9.0 introduces <strong>Encoder Disaggregation<\/strong>: you can now run the <strong>Encoder<\/strong> on a separate set of GPUs from the <strong>Prefill<\/strong> and <strong>Decode<\/strong> workers, letting you scale hardware to the specific needs of your model.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Sneak Preview: FlashIndexer<\/strong><\/h3>\n<p>This release includes a sneak preview of <strong>FlashIndexer<\/strong>, a component designed to solve latency issues in distributed <strong>KV cache<\/strong> management.<\/p>\n<p>When working with large context windows, moving Key-Value (KV) data between GPUs is slow. FlashIndexer improves how the system indexes and retrieves these cached tokens, resulting in a lower <strong>Time to First Token (TTFT)<\/strong>. While still a preview, it represents a major step toward making distributed inference feel as fast as local inference.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Smart Routing and Load Estimation<\/strong><\/h3>\n<p>Managing traffic across hundreds of GPUs is difficult. Dynamo v0.9.0 introduces a smarter <strong>Planner<\/strong> that uses <strong>predictive load estimation<\/strong>.<\/p>\n<p>The system uses a <strong>Kalman filter<\/strong> to predict the future load of a request based on past performance. 
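<\/p>\n<p>The prediction step described above has a simple scalar form. The sketch below is a generic textbook Kalman filter, not Dynamo\u2019s actual planner code; the state is the estimated load of a worker, and the noise parameters <code>q<\/code> and <code>r<\/code> are illustrative assumptions:<\/p>

```python
class ScalarKalman:
    """Minimal one-dimensional Kalman filter tracking a drifting load level.
    Generic textbook form -- not Dynamo's actual planner model.

    q: process noise, how fast the true load is allowed to drift
    r: measurement noise, how noisy each observed load sample is
    """

    def __init__(self, x0=0.0, p0=1.0, q=0.01, r=0.5):
        self.x = x0   # current load estimate
        self.p = p0   # estimate variance (uncertainty)
        self.q = q
        self.r = r

    def update(self, z):
        # Predict: assume load is locally constant, so only uncertainty grows.
        p_pred = self.p + self.q
        # Correct: blend the prediction with the new measurement z.
        gain = p_pred / (p_pred + self.r)    # Kalman gain in [0, 1]
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return self.x

kf = ScalarKalman()
samples = [10.2, 9.8, 10.5, 30.0, 29.5, 30.4]   # load jumps mid-stream
estimates = [kf.update(z) for z in samples]
# The estimate smooths the noise and moves toward the new higher load
# level without snapping to any single noisy sample.
print([round(e, 1) for e in estimates])
```

<p>Raising <code>q<\/code> makes the filter react faster to genuine traffic spikes at the cost of tracking more measurement noise; lowering it smooths harder but lags sudden shifts.<\/p>\n<p>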
It also supports <strong>routing hints<\/strong> from the <strong>Kubernetes Gateway API Inference Extension (GAIE)<\/strong>. This allows the network layer to communicate directly with the inference engine. If a specific GPU group is overloaded, the system can route new requests to idle workers with higher precision.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Technical Stack at a Glance<\/strong><\/h3>\n<p>The v0.9.0 release updates several core components to recent versions. Here is the breakdown of the supported backends and libraries:<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<td><strong>Component<\/strong><\/td>\n<td><strong>Version<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>vLLM<\/strong><\/td>\n<td>v0.14.1<\/td>\n<\/tr>\n<tr>\n<td><strong>SGLang<\/strong><\/td>\n<td>v0.5.8<\/td>\n<\/tr>\n<tr>\n<td><strong>TensorRT-LLM<\/strong><\/td>\n<td>v1.3.0rc1<\/td>\n<\/tr>\n<tr>\n<td><strong>NIXL<\/strong><\/td>\n<td>v0.9.0<\/td>\n<\/tr>\n<tr>\n<td><strong>Rust Core<\/strong><\/td>\n<td>dynamo-tokens crate<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>The inclusion of the <strong>dynamo-tokens<\/strong> crate, written in <strong>Rust<\/strong>, keeps token handling fast. For data transfer between GPUs, Dynamo continues to leverage <strong>NIXL (NVIDIA Inference Xfer Library)<\/strong> for <strong>RDMA-based<\/strong> communication.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Infrastructure Decoupling (Goodbye NATS and etcd)<\/strong>: The release completes the modernization of the communication architecture. 
By replacing NATS and etcd with a new <strong>Event Plane<\/strong> (using <strong>ZMQ<\/strong> and <strong>MessagePack<\/strong>) and <strong>Kubernetes-native service discovery<\/strong>, the system removes the \u2018operational tax\u2019 of managing external clusters.<\/li>\n<li><strong>Full Multi-Modal Disaggregation (E\/P\/D Split)<\/strong>: Dynamo now supports a complete <strong>Encode\/Prefill\/Decode (E\/P\/D)<\/strong> split across all three backends (vLLM, SGLang, and TRT-LLM). This allows you to run vision or video encoders on separate GPUs, preventing compute-heavy encoding tasks from bottlenecking the text generation process.<\/li>\n<li><strong>FlashIndexer Preview for Lower Latency<\/strong>: The \u2018sneak preview\u2019 of <strong>FlashIndexer<\/strong> introduces a specialized component to optimize <strong>distributed KV cache<\/strong> management. It is designed to make the indexing and retrieval of conversation \u2018memory\u2019 significantly faster, aimed at further reducing the Time to First Token (TTFT).<\/li>\n<li><strong>Smarter Scheduling with Kalman Filters<\/strong>: The system now uses <strong>predictive load estimation<\/strong> powered by <strong>Kalman filters<\/strong>. 
This allows the Planner to forecast GPU load more accurately and handle traffic spikes proactively, supported by <strong>routing hints<\/strong> from the Kubernetes Gateway API Inference Extension (GAIE).<\/li>\n<\/ol>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/github.com\/ai-dynamo\/dynamo\/releases\/tag\/v0.9.0\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub Release here<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/02\/19\/nvidia-releases-dynamo-v0-9-0-a-massive-infrastructure-overhaul-featuring-flashindexer-multi-modal-support-and-removed-nats-and-etcd\/\">NVIDIA Releases Dynamo v0.9.0: A Massive Infrastructure Overhaul Featuring FlashIndexer, Multi-Modal Support, and Removed NATS and ETCD<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>NVIDIA has just released Dynam&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-439","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/439","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=439"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/439\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=439"}],"wp:term":[{"
taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=439"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=439"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}