{"id":382,"date":"2026-02-09T15:46:46","date_gmt":"2026-02-09T07:46:46","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=382"},"modified":"2026-02-09T15:46:46","modified_gmt":"2026-02-09T07:46:46","slug":"meet-oat-the-new-action-tokenizer-bringing-llm-style-scaling-and-flexible-anytime-inference-to-the-robotics-world","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=382","title":{"rendered":"Meet OAT: The New Action Tokenizer Bringing LLM-Style Scaling and Flexible, Anytime Inference to the Robotics World"},"content":{"rendered":"<p>Robots are entering their GPT-3 era. For years, researchers have tried to train robots using the same autoregressive (AR) models that power large language models (LLMs). If a model can predict the next word in a sentence, it should be able to predict the next move for a robotic arm. However, a technical wall has blocked this progress: continuous robot movements are difficult to turn into discrete tokens.<\/p>\n<p>A team of researchers from Harvard University and Stanford University have released a new framework called <strong>Ordered Action Tokenization (OAT)<\/strong> to bridge this gap.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1844\" height=\"876\" data-attachment-id=\"77809\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/02\/08\/meet-oat-the-new-action-tokenizer-bringing-llm-style-scaling-and-flexible-anytime-inference-to-the-robotics-world\/screenshot-2026-02-08-at-11-39-58-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.39.58-PM-1.png\" data-orig-size=\"1844,876\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' 
data-image-title=\"Screenshot 2026-02-08 at 11.39.58\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.39.58-PM-1-300x143.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.39.58-PM-1-1024x486.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.39.58-PM-1.png\" alt=\"\" class=\"wp-image-77809\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2602.04215<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Messy Reality of Robot Actions<\/strong><\/h3>\n<p>Tokenization turns complex data into a sequence of discrete numbers (tokens). For robots, the data to tokenize are actions: continuous signals such as joint angles. <strong>Previous strategies each had a serious flaw:<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Binning:<\/strong> Discretizes each action dimension at each timestep into one of a fixed set of \u2018bins.\u2019 While simple, it creates long token sequences (128 to 384 tokens in the benchmarks reported below) that make training and inference slow.<\/li>\n<li><strong>FAST (Frequency-space Action Sequence Tokenization):<\/strong> Uses a discrete cosine transform to compress movements into frequency coefficients. It is fast but often produces \u2018undecodable\u2019 sequences where small errors cause the robot to halt or move unpredictably.<\/li>\n<li><strong>Learned Latent Tokenizers:<\/strong> These use a learned \u2018dictionary\u2019 of movements. 
They are safe but lack a specific order, meaning the model treats early and late tokens as equally important.<\/li>\n<\/ul>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1918\" height=\"632\" data-attachment-id=\"77811\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/02\/08\/meet-oat-the-new-action-tokenizer-bringing-llm-style-scaling-and-flexible-anytime-inference-to-the-robotics-world\/screenshot-2026-02-08-at-11-41-08-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.41.08-PM-1.png\" data-orig-size=\"1918,632\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-02-08 at 11.41.08\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.41.08-PM-1-300x99.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.41.08-PM-1-1024x337.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.41.08-PM-1.png\" alt=\"\" class=\"wp-image-77811\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2602.04215<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>The Three Golden Rules of OAT<\/strong><\/h3>\n<p><strong>The research team identified 3 essential properties\u2014desiderata\u2014for a functional robot tokenizer:<\/strong><\/p>\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>High Compression (P.1):<\/strong> Token sequences must be short to keep models efficient. 
<\/li>\n<li><strong>Total Decodability (P.2):<\/strong> The decoder must be a total function, ensuring every possible token sequence maps to a valid movement.<\/li>\n<li><strong>Causal Ordering (P.3):<\/strong> Tokens must have a left-to-right structure where early tokens capture global motion and later tokens refine details.<\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\"><strong>The Secret Sauce: Nested Dropout and Registers<\/strong><\/h3>\n<p>OAT uses a transformer encoder with <strong>register tokens<\/strong> to summarize action chunks. To force the model to learn \u2018important\u2019 things first, the research team used a technique called <strong>Nested Dropout<\/strong>: during training, trailing tokens are randomly dropped, so the decoder must reconstruct the action chunk from a prefix alone and the earliest tokens end up carrying the most information.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1892\" height=\"788\" data-attachment-id=\"77813\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/02\/08\/meet-oat-the-new-action-tokenizer-bringing-llm-style-scaling-and-flexible-anytime-inference-to-the-robotics-world\/screenshot-2026-02-08-at-11-42-46-pm-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.42.46-PM-1.png\" data-orig-size=\"1892,788\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-02-08 at 11.42.46\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.42.46-PM-1-300x125.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.42.46-PM-1-1024x426.png\" 
src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/02\/Screenshot-2026-02-08-at-11.42.46-PM-1.png\" alt=\"\" class=\"wp-image-77813\" \/><figcaption class=\"wp-element-caption\">https:\/\/arxiv.org\/pdf\/2602.04215<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Breaking the Benchmarks<\/strong><\/h3>\n<p>The research team tested OAT across 20+ tasks in 4 major simulation benchmarks. OAT consistently outperformed the industry-standard <strong>Diffusion Policy (DP)<\/strong> and previous tokenizers.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Performance Results<\/strong><\/h4>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<td><strong>Benchmark<\/strong><\/td>\n<td><strong>OAT Success Rate<\/strong><\/td>\n<td><strong>DP Success Rate<\/strong><\/td>\n<td><strong>Bin Token Count<\/strong><\/td>\n<td><strong>OAT Token Count<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>LIBERO<\/strong><\/td>\n<td>56.3%<\/td>\n<td>36.6%<\/td>\n<td>224<\/td>\n<td>8<\/td>\n<\/tr>\n<tr>\n<td><strong>RoboMimic<\/strong><\/td>\n<td>73.1%<\/td>\n<td>67.1%<\/td>\n<td>224<\/td>\n<td>8<\/td>\n<\/tr>\n<tr>\n<td><strong>MetaWorld<\/strong><\/td>\n<td>24.4%<\/td>\n<td>19.3%<\/td>\n<td>128<\/td>\n<td>8<\/td>\n<\/tr>\n<tr>\n<td><strong>RoboCasa<\/strong><\/td>\n<td>54.6%<\/td>\n<td>54.0%<\/td>\n<td>384<\/td>\n<td>8<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h3 class=\"wp-block-heading\"><strong>\u2018Anytime\u2019 Inference: Speed vs. Precision<\/strong><\/h3>\n<p>The most practical benefit of OAT is <strong>prefix-based detokenization<\/strong>. 
Since the tokens are ordered by importance, you can stop the model early.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Coarse Actions:<\/strong> Decoding just 1 or 2 tokens gives the robot a general direction quickly, which is useful for low-latency tasks.<\/li>\n<li><strong>Fine Actions:<\/strong> Generating all 8 tokens provides the high-precision details needed for complex insertions.<\/li>\n<\/ul>\n<p>This allows for a smooth trade-off between computation cost and action fidelity that previous fixed-length tokenizers could not offer.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Solving the Tokenization Gap:<\/strong> OAT addresses a fundamental limitation in applying autoregressive models to robotics by introducing a learned tokenizer that simultaneously achieves high compression, total decodability, and causal ordering.<\/li>\n<li><strong>Ordered Representation via Nested Dropout:<\/strong> By utilizing nested dropout during training, OAT forces the model to prioritize global, coarse motion patterns in early tokens while reserving later tokens for fine-grained refinements.<\/li>\n<li><strong>Total Decodability and Reliability:<\/strong> Unlike prior frequency-domain methods like FAST, OAT ensures the detokenizer is a total function, meaning every possible token sequence generates a valid action chunk, preventing runtime execution failures.<\/li>\n<li><strong>Flexible \u2018Anytime\u2019 Inference:<\/strong> The ordered structure enables prefix-based decoding, allowing robots to execute coarse actions from just one or two tokens to save computation or full eight-token sequences for high-precision tasks.<\/li>\n<li><strong>Superior Performance Across Benchmarks:<\/strong> Autoregressive policies equipped with OAT consistently outperform diffusion-based baselines and other tokenization schemes, achieving a 52.3% aggregate success rate and superior results in real-world \u2018Pick &amp; 
Place\u2019 and \u2018Stack Cups\u2019 tasks.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the\u00a0<strong><a href=\"https:\/\/arxiv.org\/pdf\/2602.04215\" target=\"_blank\" rel=\"noreferrer noopener\">Paper<\/a>, <a href=\"https:\/\/github.com\/Chaoqi-LIU\/oat\" target=\"_blank\" rel=\"noreferrer noopener\">Repo<\/a> and <a href=\"https:\/\/ordered-action-tokenization.github.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Project Page<\/a>.<\/strong><\/p>\n
<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/02\/08\/meet-oat-the-new-action-tokenizer-bringing-llm-style-scaling-and-flexible-anytime-inference-to-the-robotics-world\/\">Meet OAT: The New Action Tokenizer Bringing LLM-Style Scaling and Flexible, Anytime Inference to the Robotics World<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Robots are entering their GPT-&hellip;<\/p>\n","protected":false},"author":1,"featured_media":383,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-382","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/382","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=382"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/382\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/383"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=382"}],"wp:term":[{"taxonomy":"category","embeddable":tru
e,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=382"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=382"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}