{"id":722,"date":"2026-04-15T15:13:56","date_gmt":"2026-04-15T07:13:56","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=722"},"modified":"2026-04-15T15:13:56","modified_gmt":"2026-04-15T07:13:56","slug":"google-deepmind-releases-gemini-robotics-er-1-6-bringing-enhanced-embodied-reasoning-and-instrument-reading-to-physical-ai","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=722","title":{"rendered":"Google DeepMind Releases Gemini Robotics-ER 1.6: Bringing Enhanced Embodied Reasoning and Instrument Reading to Physical AI"},"content":{"rendered":"<p>Google DeepMind research team introduced Gemini Robotics-ER 1.6, a significant upgrade to its embodied reasoning model designed to serve as the \u2018cognitive brain\u2019 of robots operating in real-world environments. The model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning, and success detection \u2014 acting as the high-level reasoning model for a robot, capable of executing tasks by natively calling tools like Google Search, vision-language-action models (VLAs), or any other third-party user-defined functions.<\/p>\n<p>Here is the key architectural idea to understand: Google DeepMind takes a dual-model approach to robotics AI. Gemini Robotics 1.5 is the vision-language-action (VLA) model \u2014 it processes visual inputs and user prompts and directly translates them into physical motor commands. Gemini Robotics-ER, on the other hand, is the embodied reasoning model: it specializes in understanding physical spaces, planning, and making logical decisions, but does not directly control robotic limbs. Instead, it provides high-level insights to help the VLA model decide what to do next. Think of it as the difference between a strategist and an executor \u2014 Gemini Robotics-ER 1.6 is the strategist.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1432\" height=\"942\" data-attachment-id=\"79031\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/15\/google-deepmind-releases-gemini-robotics-er-1-6-bringing-enhanced-embodied-reasoning-and-instrument-reading-to-physical-ai\/screenshot-2026-04-15-at-12-12-16-am-2\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-15-at-12.12.16-AM-1.png\" data-orig-size=\"1432,942\" data-comments-opened=\"1\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-15 at 12.12.16\u202fAM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-15-at-12.12.16-AM-1-1024x674.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-15-at-12.12.16-AM-1.png\" alt=\"\" class=\"wp-image-79031\" \/><figcaption class=\"wp-element-caption\">https:\/\/deepmind.google\/blog\/gemini-robotics-er-1-6\/?<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>What\u2019s New in Gemini Robotics-ER 1.6<\/strong><\/h3>\n<p>Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, 
### Pointing as a Foundation for Spatial Reasoning

Pointing, the model's ability to identify precise pixel-level locations in an image, is far more powerful than it sounds. Points can be used to express:

- **Spatial reasoning**: precision object detection and counting.
- **Relational logic**: comparisons such as identifying the smallest item in a set, or from-to relationships like "move X to location Y".
- **Motion reasoning**: mapping trajectories and identifying optimal grasp points.
- **Constraint compliance**: working through complex prompts like "point to every object small enough to fit inside the blue cup".

[Image: pointing benchmark comparison. Source: https://deepmind.google/blog/gemini-robotics-er-1-6/]

In internal benchmarks, Gemini Robotics-ER 1.6 demonstrates a clear advantage over its predecessor. It correctly identifies the number of hammers, scissors, paintbrushes, pliers, and garden tools in a scene, and it does not point to requested items that are absent from the image, such as a wheelbarrow or a Ryobi drill. Gemini Robotics-ER 1.5, in comparison, fails to count the hammers and paintbrushes correctly, misses the scissors altogether, and hallucinates a wheelbarrow. For robotics practitioners this matters because hallucinated object detections in robotic pipelines cause cascading downstream failures: a robot that 'sees' an object that isn't there will attempt to interact with empty space.
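Here is a minimal sketch of issuing a constraint-based pointing query, assuming the google-genai Python SDK. The model id string and the response schema (a JSON list of `{"point": [y, x], "label": ...}` entries with coordinates normalized to 0-1000, the convention documented for earlier Gemini Robotics-ER releases) are assumptions to verify against the current docs.

```python
import json
from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment
image = Image.open("workbench.jpg")

prompt = (
    "Point to every object small enough to fit inside the blue cup. "
    'Answer as a JSON list of {"point": [y, x], "label": <name>} entries, '
    "with coordinates normalized to 0-1000. If nothing qualifies, return []."
)

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # assumed model id
    contents=[image, prompt],
)

# assumes the model returns bare JSON; production code should also
# handle responses wrapped in markdown code fences
for item in json.loads(response.text):
    y, x = item["point"]
    # convert normalized coordinates back to pixels in the source image
    px, py = x / 1000 * image.width, y / 1000 * image.height
    print(f'{item["label"]}: ({px:.0f}, {py:.0f})')
```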
### Success Detection and Multi-View Reasoning

In robotics, knowing when a task is finished is just as important as knowing how to start it. Success detection serves as the critical decision-making engine that lets an agent intelligently choose between retrying a failed attempt and progressing to the next stage of a plan.

This is a harder problem than it looks. Most modern robotics setups include multiple camera views, such as an overhead feed and a wrist-mounted feed, so the system needs to understand how different viewpoints combine into a coherent picture, both at each moment and across time. Gemini Robotics-ER 1.6 advances multi-view reasoning, letting it better fuse information from multiple camera streams even in occluded or dynamically changing environments.

### Instrument Reading: A Real-World Breakthrough

The genuinely new capability in Gemini Robotics-ER 1.6 is instrument reading: the ability to interpret analog gauges, pressure meters, sight glasses, and digital readouts in industrial settings. The task stems from facility-inspection needs, a critical focus area for Boston Dynamics. Spot, a Boston Dynamics robot, can visit instruments throughout a facility and capture images of them for Gemini Robotics-ER 1.6 to interpret.

Instrument reading requires complex visual reasoning: the model must precisely perceive a variety of inputs, including needles, liquid levels, container boundaries, and tick marks, and understand how they all relate to one another. For sight glasses, this involves estimating how much liquid fills the glass while accounting for distortion from the camera perspective. Gauges typically carry text describing the unit, which must be read and interpreted, and some have multiple needles referring to different decimal places that need to be combined.

[Image: instrument-reading examples. Source: https://deepmind.google/blog/gemini-robotics-er-1-6/]

Gemini Robotics-ER 1.6 achieves its instrument readings using agentic vision, a capability that combines visual reasoning with code execution, introduced with Gemini 3.0 Flash and extended in Gemini Robotics-ER 1.6. The model takes intermediate steps: it first zooms into the image to get a better read of small details on a gauge, then uses pointing and code execution to estimate proportions and intervals, and finally applies world knowledge to interpret the result.
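To make the "code execution to estimate proportions and intervals" step concrete, here is a self-contained sketch of the kind of arithmetic the model can run once it has pointed at a gauge's key landmarks. The coordinates and the 0-10 bar scale are made up for illustration; this is not DeepMind's actual tool code.

```python
import math

def screen_angle(pivot, p):
    """Angle of the pivot->p direction in degrees. With image y pointing
    down, this value increases as the direction rotates clockwise on screen."""
    return math.degrees(math.atan2(p[1] - pivot[1], p[0] - pivot[0]))

def cw_sweep(a_from, a_to):
    """Clockwise angular distance from a_from to a_to, in [0, 360)."""
    return (a_to - a_from) % 360.0

# Points the model has already placed on the gauge face (x, y),
# in normalized image coordinates; all values are illustrative.
pivot      = (500, 520)   # needle pivot at the gauge center
needle_tip = (620, 300)
min_tick   = (280, 760)   # tick mark at the scale minimum
max_tick   = (720, 760)   # tick mark at the scale maximum
scale_min, scale_max = 0.0, 10.0  # e.g. a 0-10 bar gauge, unit read off the dial text

a_min    = screen_angle(pivot, min_tick)
a_max    = screen_angle(pivot, max_tick)
a_needle = screen_angle(pivot, needle_tip)

# fraction of the clockwise arc (min tick -> max tick) the needle has swept
fraction = cw_sweep(a_min, a_needle) / cw_sweep(a_min, a_max)
reading  = scale_min + fraction * (scale_max - scale_min)
print(f"estimated reading: {reading:.1f} bar")  # ~6.0 for these points
```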
On this task, Gemini Robotics-ER 1.5 achieves a 23% success rate, Gemini 3.0 Flash reaches 67%, Gemini Robotics-ER 1.6 reaches 86%, and Gemini Robotics-ER 1.6 with agentic vision hits 93%.

One important caveat: Gemini Robotics-ER 1.5 was evaluated without agentic vision because it does not support that capability, which makes its 23% baseline less a performance gap and more a fundamental architectural difference. For developers comparing model generations this distinction matters: the numbers are not an apples-to-apples comparison across the full benchmark column.

### Key Takeaways

- **Gemini Robotics-ER 1.6 is a reasoning model, not an action model**: it acts as the high-level 'brain' of a robot, handling spatial understanding, task planning, and success detection, while the separate VLA model (Gemini Robotics 1.5) handles the actual physical motor commands.
- **Pointing is more powerful than it looks**: Gemini Robotics-ER 1.6's pointing capability goes far beyond simple object detection; it enables relational logic, motion-trajectory mapping, grasp-point identification, and constraint-based reasoning, all of which are foundational to reliable robotic manipulation.
- **Instrument reading is the biggest new capability**: built in collaboration with Boston Dynamics' Spot robot for industrial facility inspection, Gemini Robotics-ER 1.6 can now read analog gauges, pressure meters, and sight glasses with 93% accuracy using agentic vision, up from 23% for Gemini Robotics-ER 1.5, which does not support agentic vision at all.
- **Success detection is what enables true autonomy**: knowing when a task is actually complete, across multiple camera views and in occluded or dynamic environments, is what lets a robot decide whether to retry or move to the next step without human intervention.

---

Check out the [paper](https://arxiv.org/pdf/2604.06425), the [technical details](https://deepmind.google/blog/gemini-robotics-er-1-6/), and the [model information](https://deepmind.google/models/gemini-robotics/).