{"id":784,"date":"2026-04-24T06:11:30","date_gmt":"2026-04-23T22:11:30","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=784"},"modified":"2026-04-24T06:11:30","modified_gmt":"2026-04-23T22:11:30","slug":"openai-releases-gpt-5-5-a-fully-retrained-agentic-model-that-scores-82-7-on-terminal-bench-2-0-and-84-9-on-gdpval","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=784","title":{"rendered":"OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval"},"content":{"rendered":"<p>OpenAI has released GPT-5.5, its most capable model to date and the first fully retrained base model since GPT-4.5. GPT-5.5 is designed to complete complex, multi-step computer tasks with minimal human direction. Think of it as the difference between an assistant who needs a checklist and one who understands the underlying goal and figures out the steps themselves. The release is rolling out today to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex.<\/p>\n<h3 class=\"wp-block-heading\"><strong>What \u2018Agentic\u2019 Actually Means Here<\/strong><\/h3>\n<p>An agentic model doesn\u2019t just respond to a single prompt \u2014 it takes a sequence of actions, uses tools (like browsing the web, writing code, running scripts, or operating software), checks its own work, and keeps going until the task is finished. Prior models often stalled at handoff points, requiring the user to re-prompt or correct course. 
GPT-5.5 is built to reduce those interruptions.<\/p>\n<p>OpenAI positions GPT-5.5 specifically for agentic computer use: in practice, it writes and debugs code, browses the web, fills out spreadsheets, and keeps working through multi-step tasks without a human supervising every move.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Four Domains Where Gains Are Concentrated<\/strong><\/h3>\n<p>The gains are concentrated in <strong>four areas<\/strong>: agentic coding, computer use, knowledge work, and early scientific research. OpenAI describes these as domains \u2018where progress depends on reasoning across context and taking action over time.\u2019<\/p>\n<p>For software engineers, the most immediately relevant benchmark is <strong>SWE-Bench Pro<\/strong>, which evaluates real-world GitHub issue resolution across four programming languages. GPT-5.5 resolves 58.6% of tasks end-to-end in a single pass. Claude Opus 4.7 scores higher on the same benchmark, at 64.3%, though OpenAI notes that Anthropic itself reported signs of memorization on a subset of those problems, which may affect the comparison.<\/p>\n<p>For long-horizon coding specifically, OpenAI also reports results on <strong>Expert-SWE<\/strong>, an internal benchmark of tasks with a median estimated human completion time of 20 hours. GPT-5.5 outperforms GPT-5.4 on Expert-SWE. 
This benchmark is significant because it reflects the kind of extended, multi-session engineering work (large refactors, feature builds, debugging deep in a codebase) that agentic tools are increasingly asked to handle autonomously.<\/p>\n<p>Developers who tested the model early said GPT-5.5 has a better grasp of the \u201cshape\u201d of a software system: it is better at working out why something is failing, where the fix belongs, and what else in the codebase the change would affect.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"2242\" height=\"1072\" data-attachment-id=\"79250\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/23\/openai-releases-gpt-5-5-a-fully-retrained-agentic-model-that-scores-82-7-on-terminal-bench-2-0-and-84-9-on-gdpval\/screenshot-2026-04-23-at-2-57-37-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-23-at-2.57.37-PM.png\" data-orig-size=\"2242,1072\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-23 at 2.57.37\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-23-at-2.57.37-PM-1024x490.png\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-23-at-2.57.37-PM.png\" alt=\"\" class=\"wp-image-79250\" \/><figcaption class=\"wp-element-caption\">https:\/\/openai.com\/index\/introducing-gpt-5-5\/<\/figcaption><\/figure>\n<\/div>\n<p>For ML engineers and data scientists who spend significant time in terminal environments orchestrating pipelines and debugging scripts, the 
<strong>Terminal-Bench 2.0<\/strong> results are the most compelling signal. GPT-5.5 scores 82.7% on Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination. That beats Claude Opus 4.7 (69.4%) by 13.3 points and Gemini 3.1 Pro (68.5%) by 14.2 points, which is not a marginal lead.<\/p>\n<p>For broader knowledge work, GPT-5.5 scores 84.9% on GDPval, which evaluates agents on knowledge-work tasks spanning 44 occupations. On <strong>OSWorld-Verified<\/strong>, a benchmark measuring whether a model can autonomously operate real computer environments, it reaches 78.7%.<\/p>\n<p>GPT-5.5 also ships with a Pro variant built for harder tasks that demand higher accuracy. On BrowseComp, which tests a model\u2019s ability to track down hard-to-find information across the web, GPT-5.5 Pro scores 90.1%, ahead of Gemini 3.1 Pro at 85.9%. The model is also the top-ranked system on the Artificial Analysis Intelligence Index.<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1446\" height=\"1068\" data-attachment-id=\"79252\" data-permalink=\"https:\/\/www.marktechpost.com\/2026\/04\/23\/openai-releases-gpt-5-5-a-fully-retrained-agentic-model-that-scores-82-7-on-terminal-bench-2-0-and-84-9-on-gdpval\/screenshot-2026-04-23-at-3-10-58-pm\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-23-at-3.10.58-PM.png\" data-orig-size=\"1446,1068\" data-comments-opened=\"0\" data-image-meta='{\"aperture\":\"0\",\"credit\":\"\",\"camera\":\"\",\"caption\":\"\",\"created_timestamp\":\"0\",\"copyright\":\"\",\"focal_length\":\"0\",\"iso\":\"0\",\"shutter_speed\":\"0\",\"title\":\"\",\"orientation\":\"0\"}' data-image-title=\"Screenshot 2026-04-23 at 3.10.58\u202fPM\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-23-at-3.10.58-PM-1024x756.png\" 
src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2026\/04\/Screenshot-2026-04-23-at-3.10.58-PM.png\" alt=\"\" class=\"wp-image-79252\" \/><figcaption class=\"wp-element-caption\">https:\/\/openai.com\/index\/introducing-gpt-5-5\/<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\"><strong>Speed and Token Efficiency<\/strong><\/h3>\n<p>One concern with more capable models is that they tend to be slower or more expensive to run. OpenAI addressed this directly. GPT-5.5 matches GPT-5.4\u2019s per-token latency in real-world serving while performing better on nearly every evaluation measured. It also uses significantly fewer tokens to complete the same Codex tasks, which means shorter, more efficient runs even on complex agentic workflows.<\/p>\n<p>On pricing, the standard GPT-5.5 API will be charged at $5 per million input tokens and $30 per million output tokens. For context, GPT-5.4 was priced at $2.50 per million input tokens and $15 per million output tokens, so both rates have doubled. The OpenAI team argues that token-efficiency gains offset the higher rate: because GPT-5.5 completes the same Codex tasks with fewer tokens, runs can come out cheaper overall. GPT-5.5 Pro, the higher-accuracy variant, is priced at $30 per million input tokens and $180 per million output tokens in the API.<\/p>\n<p>For teams running Codex at scale, the net math is what matters. Because both the input and output rates exactly doubled, GPT-5.5 is cheaper per completed workflow only if it cuts token usage by more than half relative to GPT-5.4 (assuming input and output shrink in roughly the same proportion). For example, a task that used 1 million input and 100,000 output tokens on GPT-5.4 costs $4.00 ($2.50 + $1.50); the same token counts on GPT-5.5 cost $8.00, so GPT-5.5 only breaks even at 500,000 input and 50,000 output tokens.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Scale and Adoption<\/strong><\/h3>\n<p>OpenAI has seen a surge in Codex usage, with about 4 million developers using the tool weekly. 
That scale matters for understanding the deployment context: GPT-5.5 is not a research preview but a production model being pushed to an active, large developer base immediately on launch.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>GPT-5.5 is OpenAI\u2019s first fully retrained base model since GPT-4.5<\/strong>, designed specifically for agentic workflows: it can understand complex goals, use tools, check its own work, and carry multi-step tasks through to completion with minimal human direction.<\/li>\n<li><strong>The biggest performance gains are in agentic coding, computer use, knowledge work, and early scientific research<\/strong>: GPT-5.5 scores 82.7% on Terminal-Bench 2.0 (ahead of Claude Opus 4.7 at 69.4% and Gemini 3.1 Pro at 68.5%), 84.9% on GDPval, and 78.7% on OSWorld-Verified, although Claude Opus 4.7 still leads on SWE-Bench Pro (64.3% vs. 58.6%).<\/li>\n<li><strong>GPT-5.5 matches GPT-5.4\u2019s per-token latency while being more capable across nearly every benchmark<\/strong>, and it uses significantly fewer tokens to complete the same Codex tasks, so the better results come without a proportional increase in latency or cost per completed workflow.<\/li>\n<li><strong>API pricing increases to $5\/M input tokens and $30\/M output tokens<\/strong> (up from $2.50 and $15 for GPT-5.4), with GPT-5.5 Pro priced at $30\/M input and $180\/M output. The OpenAI team argues that token-efficiency gains offset the higher per-token rate for most workloads.<\/li>\n<li><strong>GPT-5.5 is rolling out today to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex<\/strong>, with approximately 4 million developers already using Codex weekly.<\/li>\n<\/ul>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/04\/23\/openai-releases-gpt-5-5-a-fully-retrained-agentic-model-that-scores-82-7-on-terminal-bench-2-0-and-84-9-on-gdpval\/\">OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 
2.0 and 84.9% on GDPval<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>OpenAI has released GPT-5.5, i&hellip;<\/p>\n","protected":false},"author":1,"featured_media":785,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-784","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/784","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=784"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/784\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/785"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=784"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=784"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=784"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}