{"id":530,"date":"2026-03-09T10:47:38","date_gmt":"2026-03-09T02:47:38","guid":{"rendered":"https:\/\/connectword.dpdns.org\/?p=530"},"modified":"2026-03-09T10:47:38","modified_gmt":"2026-03-09T02:47:38","slug":"andrej-karpathy-open-sources-autoresearch-a-630-line-python-tool-letting-ai-agents-run-autonomous-ml-experiments-on-single-gpus","status":"publish","type":"post","link":"https:\/\/connectword.dpdns.org\/?p=530","title":{"rendered":"Andrej Karpathy Open-Sources \u2018Autoresearch\u2019: A 630-Line Python Tool Letting AI Agents Run Autonomous ML Experiments on Single GPUs"},"content":{"rendered":"<p>Andrej Karpathy released <strong>autoresearch<\/strong>, a minimalist Python tool designed to enable AI agents to autonomously conduct machine learning experiments. The project is a stripped-down version of the <strong>nanochat<\/strong> LLM training core, condensed into a single-file repository of approximately <strong>630 lines of code<\/strong>. It is optimized for execution on a <strong>single NVIDIA GPU<\/strong>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>The Autonomous Iteration Loop<\/strong><\/h3>\n<p>The framework establishes a specific division of labor between the human researcher and the AI agent. 
The system operates on a continuous feedback loop where progress is tracked via git commits on a feature branch.<\/p>\n<figure class=\"wp-block-table is-style-stripes\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<td><strong>Component<\/strong><\/td>\n<td><strong>Responsibility<\/strong><\/td>\n<td><strong>File Format<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Human<\/strong><\/td>\n<td>Iterates on high-level research instructions and constraints.<\/td>\n<td><code>.md<\/code> (Markdown)<\/td>\n<\/tr>\n<tr>\n<td><strong>AI Agent<\/strong><\/td>\n<td>Proposes and implements modifications to the training script.<\/td>\n<td><code>.py<\/code> (Python)<\/td>\n<\/tr>\n<tr>\n<td><strong>Execution<\/strong><\/td>\n<td>Conducts a fixed-length training run to evaluate the changes.<\/td>\n<td>Shell\/Python<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p>The agent reads the human-provided instructions, modifies the training code\u2014adjusting neural network architecture, optimizers, or hyperparameters\u2014and executes a training run that lasts exactly <strong>five minutes<\/strong>.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Evaluation Metrics and Validation<\/strong><\/h3>\n<p>To ensure the agent only retains beneficial changes, the system uses <strong>bits-per-byte (BPB)<\/strong> as the primary validation metric. 
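<\/p>
<p>For reference, a bits-per-byte score is conventionally derived from the model's summed cross-entropy loss over the validation text. The sketch below shows that standard conversion in Python; it is illustrative and may differ from the exact evaluation code in the repo:<\/p>

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats, over every
    predicted token of the validation set) into bits per raw byte."""
    total_bits = total_nll_nats / math.log(2)  # nats -> bits
    return total_bits / total_bytes            # normalize by text length

# Toy numbers: ~693.1 nats over 1,000 bytes is ~1,000 bits, i.e. ~1.0 BPB,
# matching the starting score reported in the article.
score = bits_per_byte(693.1, 1000)
```

<p>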
BPB measures the compression efficiency of the model on a validation dataset; a lower score indicates a more accurate model.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Validation Protocol:<\/strong> The agent only commits code changes to the git branch if the final BPB score is lower than the previous best.<\/li>\n<li><strong>Observed Performance:<\/strong> In initial runs, Karpathy demonstrated the agent successfully reducing validation loss from <strong>1.0 to 0.97<\/strong> BPB through autonomous code iteration.<\/li>\n<li><strong>Granularity:<\/strong> Every completed 5-minute training run is represented as a data point, allowing researchers to compare the effectiveness of different prompts or agent configurations over time.<\/li>\n<\/ul>\n<h3 class=\"wp-block-heading\"><strong>Case Study: Implementation by Shopify\u2019s Tobi Lutke<\/strong><\/h3>\n<p>Following the release, <a href=\"https:\/\/x.com\/tobi\/status\/2030771823151853938\" target=\"_blank\" rel=\"noreferrer noopener\">Shopify CEO Tobi Lutke adapted<\/a> the <code>autoresearch<\/code> framework for an internal project. By allowing the agent to iterate on a smaller model architecture, Lutke reported a <strong>19% improvement<\/strong> in validation scores. Notably, the agent-optimized smaller model eventually outperformed a larger model that had been configured through standard manual methods.<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">\n<div class=\"embed-twitter\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">OK this thing is totally insane. Before going to bed I\u2026<\/p>\n<p>* used try to make a new qmdresearcher directory<br \/>* told my pi to read this github repo and make a version of that for the qmd query-expansion model with the goal of highest quality score and speed. 
Get training data from\u2026 <a href=\"https:\/\/t.co\/hbCfD62ElJ\">https:\/\/t.co\/hbCfD62ElJ<\/a><\/p>\n<p>\u2014 tobi lutke (@tobi) <a href=\"https:\/\/twitter.com\/tobi\/status\/2030771823151853938?ref_src=twsrc%5Etfw\">March 8, 2026<\/a><\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/figure>\n<p>Karpathy noted that the specific code tweaks discovered by the agent were later integrated back into his broader <strong>nanochat<\/strong> framework, demonstrating that the tool can discover optimizations applicable to larger-scale production systems.<\/p>\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\">\n<div class=\"wp-block-embed__wrapper\">\n<div class=\"embed-twitter\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">I packaged up the &#8220;autoresearch&#8221; project into a new self-contained minimal repo if people would like to play over the weekend. It&#8217;s basically nanochat LLM training core stripped down to a single-GPU, one file version of ~630 lines of code, then:<\/p>\n<p>\u2013 the human iterates on the\u2026 <a href=\"https:\/\/t.co\/3tyOq2P9c6\">pic.twitter.com\/3tyOq2P9c6<\/a><\/p>\n<p>\u2014 Andrej Karpathy (@karpathy) <a href=\"https:\/\/twitter.com\/karpathy\/status\/2030371219518931079?ref_src=twsrc%5Etfw\">March 7, 2026<\/a><\/p><\/blockquote>\n<\/div>\n<\/div>\n<\/figure>\n<h3 class=\"wp-block-heading\"><strong>Technical Significance for Developers<\/strong><\/h3>\n<p>For developers, <code>autoresearch<\/code> represents a shift toward \u2018agentic\u2019 workflows in model development. Rather than manually tuning hyperparameters, the engineering task shifts to <strong>prompt engineering the agent<\/strong> to navigate the search space more effectively. 
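<\/p>
<p>The workflow described above (the agent edits the script, runs a fixed five-minute training job, and keeps the change only if validation BPB improves) can be sketched as follows. The helpers <code>propose_patch<\/code> and <code>train_for<\/code> are hypothetical stand-ins, not the repo's actual interfaces:<\/p>

```python
import random

# Hypothetical stand-ins, for illustration only; the real autoresearch
# interfaces are not reproduced here.
def propose_patch(script: str, instructions: str) -> None:
    """The agent would rewrite `script`, guided by the .md instructions."""

def train_for(script: str, minutes: int) -> float:
    """A fixed-length training run; returns validation bits-per-byte."""
    return random.uniform(0.95, 1.05)  # placeholder result

def step(best_bpb: float) -> float:
    """One iteration: edit, train five minutes, keep only improvements."""
    propose_patch("train.py", instructions="instructions.md")
    bpb = train_for("train.py", minutes=5)
    if bpb < best_bpb:
        # a real run would `git commit` on the feature branch here
        return bpb
    # otherwise the edit is reverted (e.g. `git checkout -- train.py`)
    return best_bpb

best = 1.0
for _ in range(10):
    best = step(best)  # `best` can only decrease, mirroring the protocol
```

<p>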
The ~630-line constraint ensures that the entire codebase fits within the context window of modern LLMs, minimizing errors in code generation and allowing the agent to maintain a \u2018holistic\u2019 understanding of the training script.<\/p>\n<h3 class=\"wp-block-heading\"><strong>Key Takeaways<\/strong><\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Autonomous Research Loop:<\/strong> The framework enables AI agents to autonomously iterate on ML experiments by reading a human-provided <strong>Markdown (.md)<\/strong> instruction file and modifying a <strong>Python (.py)<\/strong> training script without manual intervention.<\/li>\n<li><strong>~630-Line Core:<\/strong> By stripping the <strong>nanochat<\/strong> LLM training core down to a single-file, ~630-line repository, the codebase is small enough to fit entirely within an LLM\u2019s context window, reducing code generation errors.<\/li>\n<li><strong>Efficiency-Driven Metrics:<\/strong> The agent runs fixed <strong>5-minute training sprints<\/strong> on a <strong>single NVIDIA GPU<\/strong> and only commits code changes to a git feature branch if they result in a lower <strong>bits-per-byte (BPB)<\/strong> validation score.<\/li>\n<li><strong>Proven Performance Gains:<\/strong> In a real-world test (as mentioned in a tweet), Shopify CEO Tobi Lutke used the tool to achieve a <strong>19% improvement<\/strong> in model scores, resulting in a smaller, agent-optimized model that outperformed a larger, manually configured one.<\/li>\n<li><strong>Shift in Engineering Focus:<\/strong> The project moves the developer\u2019s role from manual hyperparameter tuning to <strong>agent engineering<\/strong>, where the goal is to optimize the prompts that direct the AI to find the most efficient neural architectures and training settings.<\/li>\n<\/ul>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out <strong><a href=\"https:\/\/github.com\/karpathy\/autoresearch\" target=\"_blank\" 
rel=\"noreferrer noopener\">the Repo here<\/a>.<\/strong><\/p>\n<p>The post <a href=\"https:\/\/www.marktechpost.com\/2026\/03\/08\/andrej-karpathy-open-sources-autoresearch-a-630-line-python-tool-letting-ai-agents-run-autonomous-ml-experiments-on-single-gpus\/\">Andrej Karpathy Open-Sources \u2018Autoresearch\u2019: A 630-Line Python Tool Letting AI Agents Run Autonomous ML Experiments on Single GPUs<\/a> appeared first on <a href=\"https:\/\/www.marktechpost.com\/\">MarkTechPost<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Andrej Karpathy released 
autor&hellip;<\/p>\n","protected":false},"author":1,"featured_media":29,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-530","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/530","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=530"}],"version-history":[{"count":0,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/posts\/530\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=\/wp\/v2\/media\/29"}],"wp:attachment":[{"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=530"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=530"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/connectword.dpdns.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=530"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}