Andrej Karpathy Open-sources 'AutoSearch': A ~630-line Python Tool That Lets AI Agents Run Autonomous ML Experiments on a Single GPU


Andrej Karpathy has released AutoSearch, a minimal Python tool designed to let AI agents conduct machine learning experiments autonomously. The project is a distilled version of the nanochat LLM training core, condensed into a single-file repository of roughly 630 lines of code and optimized for execution on a single NVIDIA GPU.

The Autonomous Iteration Loop

The framework establishes a clear division of labor between the human researcher and the AI agent. The system runs as a continuous feedback loop in which progress is tracked through Git commits on a feature branch.

The responsibilities break down as follows:

  • Human: iterates on high-level research directions and constraints (.md Markdown files)
  • AI agent: proposes and implements changes to the training scripts (.py Python files)
  • Execution layer: runs fixed-duration training to evaluate each change (Shell/Python)

The agent reads the human-supplied instructions, modifies the training code (adjusting the neural network architecture, optimizer, or hyperparameters), and executes a training run that lasts exactly five minutes.
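The loop described above can be sketched roughly as follows. This is a minimal illustration, not AutoSearch's actual code: the function names (`run_training`, `autonomous_loop`) are hypothetical, and a stub stands in for the real five-minute GPU training run.

```python
import random

def run_training(params):
    """Stand-in for one fixed-duration training run that returns a
    bits-per-byte (BPB) score. A real run would train for exactly
    five minutes on a single GPU; here a seeded random draw simulates
    how small code tweaks nudge the final score."""
    random.seed(params["change_id"])
    return 1.0 - 0.01 * random.random()

def autonomous_loop(n_iters=5):
    """Keep only changes that lower BPB, mimicking the commit protocol."""
    best_bpb = float("inf")
    commits = []  # stands in for commits on the Git feature branch
    for step in range(n_iters):
        params = {"change_id": step}   # the agent proposes a code change
        bpb = run_training(params)     # five-minute evaluation run
        if bpb < best_bpb:             # retain only beneficial changes
            best_bpb = bpb
            commits.append((step, bpb))
    return best_bpb, commits

best, commits = autonomous_loop()
```

Each accepted change becomes one entry in `commits`, which mirrors how the real tool records progress as Git commits on the feature branch.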

Evaluation metrics and validation

To ensure that the agent retains only beneficial changes, the system uses bits-per-byte (BPB) as its primary validation metric. BPB measures the model's compression efficiency on a validation dataset; a lower score indicates a more accurate model.

  • Verification protocol: The agent commits code changes to the Git branch only if the final BPB score is lower than the previous best.
  • Observed performance: In early runs, Karpathy showed the agent successfully reducing validation loss from 1.0 to 0.97 BPB through autonomous code iteration.
  • Granularity: Each completed five-minute training run is a single data point, allowing researchers to compare the effectiveness of different prompts or agent configurations over time.
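For readers unfamiliar with the metric, BPB is the model's mean cross-entropy converted from nats per token to bits, normalized by the byte length of the validation text. A small sketch (the numbers are illustrative, not from AutoSearch):

```python
import math

def bits_per_byte(nats_per_token: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) into bits per byte."""
    total_bits = nats_per_token * n_tokens / math.log(2)  # nats -> bits
    return total_bits / n_bytes

# e.g. a loss of ~2.772 nats/token at ~4 bytes per token works out
# to roughly 1.0 BPB on the validation set
bpb = bits_per_byte(2.772, n_tokens=1_000, n_bytes=4_000)
```

Because the byte count of the validation data is fixed, any drop in BPB reflects a genuinely better model rather than a tokenizer artifact, which is why it makes a clean accept/reject signal for the agent.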

Case Study: Shopify CEO Tobi Lütke's Implementation

Shortly after release, Shopify CEO Tobi Lütke adapted AutoSearch as the outline for an internal project. Lütke reported that letting the agent iterate on a smaller model architecture produced a 19% improvement in validation score. Notably, the agent-optimized smaller model ultimately outperformed a larger model that had been configured through standard manual methods.

Karpathy noted that specific code tweaks discovered by the agent were later integrated into his broader nanochat framework, showing that the tool can surface optimizations applicable to larger-scale production systems.

Technical Significance for Developers

For developers, AutoSearch represents a shift toward 'agentic' workflows in model development. Instead of manually tuning hyperparameters, the engineering work shifts to prompting the agent to navigate the search space more effectively. The ~630-line constraint ensures that the entire codebase fits within the context window of a modern LLM, reducing errors in code generation and allowing the agent to maintain a 'holistic' understanding of the training script.
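To see why ~630 lines comfortably fits, here is a back-of-the-envelope check using the common rough heuristic of about four characters per token (illustrative only; a real check would run an actual tokenizer, and the 128k context size is an assumed figure for a modern LLM):

```python
def fits_in_context(source: str, context_tokens: int = 128_000) -> bool:
    """Crude 4-chars-per-token estimate of whether a source file fits
    inside an LLM context window. Not a real tokenizer."""
    return len(source) // 4 < context_tokens

# ~630 lines at ~50-60 characters each is only about 8-10k estimated
# tokens, a small fraction of the assumed 128k-token window
script = "x = 1  # one line of a single-file training script\n" * 630
small_enough = fits_in_context(script)
```

The headroom matters: the agent can hold the entire training script, its own proposed diff, and the human's Markdown instructions in a single prompt, rather than reasoning over fragments of a larger codebase.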

Key Takeaways

  • Autonomous research loop: The framework enables AI agents to iterate on ML experiments autonomously by reading human-provided Markdown (.md) instruction files and modifying a Python (.py) training script, without manual intervention.
  • ~630-line core: By distilling the nanochat LLM training core into a single-file, ~630-line repository, the codebase is small enough to fit entirely within an LLM's context window, reducing code-generation errors.
  • Efficiency-driven metrics: The agent runs scheduled five-minute training sprints on a single NVIDIA GPU and commits code changes to the Git feature branch only if they produce a lower bits-per-byte (BPB) validation score.
  • Proven performance gains: In real-world testing (as described in a tweet), Shopify CEO Tobi Lütke used the tool to achieve a 19% improvement in model score, producing a smaller, agent-optimized model that outperformed a larger, manually configured one.
  • A shift in engineering focus: The developer's role extends beyond manual hyperparameter tuning to agent engineering, where the goal is to optimize the prompts that guide the AI toward the most efficient neural architecture and training settings.

Check out the repo here.

