# ml-intern
Hugging Face's open-source ML engineer agent — autonomously reads papers, trains models, and ships ML code with first-party access to HF docs, datasets, jobs, and papers. Bring your own LLM (Anthropic, OpenAI, etc.).
```shell
git clone https://github.com/huggingface/ml-intern.git && cd ml-intern && uv sync && uv tool install -e .
```

## What it does
Hugging Face’s ml-intern is an agentic CLI that does the routine ML-engineering work end-to-end. You install it once, then run prompts like `ml-intern "fine-tune llama on my dataset"` either interactively or headlessly. Under the hood it’s an agent loop (max 300 iterations) wrapping any LLM (Claude, GPT, etc. via litellm), pre-wired with first-party tools for the Hugging Face ecosystem: HF docs and research search, repos, datasets, jobs, papers, GitHub code search, sandbox + local code execution, planning, and MCP-server passthrough.
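The shape of such a bounded agent loop can be sketched in a few lines. This is an illustrative sketch only, not ml-intern's actual implementation; every name here (`run_agent`, `call_llm`, the message dict shapes) is an assumption for demonstration:

```python
# Illustrative sketch of a bounded agent loop, as described above.
# All names and message shapes are hypothetical, not ml-intern's real API.
MAX_ITERATIONS = 300  # ml-intern caps its loop at 300 iterations


def run_agent(goal, call_llm, tools):
    """Drive an LLM with tool access until it answers or the budget runs out."""
    history = [{"role": "user", "content": goal}]
    for _ in range(MAX_ITERATIONS):
        reply = call_llm(history)
        if reply.get("tool") is None:      # model answered directly: done
            return reply["content"]
        name, args = reply["tool"], reply.get("args", {})
        result = tools[name](**args)       # dispatch to the named tool
        history.append({"role": "tool", "name": name, "content": result})
    return None                            # budget exhausted without an answer
```

The 300-iteration cap is the safety rail: without it, a confused model plus a tool loop can run (and bill) indefinitely.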
Auto-compaction kicks in around 170k tokens. A “Doom Loop Detector” catches repeated tool-call patterns and injects corrective prompts. Sessions auto-upload to your own private HF dataset in Claude Code JSONL format, browsable via HF’s Agent Trace Viewer (you can flip the dataset to public or opt out entirely).
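The idea behind a doom-loop detector can be sketched as follows. This is a minimal illustration of the concept (flag a tool call that repeats within a recent window), not ml-intern's actual heuristic, which may weigh patterns differently:

```python
from collections import deque

# Illustrative doom-loop detector: flags a (tool, args) signature that
# recurs within a sliding window of recent calls. The thresholds and the
# real ml-intern logic are assumptions for demonstration.
class DoomLoopDetector:
    def __init__(self, window=6, threshold=3):
        self.recent = deque(maxlen=window)  # last few call signatures
        self.threshold = threshold

    def observe(self, tool_name, args):
        """Record a tool call; return True when it looks like a stuck loop."""
        signature = (tool_name, tuple(sorted(args.items())))
        self.recent.append(signature)
        return self.recent.count(signature) >= self.threshold
```

When the detector fires, the agent injects a corrective prompt rather than letting the model burn iterations (and tokens) repeating itself.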
## Who it’s for
- Machine learning engineers in bio/pharma doing model-development work where you’d otherwise be reading HF docs, copying example notebooks, and stitching together training jobs by hand.
- Industry research scientists who want to compress the iteration cycle on a new model from “read three papers, fork a notebook, debug for two days” to “describe the goal, watch it run.”
- Career-switchers moving into ML who want a working scaffold for the HF ecosystem rather than learning each piece in isolation.
## How this differs from the AI Research Pipeline plugin
ml-intern and the AI Research Pipeline plugin sit on the same shelf but solve different problems:
| | ml-intern (Hugging Face) | AI Research Pipeline (Vera) |
|---|---|---|
| Primary output | A trained model + code repo | A manuscript draft with interpretability tables and effect sizes |
| Primary mode | Agentic — give it a goal, it iterates | Skill battery — you compose modular sub-skills for diagnostics, baselines, full ML+DL battery, and assembly |
| Optimized for | New training runs, fine-tuning, benchmark reproduction, model-shipping velocity inside the HF ecosystem | Applying existing methods to a life-science research question and generating publication-ready insights |
| Strongest when | You know the engineering spec (“fine-tune model X on dataset Y to beat baseline Z”) | You know the scientific question and need a structured workflow with the rigor norms reviewers in your field expect |
| What it builds in | HF docs, datasets, papers, jobs, sandbox; auto-compaction; agent telemetry to HF Hub | Per-data-type baselines, ML+DL model batteries, GradCAM / TabNet attention / permutation importance, manuscript-section drafting, LaTeX assembly, external-review prep |
In one sentence: ml-intern is for building new ML artifacts; the Vera plugin is for applying methods to questions in life science to produce defensible insights.
The two are compatible — for example, you might use ml-intern to train a domain-specific embedding model on your in-house data, then use the Vera plugin’s vera-ai-application-pipeline to deploy that model in a downstream classification study and write up the paper.
## What to watch for
- No LICENSE file in the repo at the time of writing (2026-05-05). For most users that’s fine; for anyone in regulated pharma R&D where legal asks about license terms before adoption, it’s worth flagging until upstream adds one.
- You bring (and pay for) the LLM. ml-intern itself is free, but its agent loop calls Anthropic, OpenAI, or whichever model you point it at. Long sessions on a frontier model add up. Plan accordingly.
- HF telemetry is on by default. Sessions auto-upload to a private HF dataset under your account. Easy to opt out (`{ "share_traces": false }` in the CLI config), but worth knowing on day one, especially if any of your prompts include unpublished data or ideas.
- Notification gateways are one-way. Slack integration is for status pings (approval-required / error / turn-complete), not chat. Don’t expect to drive the agent from Slack.
## Verdict
The closest open-source thing to “Claude Code, but specialized for the Hugging Face stack.” Best fit when your role is producing trained models and you want the iteration loop on training infrastructure compressed. If your job is producing insights and manuscripts from ML applied to bio/pharma data, the AI Research Pipeline plugin is the closer match — and the two compose cleanly.