Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

Summary

TL;DR: Coding agents generate better optimizations when they read papers and study competing projects before touching code. We added a literature search phase to the autoresearch / pi-autoresearch loop, pointed it at llama.cpp with 4 cloud VMs, and in ~3 hours it produced 5 optimizations that made flash attention text generation +15% faster on x86 and +5% faster on ARM (TinyLlama 1.1B). The full setup works with any project that has a benchmark and test suite.

Key takeaways:

- Agents that read papers and study competing projects before writing code find optimizations that code-only agents miss. The literature research pointed the agent at operator fusions present in the CUDA/Metal backends but absent from the CPU backend.
- 5 of 30+ experiments landed: four kernel fusions and an adaptive parallelization. The biggest win fused three passes over flash attention's QK tile into a single AVX2 FMA loop.
- Studying forks and other backends was more productive than searching arXiv. ik_llama.cpp and the CUDA backend directly informed two of the five final optimizations.
- Total cost: ~$29 ($20 in CPU VMs, $9 in API calls) over ~3 hours with 4 VMs.

Where code-only context works

Karpathy's autoresearch showed that a coding agent can autonomously improve a neural network training script. In our previous post, we scaled that to 16 GPUs and watched the agent run ~910 experiments in 8 hours, driving val_bpb down 2.87%. The agent brainstormed ideas from code context alone, and the experiments were all variations on the same train.py.

Since then, pi-autoresearch has generalized the loop into a reusable extension for any benchmarkable target. Shopify CEO Tobi Lütke ran it on Liquid, the Ruby template engine that processes $292B in annual merchandise volume. The agent ran ~120 experiments, producing 93 commits that cut parse+render time by 53% and allocations by 61% with zero regressions across 974 unit tests (Simon Willison's writeup, Tobi's post).

In that case, the optimization surface was visible in the source.
The Liqu...
