Fast regex search: indexing text for agent tools

https://news.ycombinator.com/rss Hits: 8
Summary

Time is a flat circle. When the first version of grep was released in 1973, it was a basic utility for matching regular expressions over text files in a filesystem. Over the years, as developer tools became more advanced, it was gradually superseded by more specialized tools: first by roughly syntactic indexes such as ctags, and later by IDEs built for specific programming languages, which let developers navigate codebases very efficiently by parsing the code and building syntactic indexes, often augmented with type-level information. Eventually this was standardized in the Language Server Protocol (LSP), which brought these indexes to all text editors, new and old.

Then, just when LSP was becoming a standard, agentic coding arrived, and what do you know: the agents just love to use grep. There are other state-of-the-art techniques for gathering context for agents. We've talked in the past about how much you can improve agent performance by using semantic indexes for many tasks, but there are specific queries which the model can only resolve by searching with regular expressions. This means going back to 1973, even though the field has advanced a little bit since then.

Most agent harnesses, including ours, default to using ripgrep when providing a search tool. It's a standalone executable developed by Andrew Gallant which provides an alternative to the classic grep, with more sensible defaults (e.g. when it comes to ignoring files) and much better performance. ripgrep is famously fast because Andrew has spent a lot of time thinking about speed when matching regular expressions.

No matter how fast ripgrep can match on the contents of a file, it has one serious limitation: it needs to match on the contents of all files. This is fine when working in a small project, but many of Cursor's users, particularly large enterprise customers, work out of very large monorepos. Painfully large.
We routinely see rg invocations that take more than 1...

First seen: 2026-03-26 21:16

Last seen: 2026-03-27 16:29