Why BM25 queries with more terms can be faster (and other scaling surprises)

Summary

January 07, 2026 • Adrien Grand (Engineer)

BM25 full-text search has very different scaling characteristics from vector search. Vector search latency is generally a function of vector dimensions, top-k, the size of the dataset, and the presence of filters. BM25 latency, on the other hand, also varies a lot by query, and in some surprising ways:

- Sometimes adding a new term to a query actually makes it faster.
- The fastest query at top_k=10 may not be the fastest at top_k=10000.

This post discusses what I learned modeling BM25 query latencies across varying term counts, document counts, and top_k values.

Background

turbopuffer implements BM25 full-text search by indexing text data into an inverted index. This is a data structure that maps each unique term to the list of IDs of the documents containing that term. These lists are called “postings”.

term          posting list
pufferfish ──▶ 1, 2                          (appears in few documents)
fish       ──▶ 1, 2, 4, 6, 9                 (appears in more documents)
to         ──▶ 1, 2, 3, 4, 5, 6, 7, 8, 9, …  (appears in many)
be         ──▶ 1, 2, 3, 4, 5, 6, 7, 8, 9, …
or         ──▶ 1, 2, 3, 4, 5, 6, 7, 8, 9, …

...
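The following is a minimal Python sketch of an inverted index and BM25 top-k scoring. It is not turbopuffer's implementation. It rests on several assumptions:

- Naive lowercase whitespace tokenization.
- A tiny hypothetical corpus whose postings mirror the example above.
- Textbook BM25 with Lucene-style IDF and the common defaults k1=1.2, b=0.75.

The helpers `build_inverted_index` and `bm25_top_k` are hypothetical names. The sketch scores exhaustively, term at a time, visiting every posting. This makes visible that the work depends on how long each query term's posting list is, not only on how many terms the query has.

```python
import heapq
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to its postings: (doc_id, term_frequency) pairs sorted by doc_id.

    Hypothetical helper; assumes naive lowercase whitespace tokenization.
    """
    index = defaultdict(list)
    doc_lens = {}
    for doc_id in sorted(docs):
        tokens = docs[doc_id].lower().split()
        doc_lens[doc_id] = len(tokens)
        counts = defaultdict(int)
        for token in tokens:
            counts[token] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))  # doc_ids arrive in ascending order
    return dict(index), doc_lens


def bm25_top_k(query, index, doc_lens, k=10, k1=1.2, b=0.75):
    """Exhaustive term-at-a-time BM25; returns (top-k hits, postings visited).

    Hypothetical helper: every posting of every query term is scored, so the
    work grows with the total length of the query terms' posting lists.
    """
    n_docs = len(doc_lens)
    avg_len = sum(doc_lens.values()) / n_docs
    scores = defaultdict(float)
    postings_visited = 0
    for term in dict.fromkeys(query.lower().split()):  # dedupe, keep query order
        postings = index.get(term, [])
        df = len(postings)
        if df == 0:
            continue
        # Lucene-style IDF: rare terms (short posting lists) weigh more.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings:
            postings_visited += 1
            # Length normalization: longer-than-average documents score lower.
            norm = k1 * (1 - b + b * doc_lens[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    # Highest score first; ties broken by lower doc_id.
    top = heapq.nlargest(k, scores.items(), key=lambda hit: (hit[1], -hit[0]))
    return top, postings_visited


# Hypothetical toy corpus whose postings mirror the example: "to", "be", "or"
# in every document, "fish" in docs 1, 2, 4, 6, 9, "pufferfish" in docs 1, 2.
docs = {}
for doc_id in range(1, 10):
    words = ["to", "be", "or"]
    if doc_id in (1, 2, 4, 6, 9):
        words.append("fish")
    if doc_id in (1, 2):
        words.append("pufferfish")
    docs[doc_id] = " ".join(words)

index, doc_lens = build_inverted_index(docs)
for q in ["pufferfish", "pufferfish fish", "to be or"]:
    hits, visited = bm25_top_k(q, index, doc_lens, k=2)
    print(f"{q!r}: top={[d for d, _ in hits]} postings_visited={visited}")
```

In this sketch the rare-term query touches only the 2 postings of "pufferfish". The three-term query "to be or" touches all 27 postings of its common terms. Production engines typically skip much of this exhaustive work with dynamic pruning techniques such as WAND or MaxScore. This toy therefore illustrates the data structure only, not the latency behavior the post measures.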

First seen: 2026-01-13 04:03

Last seen: 2026-01-13 14:05