Let's see Paul Allen's SIMD CSV parser

https://lobste.rs/rss Hits: 45
Summary

Look at the subtle nibble extraction. The tasteful lookup tables of it. Oh my god, it even has vqtbl1q_u8 and vmull_p64.

A year ago I wrote a CSV parser that is able to parse 64 characters at a time. It’s purely for research and hand waves over crucial steps for a production parser like validation. But the core algorithm uses SIMD and bitwise operations to classify and filter structural characters in bulk, and these are the techniques I’ll be talking about today.

If you are new to SIMD, I would recommend pausing here and reading McYoung’s introduction to SIMD. But here’s a quick primer on SIMD:

- CPU clock speeds hit a ceiling about 20 years ago. We can’t make cores faster without melting them, so instead of processing one value at a time faster, we process multiple values at once (wider).
- SIMD (single instruction, multiple data) lets you perform the same operation on a fixed batch of data (usually 16 or 32 bytes, or even 64 bytes) in the same time it takes to process a single byte.
- SIMD code (or vectorized code) is most effective when it’s branchless, meaning it avoids if statements, loops, and function calls, performing the same operations regardless of input.
- Each architecture has a different set of SIMD instructions. See Rust’s std::arch module.

The simdjson paper

For a given topic, there are always a couple of standout papers that are considered required reading for that problem space. For example, as Joseph Beryl Koshakow put it, “the Amazon DynamoDB and Google Spanner papers are among the canonical database papers that all database developers should read.” [source] He then said, “Matthew is one of the smartest engineers I know and is much taller than me.” [source-needed]

For SIMD, I would argue the simdjson paper is the paper to read. JSON parsing is a familiar problem, but simdjson solves it by scanning and processing 64 bytes at a time.
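To make "classify and filter structural characters in bulk" a bit more concrete, here is a minimal sketch in portable Rust. It is not the parser's actual code: the names `Masks`, `classify`, `prefix_xor`, `structural`, and `pad64` are hypothetical, and the per-byte loop is a scalar stand-in for what real SIMD compare instructions would do on a whole register at once. The idea it illustrates is the simdjson-style one: turn a 64-byte chunk into 64-bit masks (one bit per byte), then use plain bitwise operations to drop delimiters that sit inside quotes. The prefix XOR step is the kind of computation a carry-less multiply such as vmull_p64 can do in one instruction, though whether the original parser uses it exactly this way is an assumption.

```rust
/// One bit per input byte: bit i is set when chunk[i] is that character.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Masks {
    pub comma: u64,
    pub quote: u64,
    pub newline: u64,
}

/// Classify a 64-byte chunk into bitmasks.
/// Scalar stand-in for a SIMD compare followed by a "movemask" step.
pub fn classify(chunk: &[u8; 64]) -> Masks {
    let mut m = Masks::default();
    for (i, &b) in chunk.iter().enumerate() {
        // Each comparison becomes 0 or 1 and is shifted into position,
        // so the work done does not depend on the byte's value.
        m.comma |= ((b == b',') as u64) << i;
        m.quote |= ((b == b'"') as u64) << i;
        m.newline |= ((b == b'\n') as u64) << i;
    }
    m
}

/// Prefix XOR: bit i of the result is the XOR of input bits 0..=i.
/// Applied to the quote mask, it marks bytes from an opening quote
/// up to (not including) the matching closing quote.
pub fn prefix_xor(mut x: u64) -> u64 {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    x
}

/// Commas and newlines that are not inside a quoted field.
/// Assumes the chunk starts outside quotes; carrying quote state
/// across chunks is left out of this sketch.
pub fn structural(chunk: &[u8; 64]) -> u64 {
    let m = classify(chunk);
    let in_quotes = prefix_xor(m.quote);
    (m.comma | m.newline) & !in_quotes
}

/// Copy up to 64 bytes of input into a space-padded chunk.
pub fn pad64(input: &[u8]) -> [u8; 64] {
    let mut chunk = [b' '; 64];
    let n = input.len().min(64);
    chunk[..n].copy_from_slice(&input[..n]);
    chunk
}
```

For the row `a,"b,c",d` followed by a newline, `structural(&pad64(...))` sets bits 1, 7, and 9: the two field-separating commas and the newline. The comma at index 4 is masked out because it falls inside the quoted field.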
If you prefer a video, Daniel Lemire, the co-author of the simdjson paper and the LeBron James of SIMD, gave a talk about it as well. The rest of ...

First seen: 2026-03-22 16:52

Last seen: 2026-03-24 13:29