This blog is about how data is laid out on disk, specifically about the details of Sorted String Tables (SSTs). Let’s cut to the chase.First, it’s important to understand how data gets from disk in to usable memory. Most cloud instances use SSDs, so we’ll focus on that instead of spinning disks.If you haven’t read the initial frameworks blog post of this series I recommend starting with that. In this context, the main point to takeaway is that not all read amplification is created equal. All databases need to abide by the laws of physics: fetching data that’s already in memory is hundreds of times faster than fetching data from over the network. Data structure design is all about minimizing the amount of data you need to fetch from expensive storage tiers while reducing the memory overhead (this an application of the RUM conjecture).A high performance data system needs to reduce the amount of unnecessary bytes read to serve a query, reading only the necessary (”hot”) data.An ideal system would read exactly the bytes it needs for the data it wants, but the fundamental unit of I/O isn’t a byte. On SSDs, it’s a page (typically 4KB). This means that whether you request a single byte, a hundreds bytes, or four thousand bytes from your disk you’ll still get the 4KB.You can verify this on your own machine:# on MacOS, check the size of the page on your disk > stat -f %k / 4096To see this in action, I ran an experiment measuring read latency for 1KB vs 4KB reads using Direct I/O (bypassing the OS page cache). Despite requesting 4x less data, the 1KB reads took about the same time as the 4KB reads (about 9.2µs) on my SSD:Here’s where it gets interesting for database design. Imagine you’re serving a query that needs a single 256B row that lives somewhere in a 4KB page alongside other rows:To read row3, you have to read the entire page. Rows 1 through 10 come along for the ride whether you want them or not. The ratio of “data you read” to “data you needed” is an instance of rea...
First seen: 2026-01-09 13:51
Last seen: 2026-01-09 20:52