How Databases Search a Billion Rows
Every database you use is built on B-trees. See how they minimize disk I/O to search billions of rows efficiently.
Neural Download
Installing mental model for B-trees.
Your database has a billion rows. You run a query. The answer comes back in milliseconds.
How?
Let's start with what doesn't work. Imagine your data sorted in a list. Binary search finds anything in about thirty comparisons. Sounds fast — until you realize where the data actually lives.
It lives on disk. Not in RAM. Disk.
And here's the thing about disk: every time you read, you don't get one record. You get a page — a fixed-size block, usually four to sixteen kilobytes. And each read has a cost — on a spinning disk, that's around ten milliseconds. Even on an SSD, the round trips add up.
So binary search on disk means up to thirty page reads. On a spinning disk, that's three hundred milliseconds — for one query.
Now imagine a naive binary search tree. Each node holds a single key. Each level, one page read. A billion records means thirty levels. Thirty page reads.
The bottleneck isn't comparisons. Your CPU can do millions of those per second. The bottleneck is disk I/O. Every jump to a new node might wait for the disk head to physically move.
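The arithmetic above can be checked in a few lines. This is a rough cost model, not a benchmark; the ten-millisecond seek time and the billion-row count are the numbers assumed in the text.

```python
import math

# Rough cost model (numbers assumed from the text):
# a spinning-disk seek costs ~10 ms; in-RAM comparisons are effectively free.
SEEK_MS = 10
N = 1_000_000_000

# Binary search needs about log2(N) comparisons...
comparisons = math.ceil(math.log2(N))
print(comparisons)  # 30

# ...but if each comparison lands on a different disk page,
# that's ~30 seeks. The I/O, not the CPU, dominates.
print(comparisons * SEEK_MS)  # 300 (milliseconds per lookup)
```

Thirty comparisons cost the CPU microseconds; thirty seeks cost the disk a third of a second.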
What if each node could hold not one key, but hundreds?
This is a B-tree.
Instead of one key per node, each node holds many keys — kept in sorted order — with child pointers between them. A node might hold a hundred, two hundred, even five hundred keys.
And here's the design trick that makes everything click: each node is sized to fill exactly one disk page.
One disk read. Hundreds of keys. All loaded into memory at once.
Search works like this. Start at the root. You read one page — now you have hundreds of keys in RAM. Binary search within that page is essentially free. Find which range your target falls in, follow the child pointer, and read the next page.
At each level, one disk read gives you hundreds of choices. Not one choice, like a binary tree. Hundreds.
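One step of that descent can be sketched in a few lines. The node layout and names here are illustrative, not any particular database's format: a node is one page's worth of sorted keys, with a child pointer on either side of each key.

```python
from bisect import bisect_right

def descend(node, target):
    """One step of a B-tree search: return the key's position in this node,
    or the child pointer to follow (i.e. the next page to read from disk)."""
    keys = node["keys"]                    # sorted; hundreds in a real page
    i = bisect_right(keys, target)         # in-RAM binary search: ~free
    if i > 0 and keys[i - 1] == target:
        return ("found", i - 1)
    return ("child", node["children"][i])  # one more disk read

# A toy node with 3 keys and 4 children (a real page holds hundreds).
node = {"keys": [10, 20, 30], "children": ["c0", "c1", "c2", "c3"]}
print(descend(node, 20))  # ('found', 1)
print(descend(node, 25))  # ('child', 'c2')
```

The expensive operation is fetching the page; picking the right child within it is a cheap in-memory binary search.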
So how many levels do you need?
With five hundred keys per node, two levels index two hundred fifty thousand records. Three levels — one hundred twenty-five million. Four levels — over sixty billion.
A billion rows? Three to four disk reads. That's it.
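The capacity numbers above follow from simple exponentiation: with roughly five hundred keys per node, each extra level multiplies capacity by about five hundred. A quick sanity check, using the text's round figure:

```python
KEYS_PER_NODE = 500  # roughly one disk page's worth (the text's assumption)

# Each additional level multiplies capacity by ~500.
for levels in (2, 3, 4):
    print(levels, KEYS_PER_NODE ** levels)
# 2 250000
# 3 125000000
# 4 62500000000
```

Three levels already cover 125 million records; four levels cover over sixty billion, so a billion rows fit comfortably in a tree three to four levels deep.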
This is why PostgreSQL, MySQL, SQLite, and most relational databases use B-trees as their default index structure. Not because B-trees are clever. Because they're engineered to match how storage hardware actually works.
One node equals one disk page. One disk read equals one level of the tree. Maximum information per I/O operation.
So searching is fast. But what happens when you insert?
Keys slot into their sorted position within a node. Simple enough — until the node is full.
Watch what happens.
The next insert pushes the node past capacity. It can't just grow — pages have a fixed size. So the node splits. It breaks into two halves, and the median key — the one right in the middle — gets pushed up to the parent.
Now the parent has one more key and one more child pointer. And if the parent was already full? It splits too. The cascade continues upward — split after split — until it reaches a node with room, or the root itself splits.
When the root splits, something special happens. A new root is created above it. The tree just grew taller — but notice: it grew from the top, not the bottom.
Every single leaf stays at the same depth. Always.
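The split itself is a simple operation on a sorted list. This is a bare sketch of splitting one over-full node (the keys and the cutoff are illustrative; real implementations also redistribute child pointers and write both pages back to disk):

```python
def split(keys):
    """Split an over-full sorted key list: return (left, median, right).
    The median key is the one pushed up into the parent."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

full = [10, 20, 30, 40, 50]       # imagine this page just overflowed
left, median, right = split(full)
print(left, median, right)        # [10, 20] 30 [40, 50]

# The median (30) goes into the parent. If the parent is also full, it
# splits the same way -- and if the cascade reaches the root, a new root
# is created above it. That is the only way the tree ever grows taller.
```

Because growth only happens at the top, every leaf remains at the same depth.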
This is the guarantee that makes B-trees special. They are always perfectly balanced. Not approximately. Not eventually. Mathematically guaranteed, on every insert, every delete. The path from root to any leaf is always the same length.
And that means the number of disk reads for any lookup is bounded by the height of the tree. Predictable. Small. Consistent.
Let's zoom out and see what we've built.
Three levels. A root, some internal nodes, and leaves at the bottom. This modest-looking tree indexes — one billion records.
But it gets better.
That root node? Your database keeps it cached in RAM. The second level too, usually. They're accessed so frequently they almost always stay in the page cache.
So in practice, finding any row among a billion takes two — maybe three — actual disk reads. Not thirty. Two.
This is why your SELECT ... WHERE id = 42 comes back instantly. The database isn't scanning. It isn't sorting. It's walking down a tree that was designed, from the ground up, to minimize the one thing that actually matters: how many times you touch the disk.
And in most databases, this is actually a B-plus tree — a variant where all the data lives in the leaves, and the leaves are linked together. Find the start of your range, then walk sideways through the leaf chain. No jumping back to the root. No extra seeks.
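A range scan over linked leaves can be sketched like this. The classes and names are made up for illustration; the point is the shape of the access pattern: one descent to find the starting leaf, then a sideways walk with no return trips to the root.

```python
class Leaf:
    """A B+-tree leaf (illustrative): holds data keys and a link to the
    next leaf, so range scans can walk sideways instead of re-descending."""
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf

def range_scan(start_leaf, lo, hi):
    """Yield every key in [lo, hi], walking the leaf chain left to right."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return        # past the range: stop, no backtracking
            if k >= lo:
                yield k
        leaf = leaf.next      # one sideways hop to the next leaf

# Three linked leaves, as if we descended to the leaf containing our start.
l3 = Leaf([50, 60])
l2 = Leaf([30, 40], l3)
l1 = Leaf([10, 20], l2)
print(list(range_scan(l1, 15, 45)))  # [20, 30, 40]
```

This is why range queries like BETWEEN and ORDER BY on an indexed column are cheap: after the initial descent, the scan is sequential.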
B-trees aren't just a data structure. They're the bridge between your data and the reality of storage. And now when you write a query, you know exactly what's happening underneath.
Cognitive architecture... updated.
