While Binary Search Trees (BST) and AVL trees are great for in-memory operations, they perform poorly when data is stored on disk. This is where B-Trees and B+ Trees shine. They are the underlying data structures for almost every major database (MySQL, PostgreSQL, Oracle) and file system (NTFS, EXT4).
The primary goal of a B-Tree is to minimize disk I/O operations by keeping the tree short and wide.
1. What is a B-Tree?
A B-Tree is a self-balancing search tree in which nodes can have more than two children.
- Each node can contain multiple keys (up to $M-1$, where $M$ is the order of the tree).
- Keys are stored in sorted order.
- All leaf nodes are at the same depth.
- It is designed to read and write large blocks of data (pages).
2. What is a B+ Tree?
A B+ Tree is a variation of the B-Tree with two significant changes:
- Data only in leaves: Internal nodes only store keys (acting as pointers). Actual data (or pointers to data) is only stored in leaf nodes.
- Linked Leaves: All leaf nodes are linked together in a doubly linked list.
Why is B+ Tree preferred for databases?
- Range Queries: Because leaves are linked, you can perform a range scan (e.g.,
WHERE age BETWEEN 20 AND 30) by finding the first leaf and then following the links. - Cache Efficiency: Since internal nodes don't store data, more keys fit into a single block of memory, reducing the height of the tree even further.
3. Implementation Logic (High Level)
Implementing a full B+ Tree in an interview is rare due to its complexity, but you should understand the core operations:
- Search: Similar to BST but with multiple keys per node.
- Insert: If a node exceeds capacity, split it and move the middle key up to the parent.
- Delete: If a node falls below minimum occupancy, merge it with a sibling or redistribute keys.
// Simplified structure of a B+ Tree Node
class BPlusTreeNode {
boolean isLeaf;
int[] keys;
BPlusTreeNode[] children; // For internal nodes
BPlusTreeNode next; // For leaf nodes (linked list)
Object[] data; // For leaf nodes (actual records)
}
B-Tree vs. B+ Tree
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data Storage | Every node can store data | Only leaf nodes store data |
| Search Performance | Varies (Can find in internal node) | Consistent (Always reach leaf) |
| Range Queries | Slow (Requires tree traversal) | Fast (Sequential leaf scan) |
| Space Overhead | Less (Internal nodes store data) | More (Keys repeated in leaves) |
Summary
B-Trees and B+ Trees are a masterclass in optimizing for hardware constraints. By increasing the "fan-out" (number of children), they ensure that even with millions of records, the goal is always only 3 or 4 disk seeks away. Understanding these structures is essential for any developer working on high-scale backend systems or database internals.
