Hybrid Logical Clocks (HLC): Mastering Time
In a distributed system, time is a lie. Because of clock drift, no two servers have perfectly synchronized clocks. If Server A records an event at 10:00:01 and Server B then records a subsequent event at 10:00:00, the timestamps claim the later event happened first: your system has "violated causality."
To solve this, we use Hybrid Logical Clocks (HLC).
1. Why Physical Clocks Fail
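Network Time Protocol (NTP) can typically synchronize clocks to within a few milliseconds, but in a system processing 100,000 requests per second, each millisecond covers roughly 100 requests, so an error of a few milliseconds can misorder hundreds of events. NTP's own corrections disturb the clock as well: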
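- Clock Slewing (sometimes loosely called smearing): NTP gradually speeds up or slows down the clock to correct a small offset, so a measured second is not exactly a real second.
- Clock Jumps: NTP abruptly steps the clock to correct a large offset, potentially moving it backward.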
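To make the backward jump concrete, here is a minimal sketch with made-up readings (the event names and millisecond values are illustrative assumptions, not measurements). It shows how ordering by raw wall-clock timestamps can reverse the real sequence of events on a single node.

```python
# Hypothetical wall-clock readings (in ms) taken by one node for three
# consecutive events. Before the third event, a simulated clock step
# moves the clock backward.
readings = [
    ("write x=1", 1_000),
    ("write x=2", 1_003),
    ("write x=3", 999),  # clock was stepped backward before this event
]

# The order in which the events really happened.
actual_order = [name for name, _ in readings]

# The order a timestamp-based consumer (log merger, last-write-wins store)
# would reconstruct.
by_timestamp = [name for name, _ in sorted(readings, key=lambda r: r[1])]

print(actual_order)   # ['write x=1', 'write x=2', 'write x=3']
print(by_timestamp)   # ['write x=3', 'write x=1', 'write x=2']
```

The most recent write sorts first, so any component that trusts these timestamps would treat the newest value as the oldest.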
2. The HLC Structure
An HLC timestamp consists of two parts:
- Physical Component (pt): The wall-clock time from the OS.
- Logical Component (l): A counter that increments when physical time stands still or moves backward.
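One possible way to model this two-part timestamp is sketched below. The class name HLCTimestamp, the field names, and the choice of milliseconds are assumptions made for illustration. Timestamps compare by physical component first and by logical counter second.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class HLCTimestamp:
    """Illustrative HLC timestamp; order=True compares fields in declaration order."""

    physical: int      # pt: wall-clock time, assumed here to be ms since the epoch
    logical: int = 0   # l: counter that orders events sharing one physical value


a = HLCTimestamp(physical=1_000, logical=0)
b = HLCTimestamp(physical=1_000, logical=1)  # same millisecond, later event
c = HLCTimestamp(physical=1_001, logical=0)  # later millisecond, counter reset

print(a < b < c)  # True
```

Because the comparison is lexicographic, a higher logical counter never outranks a later physical time.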
3. The HLC Algorithm
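When a node generates a local event or receives a message, it computes a new timestamp from three inputs: current_physical (its own wall clock), last_physical (the physical component of the last timestamp it issued), and incoming_physical (the physical component of the received message, if there is one).
- Rule 1: new_physical = max(current_physical, last_physical, incoming_physical).
- Rule 2: If new_physical is greater than both last_physical and incoming_physical, the wall clock has moved past everything seen so far, so reset the logical counter to 0.
- Rule 3: Otherwise, new_physical equals last_physical, incoming_physical, or both. Set the logical counter to one more than the largest counter among the timestamps whose physical component equals new_physical.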
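The rules can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name HybridLogicalClock, the tick method, and the injectable now function are assumptions introduced here. The sketch also takes the incoming counter into account whenever the incoming physical component is the maximum, so the result always exceeds the received timestamp.

```python
import time
from typing import Callable, Optional, Tuple

Timestamp = Tuple[int, int]  # (physical, logical)


def wall_clock_ms() -> int:
    """Wall-clock time in milliseconds since the epoch."""
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    def __init__(self, now: Callable[[], int] = wall_clock_ms) -> None:
        self._now = now      # injectable so the clock can be simulated
        self.physical = 0    # last_physical
        self.logical = 0     # last logical counter

    def tick(self, incoming: Optional[Timestamp] = None) -> Timestamp:
        """Timestamp a local event, or a received message when `incoming` is given."""
        current_physical = self._now()
        last_physical, last_logical = self.physical, self.logical
        # -1 is a sentinel that can never win the max() below.
        in_physical, in_logical = incoming if incoming is not None else (-1, -1)

        # Rule 1: never go below anything already seen.
        new_physical = max(current_physical, last_physical, in_physical)

        if new_physical == last_physical and new_physical == in_physical:
            new_logical = max(last_logical, in_logical) + 1
        elif new_physical == last_physical:
            new_logical = last_logical + 1  # Rule 3: physical time did not advance
        elif new_physical == in_physical:
            new_logical = in_logical + 1    # stay ahead of the sender's timestamp
        else:
            new_logical = 0                 # Rule 2: the wall clock moved forward

        self.physical, self.logical = new_physical, new_logical
        return (new_physical, new_logical)


# Simulated wall clock: stands still, steps backward, then advances.
readings = iter([100, 100, 99, 101])
clock = HybridLogicalClock(now=lambda: next(readings))

t1 = clock.tick()          # (100, 0)
t2 = clock.tick()          # (100, 1) physical time stood still
t3 = clock.tick((100, 5))  # (100, 6) wall clock went backward; message carried (100, 5)
t4 = clock.tick()          # (101, 0) wall clock advanced, counter reset
print(t1 < t2 < t3 < t4)   # True
```

Even though the simulated wall clock moved backward before the third event, the issued timestamps keep increasing.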
4. Why this matters: Causal Ordering
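HLC timestamps respect causal order. If Event A caused Event B, HLC guarantees that HLC(A) < HLC(B). The converse does not hold: HLC(A) < HLC(B) alone does not prove that A caused B, because the two events may be concurrent. This is essential for: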
- CockroachDB: Uses HLC for its multi-version concurrency control (MVCC).
- Conflict Resolution: Picking a consistent winner between conflicting updates in a master-less cluster (an architecture like Cassandra's), where no single node can decide which update happened "first."
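As an illustration of the conflict-resolution case, the sketch below picks a winner between two replicas' versions of the same key by comparing HLC timestamps. The Versioned type, the resolve function, and the use of a node ID as a final tie-breaker are assumptions for this example, not a description of how any particular database does it.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    value: str
    physical: int  # HLC physical component
    logical: int   # HLC logical counter
    node_id: str   # assumed tie-breaker for identical HLC timestamps


def resolve(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: keep the version with the larger (physical, logical, node_id)."""
    def key(v: Versioned):
        return (v.physical, v.logical, v.node_id)

    return a if key(a) >= key(b) else b


from_node_a = Versioned("blue", physical=1_000, logical=0, node_id="node-a")
from_node_b = Versioned("green", physical=1_000, logical=1, node_id="node-b")

winner = resolve(from_node_a, from_node_b)
print(winner.value)  # green
```

Every replica that applies the same comparison reaches the same winner, regardless of the order in which it received the two versions.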
Summary
Hybrid Logical Clocks are the bridge between the physical reality of hardware and the logical requirements of distributed software. By combining the best of both worlds, HLC allows us to maintain the arrow of time across thousands of independent nodes.
