System Design

System Design: Designing a Distributed ID Generator (Snowflake)

How to generate billions of unique, time-ordered IDs without a central master. A technical deep dive into Twitter Snowflake, Epochs, and Clock Drift.

Sachin Sarawgi·April 20, 2026·2 min read
#system-design#snowflake#distributed-id#id-generation#scalability#distributed-systems

Designing a Distributed ID Generator

Prerequisite: To understand why distributed IDs are hard, first read about Database Sharding and Partitioning.

In a distributed system, you often need to generate unique identifiers (IDs) for every record (e.g., Tweets, Orders, Photos). A standard database auto-increment won't scale because it requires a central "master" that becomes a bottleneck. We need a Distributed ID Generator that is unique, time-ordered, and highly available.

1. The Problem with UUIDs

UUIDs (e.g., 123e4567-e89b-12d3-a456-426614174000) are 128-bit strings.

  • Problem A: They are long (128 bits vs 64 bits), increasing index size and memory footprint.
  • Problem B: They are not naturally time-ordered. Random inserts into a B-Tree index cause massive performance degradation (page splits).

2. The Solution: Twitter Snowflake

Snowflake generates a 64-bit integer that is roughly time-ordered.

Bit Structure (64 bits):

  • 1 bit: Unused (sign bit).
  • 41 bits: Timestamp (ms) since a custom epoch. Provides ~69 years of unique IDs.
  • 10 bits: Worker ID (Machine ID). Identifies the specific machine/server (allows for 1,024 unique workers).
  • 12 bits: Sequence number. A counter for IDs generated in the same millisecond on the same worker (4,096 IDs per ms per worker).

![Snowflake Bit Structure Diagram Placeholder]

3. Why it Scales

  • Decentralized: Each worker generates IDs independently. No central coordinator required.
  • High Throughput: A single worker can generate over 4 million IDs per second.
  • Time-ordered: Sorting by ID is equivalent to sorting by time, which optimizes database index performance.

4. The Critical Challenge: Clock Drift

What if one server's clock is ahead of others, or an NTP sync moves the clock backward?

  • The Problem: If the clock moves backward, the generator could produce a duplicate ID that was already used in the "future."
  • The Solution: The implementation must check if current_timestamp < last_timestamp. If so, it should wait for the clock to catch up or throw an exception.

Summary

The Snowflake algorithm is a masterclass in bit manipulation and distributed autonomy. By embedding time and location into a 64-bit integer, you build a system that generates millions of unique, index-friendly IDs per second with zero coordination.


Next: Distributed Locking with Zookeeper Previous: Kafka Consumer Groups Explained

📚

Recommended Resources

Designing Data-Intensive ApplicationsBest Seller

The definitive guide to building scalable, reliable distributed systems by Martin Kleppmann.

View on Amazon
Kafka: The Definitive GuideEditor's Pick

Real-time data and stream processing by Confluent engineers.

View on Amazon
Apache Kafka Series on Udemy

Hands-on Kafka course covering producers, consumers, Kafka Streams, and Connect.

View Course

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Found this useful? Share it: