Designing a real-time microblogging platform like Twitter (now X) is a flagship system design problem. The system must support massive read throughput, real-time message broadcasting, and rapid feed assembly under the constraint of extreme write spikes (e.g., during live sports events).
The core technical challenge is not storing millions of tweets, but delivering them to hundreds of millions of active users' home timelines with sub-second latency.
1. Core Requirements & Scale Constraints
Functional Requirements
- Post a Tweet: Users can publish new text tweets (max 280 characters) with optional images or videos.
- Home Timeline: Users can view a chronological feed of tweets from people they follow (read path).
- User Timeline: Users can view their own historical tweets.
- Follow/Unfollow: Users can follow other users, immediately updating their feed sources.
Non-Functional Constraints (SLAs)
- High Availability: The read path must achieve $99.999%$ (Five Nines) availability.
- Ultra-Low Latency: Home timeline generation must render in $\le 200\text{ms}$ at the $p99$ percentile.
- Eventual Consistency: Tweet delivery to followers can lag up to 5 seconds under normal load.
- Massive Scalability:
- Daily Active Users (DAU): 500 Million.
- Writes (Tweet publishing): Average 5,000 tweets per second (TPS), peak 25,000 TPS.
- Reads (Timeline fetches): Average 500,000 queries per second (QPS).
Back-of-the-Envelope Estimates
- Storage Capacity (Tweets):
- 500M tweets/day $\times$ 300 bytes (text + metadata) = 150 GB of text storage per day.
- Over 5 years: $150\text{ GB/day} \times 365 \times 5 \approx 273.7\text{ TB}$.
- Media Storage Capacity:
- Assume 10% of tweets contain images (1 MB avg) and 1% contain videos (10 MB avg).
- Images: $50\text{M} \times 1\text{ MB} = 50\text{ TB/day}$.
- Videos: $5\text{M} \times 10\text{ MB} = 50\text{ TB/day}$.
- Total Media: 100 TB/day. Over 5 years (excluding deduplication): $\approx 182.5\text{ PB}$.
- Network Bandwidth:
- Ingress (Writes): $100\text{ TB/day} \div 86400\text{ seconds} \approx 1.15\text{ GB/s}$ (9.2 Gbps).
- Egress (Reads): Assuming each user views 10 timelines per day (each feed fetching 20 tweets with media): $\approx 115\text{ GB/s}$ (920 Gbps) read traffic.
2. API Design & Core Contracts
We expose standard REST API endpoints for key operations. All JSON payloads are serialized over HTTPS.
Post a Tweet
POST /v1/tweets
Authorization: Bearer <JWT_TOKEN>
Content-Type: application/json
{
"text": "Architecting a hybrid push/pull timeline at scale! #systemdesign #scalability",
"media_ids": ["img_987213987", "vid_872364812"]
}
Response:
{
"status": "success",
"data": {
"tweet_id": "1423894712093847552",
"user_id": "89374923",
"text": "Architecting a hybrid push/pull timeline at scale! #systemdesign #scalability",
"media_urls": [
"https://media.codesprintpro.com/img_987213987.png"
],
"created_at": "2026-05-22T10:45:00Z"
}
}
Fetch Home Timeline
GET /v1/timeline/home?limit=20&cursor=1423894712093847552
Authorization: Bearer <JWT_TOKEN>
Response:
{
"data": {
"tweets": [
{
"tweet_id": "1423894723908234123",
"author": {
"user_id": "1009283",
"name": "Staff Engineer"
},
"text": "Consistency always yields to availability in high-volume social feeds.",
"created_at": "2026-05-22T10:46:12Z"
}
],
"next_cursor": "1423894709823749812"
}
}
3. High-Level Design (HLD)
To handle the 100:1 read-to-write ratio, the system precomputes home timelines for active users. The overall architecture relies on a Hybrid Push/Pull model to handle celebrity accounts.
flowchart TD
Client[Mobile/Web Client] -->|HTTPS| APIGateway[API Gateway / Envoy]
%% Write Path (Push)
APIGateway -->|Write Tweet| TweetService[Tweet Ingestion Service]
TweetService -->|Write Log| Kafka[Kafka Event Bus]
Kafka -->|Consume Event| FanoutWorkers[Fanout Workers]
FanoutWorkers -->|Lookup Followers| FollowService[Social Graph Service]
FanoutWorkers -->|Push Tweet ID| RedisCluster[(Redis Timeline Cache)]
%% Storage layer
TweetService -->|Save Metadata| DocDB[(ScyllaDB Tweet Store)]
TweetService -->|Upload Assets| S3[(Amazon S3 Media Storage)]
%% Read Path (Timeline Assembly)
APIGateway -->|Read Feed| TimelineService[Timeline Service]
TimelineService -->|Get Normal Feed| RedisCluster
TimelineService -->|Check Celebrities| FollowService
TimelineService -->|Merge Active Feeds| MergeEngine[Timeline Merger Engine]
MergeEngine -->|Fetch Details| DocDB
MergeEngine --> Client
4. Low-Level Design (LLD) & Data Models
Database Choice Rationale
- Social Graph (Follows): We use a highly optimized relational database (e.g., PostgreSQL sharded by
follower_id) or a graph database (e.g., Neo4j/Amazon Neptune). Due to the strict primary-key structure, sharded MySQL/Postgres is preferred for high-throughput node relationships. - Tweet Store: A wide-column store like ScyllaDB or Cassandra is perfect. It supports ultra-high write speeds, horizontal sharding, and structured query retrieval by clustering key.
- Timeline Cache: Redis Cluster holds active user feeds in-memory using Sorted Sets (
ZSET), where the key is theuser_id, the member is thetweet_id, and the score is thetweet_id(Snowflake ID encodes timestamp).
SQL Table DDL declarations
Sharded Social Graph Schema (MySQL/PostgreSQL)
CREATE TABLE follows (
follower_id BIGINT NOT NULL,
followee_id BIGINT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (follower_id, followee_id),
KEY idx_followee (followee_id)
) ENGINE=InnoDB;
Sharding Strategy: Shard the
followstable byfollower_idusing a consistent hashing algorithm. This ensures all followee relationships for a single user reside in the same physical database node, allowing immediate retrieval of followee lists.
ScyllaDB Tweet Store Schema
CREATE KEYSPACE tweet_keyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'us-east': 3,
'us-west': 3
};
CREATE TABLE tweet_keyspace.tweets (
user_id bigint,
tweet_id bigint,
content varchar,
media_urls list<varchar>,
created_at timestamp,
PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);
Partitioning Strategy: Partitioned by
user_idto ensure all tweets from a single author are stored together. The clustering keytweet_idallows immediate time-ordered range scans for a user's profile feed.
5. Scaling Challenges & System Bottlenecks
The Fan-out Celebrity Problem (Hot Spots)
If a user with 80 Million followers (like Elon Musk or Cristiano Ronaldo) tweets, a pure Fan-out on Write model forces the Fanout Workers to write that tweet ID to 80 million separate Redis timeline caches. This causes:
- Massive Redis CPU spikes.
- Replication queue backpressure.
- Out-of-memory crashes due to write amplification.
To mitigate this, we classify users into two tiers:
- Standard Users (Follower count < 25,000): Use Fan-out on Write (Push). When they tweet, their tweet ID is pushed to all their followers' Redis caches immediately.
- Celebrity Users (Follower count >= 25,000): Use Fan-out on Read (Pull). When they tweet, the write path completely bypasses fanout. The tweet is only committed to the ScyllaDB Tweet Store.
When a reader fetches their feed, the Timeline Merger Engine performs a hybrid merge:
- Fetch the user's precomputed home timeline from their Redis
ZSET. - Interrogate the Social Graph Service to find what celebrities the user follows.
- Query the ScyllaDB Tweet Store for the latest tweets of those followed celebrities.
- Merge the celebrity tweets with the precomputed normal timeline in-memory based on Tweet ID, sorting them chronologically.
sequenceDiagram
autonumber
actor User as Active User
participant TS as Timeline Service
participant Cache as Redis Timeline Cache
participant SG as Social Graph DB
participant DB as ScyllaDB Tweet Store
User->>TS: GET /v1/timeline/home
TS->>Cache: Fetch precomputed timeline (ZSET)
Cache-->>TS: Return normal users' tweet IDs
TS->>SG: Get followed celebrities list
SG-->>TS: Return [Celebrity_A, Celebrity_B]
TS->>DB: Fetch recent tweets of [Celebrity_A, Celebrity_B]
DB-->>TS: Return celebrity tweets
TS->>TS: Merge & sort chronologically (in-memory)
TS-->>User: Return complete unified feed
6. Operational Trade-offs & CAP Theorem Realities
Consistency vs. Latency in Social Media Feeds
In designing Twitter's timeline feed, we choose an Availability-Consistent (AP) model over strict consistency (CP). Having every follower see a tweet at the exact same millisecond is completely unnecessary.
- Push Path Latency: Pushing a tweet to 100,000 followers asynchronously through Kafka takes about 500ms to 2 seconds under normal conditions. This represents eventual consistency.
- In-Memory Cache Eviction: Caching home timelines for all 500M DAU in Redis requires immense memory resources. To balance cost and performance, we apply an Active User Cache Eviction Strategy:
- We only store precomputed timelines in Redis for users who have logged in within the last 15 days.
- For inactive users (who log back in after 15 days), their timeline cache is empty. The system intercepts this "cache miss," reads the follow graph from PostgreSQL, queries ScyllaDB for the latest tweets from all followed users, and performs a complete on-the-fly rebuild of the Redis
ZSETfeed.
Replication Lag Challenges
When a user unfollows another user, we must immediately invalidate the unfollowed user's tweets from the follower's timeline. In a pure Push system, this requires an asynchronous worker to iterate through the follower's Redis ZSET and purge all tweet IDs belonging to the unfollowed user.
Under severe network partitions, if the Graph Database primary node processes the unfollow, but the replica utilized by the asynchronous fanout worker lags by several seconds, the system will read stale states and attempt to re-push tweets, leading to a "ghost tweet" glitch where unfollowed content continues to show up. To mitigate this, we stamp each Redis timeline cache with a local follow_epoch_version token. If the cache epoch mismatch is detected, the home timeline is dynamically invalidated and fully rebuilt from the sharded databases on-the-fly.
7. Failure Scenarios & High-Availability Resilience
A. Redis Cache Node Crash (Thundering Herd Prevention)
If a major Redis cache shard holding 5 million precomputed timelines crashes, all timeline read requests for those 5 million users would fall back to sharded database queries. This sudden influx of high-volume disk-join queries would instantly lock up database connection pools, taking down the entire site.
Mitigation:
- Fallback Standby Replicas: Every Redis master node is backed by an active-passive hot standby replica. If the master fails, Sentinel coordinates a failover in < 5 seconds.
- Dynamic Query Rate-Limiting & Jitter: When a cache miss occurs, the system utilizes a Distributed Lock with TTL (using Redis Redlock) so that only one background thread is allowed to query ScyllaDB to rebuild the timeline cache, while other concurrent requests serve a stale cached version or a cached subset with a random backoff jitter.
B. Kafka Fanout Worker Queue Backpressure
During a massive viral event (e.g. World Cup Final), write spikes scale to 30,000 tweets per second. If the Fanout Workers get saturated, the Kafka processing queue grows exponentially. Users will post tweets, but their followers won't see them on their timeline for minutes or hours.
Mitigation:
- Auto-Scaling Consumer Groups: Set up Kubernetes HPA (Horizontal Pod Autoscaler) monitoring the Kafka
Consumer Lagmetric. If consumer lag exceeds 50,000 messages, spin up additional Fanout Worker containers dynamically. - Shedding Non-Critical Work: If lag continues to grow, degrade the fanout quality by bypassing normal push completely for all users with > 5,000 followers (temporarily lowering the celebrity threshold from 25,000), switching them to the Pull path to preserve queue throughput.
flowchart TD
QueueLag{Kafka Consumer Lag > Threshold?}
QueueLag -->|Yes| ScaleWorkers[Autoscale Fanout Workers]
QueueLag -->|Critical Lag| LowerThreshold[Lower Celebrity Threshold to 5K]
ScaleWorkers --> Rebalance[Consumer Group Rebalances Partitions]
LowerThreshold --> PushToPull[Switch Medium Users to Pull Model]
PushToPull --> ClearQueue[Reduce Fanout Write Amplification]
8. Candidate Verbal Script (Interview Guide)
Interviewer: "We have a working hybrid push/pull timeline feed. However, what happens during huge viral events—like the World Cup final—where millions of users post tweets concurrently, and read QPS scales to 3 million requests per second? How do you prevent cache starvation?"
Candidate: "To handle a massive concurrent write and read spike of this magnitude, we must apply layered caching, rate-limiting, and partition-isolation techniques.
First, we must prevent Cache Stampedes. If our Redis caches expire or get evicted under memory pressure during the peak of the event, the timeline service will attempt to fetch tweets directly from ScyllaDB. This would cause immediate database connection pool starvation. I would implement a probabilistic early expiration algorithm (XFetch) on our cached feeds. Instead of a hard TTL, when a cached timeline is read near its expiration, the background thread fires an asynchronous query to refresh the cache before it dies, ensuring a $100%$ cache hit rate.
Second, we must isolate celebrity hot-spot partitions. When a celebrity is tweeting multiple times per minute during a viral event, we can deploy a Local In-Memory Cache (e.g., Caffeine Cache) directly inside the container memory of our Timeline Services. This shields ScyllaDB and Redis from millions of concurrent queries for that single celebrity’s tweet payload.
Finally, we apply ingress rate-limiting and backpressure at the API Gateway. We configure token bucket limits based on tier levels, dropping non-essential background payloads (like profile updates or read-receipt logs) so that the core feed delivery path maintains $100%$ processing priority."