System Design: Designing Twitter (Timeline and News Feed)
Twitter (now X) is a massive real-time messaging system. The core technical challenge is not storing the tweets, but delivering them to millions of followers' timelines with sub-second latency.
1. Core Requirements
- Tweet Publishing: A user can post a new tweet.
- Timeline (Feed): A user can see tweets from people they follow.
- High Availability: The system should stay available through partial failures; a slightly stale timeline is preferable to an error.
- Scalability: Handling millions of users and high-profile "Celebrity" accounts.
2. The Fan-out Challenge
"Fan-out" is the process of delivering a single tweet to all the followers of the author.
Option A: Fan-out on Read (The Pull Model)
When a user opens their timeline, the system looks up every account they follow, fetches each account's latest tweets, and merges them in reverse-chronological order.
- Pros: Fast writes.
- Cons: Slow reads. Doing a join across thousands of authors for every timeline refresh is extremely expensive for the database.
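The pull model can be sketched as a merge of per-author tweet lists at read time. This is a minimal illustration, not Twitter's implementation: FOLLOWING and TWEETS_BY_AUTHOR are hypothetical in-memory stand-ins for the follow graph and the tweet store, and each author's tweets are assumed to be stored newest-first.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the follow graph and tweet store.
# Each author's tweets are kept newest-first as (timestamp, tweet_id) pairs.
FOLLOWING = {"alice": ["bob", "carol"]}
TWEETS_BY_AUTHOR = {
    "bob": [(105, "b2"), (100, "b1")],
    "carol": [(103, "c1")],
}

def build_timeline_on_read(user, limit=20):
    """Merge the followees' newest-first tweet lists into one timeline."""
    per_author = [TWEETS_BY_AUTHOR.get(a, []) for a in FOLLOWING.get(user, [])]
    # heapq.merge lazily merges already-sorted inputs; reverse=True matches
    # the newest-first ordering, so only `limit` items are ever consumed.
    merged = heapq.merge(*per_author, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

Every read touches one list per followed author, which is why the cost of this model grows with the number of accounts a user follows.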
Option B: Fan-out on Write (The Push Model)
When a user posts a tweet, the system immediately pushes a reference to that tweet into the "Timeline Cache" (usually in Redis) of every follower.
- Pros: Blazing fast reads. The user's timeline is already pre-computed in Redis.
- Cons: Slow writes. If a celebrity with 50 million followers tweets, the system must perform 50 million Redis writes immediately.
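The push model can be sketched in the same style. The names below are assumptions for illustration: FOLLOWERS stands in for the follow graph, a dict of bounded deques stands in for per-user Redis lists, and TIMELINE_LIMIT is an assumed cap on how many entries each cached timeline keeps.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumed cap on cached entries per user

# Hypothetical stand-ins: FOLLOWERS for the follow graph, and a dict of
# bounded deques in place of per-user Redis lists.
FOLLOWERS = {"bob": ["alice", "dave"]}
TIMELINE_CACHE = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))

def publish_tweet(author, tweet_id):
    """Push a tweet reference to every follower; return the write count."""
    writes = 0
    for follower in FOLLOWERS.get(author, []):
        # appendleft keeps the newest entry first; maxlen evicts the oldest.
        TIMELINE_CACHE[follower].appendleft(tweet_id)
        writes += 1
    return writes

def read_timeline(user, limit=20):
    """Reading is a slice of a precomputed list, with no merging needed."""
    return list(TIMELINE_CACHE.get(user, ()))[:limit]
```

The write count returned by publish_tweet equals the author's follower count, which makes the cost of a celebrity tweet easy to see: one cache write per follower.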
3. The Hybrid Solution: Handling Celebrities
Twitter uses a hybrid approach to balance these trade-offs:
- Regular Users: Use Fan-out on Write (Push). Their tweets are pushed to their followers' caches immediately.
- Celebrities (High Follower Count): Use Fan-out on Read (Pull). Their tweets are NOT pushed to millions of caches. Instead, when a follower opens their timeline, the system fetches the recent tweets of the celebrities they follow and merges them into the precomputed timeline on the fly.
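A rough sketch of the hybrid read path follows. The threshold value and all data structures are assumptions made for illustration: CELEBRITY_THRESHOLD is an arbitrary cutoff, PUSHED_TIMELINE represents entries written by fan-out on write, and TWEETS_BY_AUTHOR represents the tweet store queried at read time.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 1_000_000  # assumed cutoff, not a figure from the text

# Hypothetical in-memory data; every list is newest-first (timestamp, tweet_id).
FOLLOWER_COUNT = {"bob": 120, "star": 50_000_000}
FOLLOWING = {"alice": ["bob", "star"]}
PUSHED_TIMELINE = {"alice": [(105, "b2"), (100, "b1")]}  # filled at write time
TWEETS_BY_AUTHOR = {"star": [(110, "s2"), (102, "s1")]}  # pulled at read time

def is_celebrity(author):
    """Accounts at or above the threshold are skipped by fan-out on write."""
    return FOLLOWER_COUNT.get(author, 0) >= CELEBRITY_THRESHOLD

def read_hybrid_timeline(user, limit=20):
    """Merge the precomputed timeline with celebrity tweets fetched on read."""
    sources = [PUSHED_TIMELINE.get(user, [])]
    for author in FOLLOWING.get(user, []):
        if is_celebrity(author):
            sources.append(TWEETS_BY_AUTHOR.get(author, []))
    # All sources are newest-first, so a reverse merge preserves that order.
    merged = heapq.merge(*sources, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]
```

The read now merges one precomputed list plus one list per followed celebrity, which is far fewer sources than a pure pull model would need for the same user.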
4. Storage & Caching
- Tweet Store: Cassandra or a similar wide-column store is ideal for storing tweets indexed by user_id and timestamp.
- Timeline Cache: Redis stores the list of tweet IDs for each user's feed.
- Media Store: Amazon S3 for images and videos, served via a CDN (CloudFront).
5. Search and Trends
- Search: Use Elasticsearch or a custom inverted index to handle hashtag and keyword searches.
- Trends: Use Apache Storm or Flink for real-time stream processing to identify "Trending" topics based on tweet frequency.
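The idea behind trend detection can be illustrated with a sliding-window counter. This is a single-process sketch of the windowed counting that a stream processor would run in a distributed fashion; the TrendingCounter class and its parameters are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque

class TrendingCounter:
    """Counts hashtags seen within the last `window_seconds`."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, hashtag), oldest first
        self.counts = Counter()

    def record(self, timestamp, hashtag):
        """Register one hashtag occurrence (timestamps must not decrease)."""
        self.events.append((timestamp, hashtag))
        self.counts[hashtag] += 1

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_tag = self.events.popleft()
            self.counts[old_tag] -= 1
            if self.counts[old_tag] == 0:
                del self.counts[old_tag]

    def top(self, now, k=3):
        """Return up to k (hashtag, count) pairs for the current window."""
        self._expire(now)
        return self.counts.most_common(k)
```

Ranking by raw frequency matches the description above; a production system would likely also compare against a baseline so that perennially popular tags do not dominate, but that is beyond this sketch.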
Summary
The secret to Twitter's scale is Pre-computation. By pushing tweets from the vast majority of accounts into precomputed timelines and reserving the "Pull" model for high-follower accounts, Twitter maintains the responsiveness that makes it a real-time platform.
