System Design: Designing a URL Shortener (TinyURL)
Designing a URL shortener like TinyURL or Bitly seems simple on the surface, but it's a fantastic exercise in thinking about data encoding, unique ID generation, and high-read caching.
1. Core Requirements
- Shorten: Convert a long URL into a short one.
- Redirect: Redirect users from the short URL to the original long URL.
- Custom Alias: Allow users to choose their own short link.
- Analytics: Track click counts for shortened links.
2. Key Design Decisions
A short URL typically looks like tinyurl.com/aB12cd. The string after the slash is the unique identifier.
How to generate the short string?
- Option A: Hashing (MD5/SHA-256): Take the first 7 characters of a hash of the long URL.
- Problem: Truncating the hash makes collisions increasingly likely as the number of stored URLs grows, so every insert needs a collision check and a retry strategy.
- Option B: Base62 Encoding: Convert a unique integer ID into a Base62 string (a-z, A-Z, 0-9).
- Benefit: 6 characters provide $62^6 \approx 56.8$ billion unique combinations, and because every ID is already unique, no collision check is needed. This is a widely used approach.
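A minimal Python sketch of Option B follows. The alphabet ordering (digits, then lowercase, then uppercase) and the names `encode_base62` and `decode_base62` are assumptions for illustration; any fixed ordering of the 62 symbols works as long as encoding and decoding agree on it.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into a Base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, BASE)  # peel off the least significant base-62 digit
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(s: str) -> int:
    """Convert a Base62 string back into its integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + INDEX[ch]
    return n
```

For example, `encode_base62(125)` returns `"21"` (125 = 2 * 62 + 1), and the largest 6-character code, `"ZZZZZZ"`, corresponds to the ID $62^6 - 1$.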
3. Unique ID Generation at Scale
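To use Base62 encoding, you first need a unique 64-bit integer ID. Keep the length trade-off in mind: a full 64-bit ID encodes to as many as 11 Base62 characters, so codes of at most 7 characters require keeping IDs below $62^7$ (about 3.5 trillion).
- The Problem: A single database's auto-increment counter becomes a write bottleneck and a single point of failure at high volumes.
- The Solution: Use a distributed ID generator such as Twitter Snowflake. Each worker builds roughly time-ordered 64-bit IDs from a timestamp, a worker ID, and a per-millisecond sequence number, so no central coordinator is consulted for each new ID.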
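The sketch below shows a Snowflake-style generator in Python. It assumes the commonly described layout of 41 timestamp bits, 10 worker-ID bits, and 12 sequence bits; the epoch constant and the `SnowflakeGenerator` class are hypothetical. A production system would also need a way to assign unique worker IDs and to handle clock drift more gracefully than raising an error.

```python
import threading
import time

# Assumed layout: [sign 0][41 bits ms since epoch][10 bits worker][12 bits sequence]
CUSTOM_EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch (Nov 2023)
WORKER_ID_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_ID_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical Snowflake-style ID generator; one instance per worker."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now_ms(self) -> int:
        return time.time_ns() // 1_000_000 - CUSTOM_EPOCH_MS

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                (now << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Each web server would create one generator with its own worker ID and call `next_id()` before Base62-encoding the result. Uniqueness depends entirely on no two live workers sharing a worker ID.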
4. The Redirection Path (Blazing Fast Reads)
The primary load on TinyURL is redirection, which makes the system read-heavy: each link is created once but may be followed many times.
- A user requests tinyurl.com/abc123.
- The Load Balancer routes the request to a Web Server.
- The server checks Redis (the cache) for the long URL associated with abc123.
- If found: Redirect immediately (301 or 302; see section 6).
- If not found: Query the database, populate the cache, and redirect. If the database has no entry either, return 404.
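The lookup above is the cache-aside pattern. Below is a minimal Python sketch in which plain dictionaries stand in for Redis and the database (an assumption to keep it self-contained); `resolve` is a hypothetical name, and it returns 301 to match the steps above.

```python
from typing import Dict, Optional, Tuple

# In-memory stand-ins (assumption): `cache` plays Redis, `database` plays the KV store.
cache: Dict[str, str] = {}
database: Dict[str, str] = {"abc123": "https://example.com/some/very/long/path"}


def resolve(short_id: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for a short ID using cache-aside."""
    long_url = cache.get(short_id)
    if long_url is None:
        # Cache miss: fall back to the database.
        long_url = database.get(short_id)
        if long_url is None:
            return 404, None  # unknown short ID; nothing is cached
        cache[short_id] = long_url  # populate the cache so the next hit skips the DB
    return 301, long_url
```

With a real Redis deployment you would typically also set an expiry on cached entries so that rarely used links age out of memory.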
5. Which Database?
- Requirement: High availability and horizontal scaling.
- The Choice: A NoSQL key-value store such as DynamoDB or Cassandra fits well, because the data model is a single mapping: short_id -> long_url.
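Because every write touches a single key, the main correctness concern is that two writers must never claim the same short ID, which matters most for custom aliases. The in-memory sketch below (the class `UrlStore` and function `shorten` are hypothetical) shows a put-if-absent write. In a real deployment the check and the write must be atomic, which DynamoDB offers through conditional writes and Cassandra through lightweight transactions (`INSERT ... IF NOT EXISTS`).

```python
from typing import Callable, Dict, Optional


class UrlStore:
    """In-memory stand-in for the key-value table: short_id -> long_url."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put_if_absent(self, short_id: str, long_url: str) -> bool:
        # Check-then-write is safe here only because this is a single-threaded dict;
        # a real store must perform it atomically as a conditional write.
        if short_id in self._data:
            return False
        self._data[short_id] = long_url
        return True

    def get(self, short_id: str) -> Optional[str]:
        return self._data.get(short_id)


def shorten(store: UrlStore, long_url: str,
            next_code: Callable[[], str], alias: Optional[str] = None) -> str:
    """Store long_url under a custom alias, or under the next generated code."""
    if alias is not None:
        if not store.put_if_absent(alias, long_url):
            raise ValueError(f"alias {alias!r} is already taken")
        return alias
    # Generated codes can collide with aliases users claimed earlier, so retry.
    while True:
        code = next_code()
        if store.put_if_absent(code, long_url):
            return code
```

The retry loop covers a subtle case: custom aliases and generated codes share one namespace, so a freshly generated code can land on an alias a user already claimed.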
6. HTTP 301 vs. 302 Redirection
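- 301 (Permanent Redirect): Browsers may cache the redirect and skip the TinyURL server on later visits. This reduces server load but undercounts clicks, making accurate analytics difficult.
- 302 (Temporary Redirect): Browsers do not cache the redirect by default, so every click reaches the TinyURL server. This costs more server load but is better for real-time analytics and tracking.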
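The trade-off can be sketched as a Python function that picks the status code and headers. The names `handle_redirect` and `click_counts` are hypothetical, and the in-memory `Counter` stands in for a real analytics pipeline. Adding `Cache-Control: no-store` to the 302 is an extra assumption that asks browsers not to reuse the response.

```python
from collections import Counter
from typing import Dict, Tuple

click_counts = Counter()  # hypothetical analytics sink: short_id -> clicks seen


def handle_redirect(short_id: str, long_url: str, permanent: bool) -> Tuple[int, Dict[str, str]]:
    """Count the click, then build the redirect status and headers."""
    click_counts[short_id] += 1  # only counts requests that actually reach the server
    if permanent:
        # 301: browsers may cache this, so later clicks can bypass the server.
        return 301, {"Location": long_url}
    # 302 + no-store: every click comes back here and gets counted.
    return 302, {"Location": long_url, "Cache-Control": "no-store"}
```

The counter only sees requests that reach the server, which is exactly why cached 301 redirects undercount clicks.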
Summary
Designing TinyURL is about efficiency. Base62 encoding keeps links short, distributed ID generation keeps them unique without a central bottleneck, and a Redis cache keeps redirects fast, so the system can manage billions of links with minimal infrastructure overhead.
