Project Case Study: Designing YouTube (Video Streaming at Global Scale)

Designing a video streaming platform like YouTube or Netflix is a masterclass in handling high throughput, petabyte-scale storage, high-bandwidth egress, and asynchronous distributed computation. In a system design interview, this problem evaluates your ability to balance cost, performance, and latency under the constraints of unstable client internet connections and high-volume media processing.

1. Requirements & Core Constraints

To architect a video platform at global scale, we must first establish concrete boundaries. We assume the system must handle 2 billion monthly active users (MAU) and 1 billion daily active users (DAU).

Functional Requirements

Resumable Video Upload: Content creators can upload high-quality raw video files. Due to large file sizes, uploads must support chunked, resumable transfers to handle flaky network connections.
High-Quality Playback: Users can stream video in real-time. The system must adaptively adjust the video quality (Adaptive Bitrate Streaming) based on the user's instantaneous network bandwidth.
Metadata Search: Users can search for videos by title, description, or tags with sub-second retrieval.
Analytics & Views: The system must track views, likes, and dislikes in near real-time, exposing public counters without overwhelming the databases.

Non-Functional Requirements (SLAs)

Low Startup Latency (TTFF): Time-to-First-Frame (TTFF), the delay between a play request and the first rendered frame, must be under 200ms globally.
High Availability: Playback systems must achieve 99.99% availability. Upload processing is asynchronous and can target a lower availability of 99.9%.
No Data Loss (Durability): Uploaded raw source videos must be preserved with 99.999999999% (11 nines) durability.
Stale Views Acceptance: Video view counts are allowed to be eventually consistent; exact synchronized counting is not required globally in real-time.

Back-of-the-Envelope Capacity Estimates

To size our storage and egress networks, we perform a core traffic computation:

DAU: 1,000,000,000 users.
Average Views Per User: 5 videos daily.
Total Playback Requests: 5 Billion views per day.
Upload Rate: Assume 500 hours of video are uploaded every minute.
Average Video Bitrate (Transcoded): 3 Mbps (composite average across 360p, 720p, 1080p).
Average Raw Upload Bitrate: 20 Mbps (high-quality raw H.264/AVC or ProRes source).

Storage Calculations:

Raw footage uploaded per minute: $$500 \text{ hours} \times 3600 \text{ seconds/hour} \times 20 \text{ Mbps} = 3.6 \times 10^{7} \text{ Megabits} \approx \text{4.5 Terabytes (TB) per minute raw}.$$ Over 24 hours: $$4.5 \text{ TB/min} \times 1440 \text{ mins} \approx \text{6.5 Petabytes (PB) per day raw}.$$ Accounting for multiple transcoded resolutions (240p, 360p, 720p, 1080p, 4K) and distinct codecs (AV1, VP9, H.264), which we assume multiply storage by roughly $4\times$, we require about 26 PB of new storage per day. Over 5 years (about 1,825 days), this scales to roughly 47 Exabytes of transcoded storage, requiring aggressive cold-storage tiering.

Egress Bandwidth Calculations:

Total playback bandwidth egress: $$5,000,000,000 \text{ views/day} \times 5 \text{ minutes/view} \times 60 \text{ seconds/minute} \times 3 \text{ Mbps} \approx 4.5 \text{ Exabits per day}.$$ Converting to aggregate network throughput: $$\frac{4.5 \text{ Exabits}}{86,400 \text{ seconds}} \approx \mathbf{52 \text{ Terabits per second (Tbps)}} \text{ continuous aggregate egress.}$$ This volume makes it impossible to serve directly from a single data center; we must utilize a highly distributed, multi-tiered Content Delivery Network (CDN).
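
A quick way to sanity-check these estimates is to recompute them from the listed inputs. The sketch below does only that arithmetic; the constant and function names are illustrative, the 5-minute watch time is the value used in the egress formula, and the 4x multiplier is the rough assumption stated in the storage estimate.

```python
# Inputs taken from the estimates in this section.
UPLOAD_HOURS_PER_MINUTE = 500
RAW_BITRATE_BPS = 20e6          # 20 Mbps raw source
VIEWS_PER_DAY = 5e9
WATCH_SECONDS_PER_VIEW = 5 * 60
PLAYBACK_BITRATE_BPS = 3e6      # 3 Mbps composite average
TRANSCODE_MULTIPLIER = 4        # rough multiplier for renditions and codecs


def raw_upload_tb_per_minute() -> float:
    """Terabytes of raw video arriving per wall-clock minute."""
    video_seconds = UPLOAD_HOURS_PER_MINUTE * 3600
    bits = video_seconds * RAW_BITRATE_BPS
    return bits / 8 / 1e12


def raw_upload_pb_per_day() -> float:
    """Petabytes of raw video arriving per day (1440 minutes)."""
    return raw_upload_tb_per_minute() * 1440 / 1000


def transcoded_pb_per_day() -> float:
    """Petabytes per day after applying the rough transcode multiplier."""
    return raw_upload_pb_per_day() * TRANSCODE_MULTIPLIER


def average_egress_tbps() -> float:
    """Average playback egress in terabits per second."""
    bits_per_day = VIEWS_PER_DAY * WATCH_SECONDS_PER_VIEW * PLAYBACK_BITRATE_BPS
    return bits_per_day / 86_400 / 1e12


print(f"raw upload: {raw_upload_tb_per_minute():.2f} TB/min")
print(f"raw upload: {raw_upload_pb_per_day():.2f} PB/day")
print(f"transcoded: {transcoded_pb_per_day():.2f} PB/day")
print(f"egress:     {average_egress_tbps():.1f} Tbps")
```

Note that the egress figure is a daily average; peak-hour traffic would sit above it.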

2. API Design & Core Contracts

At our scale, clear separation of concerns is managed via explicit REST and gRPC API contracts. We outline the critical endpoints below.

A. Resumable Upload API

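Large raw media files are uploaded in chunks. Rather than a naive single POST, a session-based protocol is used: the client initiates an upload session, streams chunks against it, and, after an interruption, queries the session status to resume (see Section 7B).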

1. Initiate Upload Session

POST /v1/uploads/sessions
Authorization: Bearer <JWT_TOKEN>
Content-Type: application/json

{
  "title": "Scaling Distributed Ledgers",
  "description": "Deep dive into idempotency and double-entry bookkeeping",
  "category_id": 4,
  "video_size_bytes": 4294967296,
  "raw_md5_checksum": "9e107d9d372bb6826bd81d3542a419d6"
}

Response:

{
  "upload_session_id": "sess_98234791823749",
  "chunk_size_bytes": 10485760,
  "next_byte_offset": 0,
  "upload_url": "https://upload.csp.io/chunks/sess_98234791823749"
}

2. Stream Upload Chunk

PATCH /chunks/sess_98234791823749
Content-Range: bytes 0-10485759/4294967296
Content-Type: application/octet-stream

<Binary Byte Stream>

Response:

{
  "upload_session_id": "sess_98234791823749",
  "next_byte_offset": 10485760,
  "status": "IN_PROGRESS"
}
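
The server-side bookkeeping for a chunk request can be sketched as follows. This is a minimal in-memory illustration, not the platform's actual implementation: `UploadSession`, `apply_chunk`, and the `OFFSET_MISMATCH` and `COMPLETE` status values are hypothetical names, and a real service would persist the bytes durably before advancing the offset.

```python
import re
from dataclasses import dataclass


@dataclass
class UploadSession:
    session_id: str
    total_bytes: int
    next_byte_offset: int = 0


# Matches header values such as "bytes 0-10485759/4294967296".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def apply_chunk(session: UploadSession, content_range: str, body: bytes) -> dict:
    """Validate one chunk against the session and advance the offset."""
    match = _CONTENT_RANGE.match(content_range)
    if match is None:
        raise ValueError("malformed Content-Range header")
    start, end, total = (int(group) for group in match.groups())
    if total != session.total_bytes or end < start or end >= total:
        raise ValueError("range does not fit the declared upload size")
    if len(body) != end - start + 1:
        raise ValueError("body length does not match Content-Range")

    if start != session.next_byte_offset:
        # Duplicate or out-of-order chunk: report where the client should resume.
        status = "OFFSET_MISMATCH"  # hypothetical status value
    else:
        # A real service would write the bytes to object storage here.
        session.next_byte_offset = end + 1
        done = session.next_byte_offset == session.total_bytes
        status = "COMPLETE" if done else "IN_PROGRESS"  # COMPLETE is hypothetical

    return {
        "upload_session_id": session.session_id,
        "next_byte_offset": session.next_byte_offset,
        "status": status,
    }
```

Rejecting a chunk whose start does not equal `next_byte_offset` keeps retries safe: a resent chunk never corrupts the stored prefix, and the response tells the client where to continue.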

B. Playback Manifest Request (HLS/DASH)

Clients request a streaming manifest file rather than a raw mp4 link. The manifest outlines the available chunk resolutions, bitrates, and paths.

GET /v1/videos/vid_834729/manifest?client_network_bitrate=5000000
Accept: application/x-mpegURL

Response Manifest (master.m3u8):

#EXTM3U
#EXT-X-VERSION:6
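#EXT-X-INDEPENDENT-SEGMENTS

# Note: the AUDIO group IDs referenced below must be declared by #EXT-X-MEDIA:TYPE=AUDIO tags, omitted here for brevity.
# High Bandwidth Option (1080p at 4.5 Mbps)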
#EXT-X-STREAM-INF:BANDWIDTH=4500000,RESOLUTION=1920x1080,CODECS="av01.0.08M.10",AUDIO="audio-high"
https://cdn.csp.io/vid_834729/1080p/manifest.m3u8

# Medium Bandwidth Option (720p at 2.2 Mbps)
#EXT-X-STREAM-INF:BANDWIDTH=2200000,RESOLUTION=1280x720,CODECS="vp09.02.31.10",AUDIO="audio-medium"
https://cdn.csp.io/vid_834729/720p/manifest.m3u8

# Low Bandwidth Option (360p at 800 Kbps)
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.64001f",AUDIO="audio-low"
https://cdn.csp.io/vid_834729/360p/manifest.m3u8
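
A manifest like the one above could be generated from a list of available renditions. The following is a minimal sketch under that assumption; `Rendition` and `build_master_playlist` are hypothetical names, and audio groups are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str            # path component, e.g. "1080p"
    bandwidth_bps: int
    width: int
    height: int
    codecs: str          # value for the CODECS attribute


def build_master_playlist(base_url: str, renditions: list[Rendition]) -> str:
    """Render an HLS master playlist, highest bandwidth first."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:6", "#EXT-X-INDEPENDENT-SEGMENTS"]
    ordered = sorted(renditions, key=lambda r: r.bandwidth_bps, reverse=True)
    for r in ordered:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f'RESOLUTION={r.width}x{r.height},CODECS="{r.codecs}"'
        )
        lines.append(f"{base_url}/{r.name}/manifest.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ladder = [
        Rendition("360p", 800_000, 640, 360, "avc1.64001f"),
        Rendition("1080p", 4_500_000, 1920, 1080, "av01.0.08M.10"),
    ]
    print(build_master_playlist("https://cdn.csp.io/vid_834729", ladder))
```

Because the playlist is a pure function of the rendition list, its serialized form is a natural candidate for caching.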

3. High-Level Design (HLD)

Our high-level architecture separates the Write (Upload/Ingestion) Path and the Read (Streaming/Playback) Path to guarantee that massive transcoding workloads never starve playback requests of network resources.

graph TD
    %% Write Ingestion Flow
    Creator((Creator)) -->|1. Resumable Upload| LB1[Ingress Load Balancer]
    LB1 -->|2. Route Upload| UploadSrv[Upload Service]
    UploadSrv -->|3. Store Raw Chunk| TempStorage[(Object Storage Raw S3)]
    UploadSrv -->|4. Push Job Event| IngestionQueue[Kafka Ingestion Topic]
    
    %% Transcoding Pipeline
    IngestionQueue -->|5. Consumer Trigger| TransCoord[Transcoding Coordinator]
    TransCoord -->|6. Orchestrate Task DAG| WorkerFleet[Transcoding Worker Fleet]
    WorkerFleet -->|Read Raw Chunks| TempStorage
    WorkerFleet -->|7. Write Multi-Resolution Segments| ProductionStorage[(Object Storage Production S3)]
    WorkerFleet -->|8. Register Chunks| ChunkRegistry[(ScyllaDB / Cassandra)]
    
    %% Playback Read Path
    Viewer((Viewer)) -->|9. Read Video Request| CDN[Global Multi-Tier CDN Edge]
    CDN -->|10. Cache Miss Manifest Request| LB2[Egress Load Balancer]
    LB2 -->|11. Route Manifest Request| ManifestSrv[Manifest Generator Service]
    ManifestSrv -->|12. Fetch Metadata| Cache[Redis Cache Cluster]
    ManifestSrv -->|13. Fetch Chunks & Codec Details| ChunkRegistry
    Cache -->|Cache Miss| MetadataDB[(Spanner / Sharded PostgreSQL)]
    
    %% Subsystem Links
    TransCoord -->|Write Video Status| MetadataDB
    CDN -->|Pull Video Segments| ProductionStorage

Video Ingestion & Processing Pipeline (DAG Executor)

The transcoding pipeline operates as a distributed Directed Acyclic Graph (DAG) manager. Because a 10GB video would take hours to transcode sequentially, a segmenter stage splits the raw video into 5-second segments that workers transcode in parallel.

graph LR
    RawFile[Raw Uploaded Video] --> Segmenter[Raw Video Segmenter]
    
    %% Split segments into parallel processing streams
    Segmenter -->|Chunk 01| Job1[Transcoder Worker 01]
    Segmenter -->|Chunk 02| Job2[Transcoder Worker 02]
    Segmenter -->|Chunk N| JobN[Transcoder Worker N]
    
    %% Dynamic Chunk Codec Output
    Job1 -->|H.264/VP9/AV1| ChunkStore[(Production S3)]
    Job2 -->|H.264/VP9/AV1| ChunkStore
    JobN -->|H.264/VP9/AV1| ChunkStore
    
    %% Convergence & Compilation
    ChunkStore --> ManifestBuilder[Manifest Aggregator]
    ManifestBuilder -->|Update Manifest| MetadataDB[(Metadata Database)]
    ManifestBuilder -->|Pre-warm Hot Edges| CDNPusher[CDN Edge Warm Cache]
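
The fan-out step of this pipeline can be sketched as a job planner: one job per segment and output profile. This is an illustration only; `plan_transcode_jobs` and the job dictionary fields are hypothetical, and a real segmenter would cut on keyframe boundaries rather than exact timestamps.

```python
import math
from itertools import product

SEGMENT_SECONDS = 5  # segment length used throughout this design


def plan_transcode_jobs(video_id, duration_seconds, resolutions, codecs):
    """Return one independent job per (segment, resolution, codec)."""
    segment_count = math.ceil(duration_seconds / SEGMENT_SECONDS)
    jobs = []
    for index, resolution, codec in product(
        range(segment_count), resolutions, codecs
    ):
        start = index * SEGMENT_SECONDS
        jobs.append(
            {
                "video_id": video_id,
                "chunk_index": index,
                "start_seconds": start,
                # The final segment may be shorter than SEGMENT_SECONDS.
                "end_seconds": min(start + SEGMENT_SECONDS, duration_seconds),
                "resolution": resolution,
                "codec": codec,
            }
        )
    return jobs


if __name__ == "__main__":
    for job in plan_transcode_jobs("vid_834729", 12, ["360p", "720p"], ["h264"]):
        print(job)
```

Each job is independent of the others, which is what lets the worker fleet process them in any order and retry them individually.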

4. Low-Level Design (LLD) & Data Models

Database Selection Rationale

Metadata Store (Spanner / Sharded PostgreSQL): Crucial for strict ACID compliance. Users must see immediate consistency upon changing video titles or privacy settings. We shard PostgreSQL tables using the user_id partition key, ensuring all data related to a content creator (videos, channel metrics) resides on a single physical shard to eliminate cross-shard joins.
Chunk Registry (ScyllaDB / Cassandra): Handles high-throughput writes. The transcoding workers write billions of rows containing metadata for every 5-second video segment (chunk location, MD5 hash, byte size). A NoSQL wide-column store handles this write-heavy payload with linear scaling.
Cache Store (Redis Cluster): Stores high-traffic video metadata and serialized manifest layouts, eliminating database hits for viral content.

Low-Level SQL Schema (Metadata Store)

-- Core User/Channel Table
CREATE TABLE users (
    user_id VARCHAR(64) PRIMARY KEY,
    channel_name VARCHAR(128) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    is_premium BOOLEAN DEFAULT FALSE
);

-- Video Table sharded by user_id to ensure single-shard locality for creator operations
CREATE TABLE videos (
    video_id VARCHAR(64) NOT NULL,
    user_id VARCHAR(64) NOT NULL,
    title VARCHAR(255) NOT NULL,
    description TEXT,
    view_count BIGINT DEFAULT 0,
    like_count BIGINT DEFAULT 0,
    status VARCHAR(32) NOT NULL DEFAULT 'UPLOADING', -- UPLOADING, PROCESSING, READY, FAILED
    duration_seconds INT NOT NULL,
    raw_md5 VARCHAR(32) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, video_id)
);

CREATE INDEX idx_video_status ON videos(status);

Cassandra/ScyllaDB Wide-Column Schema (Chunk Registry)

-- Optimized for fast retrieval of ordered video chunks for manifest generation
CREATE KEYSPACE video_curriculum 
WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3, 'us-west': 3};

CREATE TABLE video_curriculum.video_chunks (
    video_id text,
    resolution text,      -- '1080p', '720p', '360p'
    codec text,           -- 'av1', 'vp9', 'h264'
    chunk_index int,      -- 0, 1, 2, 3...
    chunk_url text,       -- CDN storage endpoint link
    byte_offset bigint,   -- Start byte in the combined container
    chunk_size int,       -- Size of this segment
    chunk_md5 text,
    PRIMARY KEY ((video_id), resolution, codec, chunk_index)
) WITH CLUSTERING ORDER BY (resolution ASC, codec ASC, chunk_index ASC);
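
To illustrate why the clustering order matters, the sketch below emulates a single-partition read for one rendition in plain Python and turns the ordered rows into an HLS media playlist. `ChunkRow`, `read_rendition`, and `media_playlist` are hypothetical names, the in-memory list stands in for the database, and every segment is assumed to be exactly 5 seconds long.

```python
from typing import NamedTuple


class ChunkRow(NamedTuple):
    video_id: str
    resolution: str
    codec: str
    chunk_index: int
    chunk_url: str


def read_rendition(rows, video_id, resolution, codec):
    """Emulate a read of one (video_id, resolution, codec) slice.

    In the wide-column store the rows already come back sorted by
    chunk_index; the explicit sort here stands in for that guarantee.
    """
    wanted = (video_id, resolution, codec)
    matching = [r for r in rows if (r.video_id, r.resolution, r.codec) == wanted]
    return sorted(matching, key=lambda r: r.chunk_index)


def media_playlist(chunks, segment_seconds=5):
    """Render a VOD media playlist for one rendition."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:6",
        f"#EXT-X-TARGETDURATION:{segment_seconds}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for chunk in chunks:
        lines.append(f"#EXTINF:{segment_seconds:.3f},")
        lines.append(chunk.chunk_url)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

With the partition keyed on `video_id`, all rows needed for one video's playlists live together, so the read never fans out across nodes.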

5. Scaling Challenges & Bottlenecks

A. The Viral Video & Celebrity Problem

When a highly anticipated video goes viral, millions of viewers hit the same CDN edge locations simultaneously. This leads to thundering-herd effects and edge cache saturation.

Mitigation Strategies:

Active CDN Shielding & Origin Ingress Throttling: A dedicated shield cache layer sits between the edge nodes and the primary S3 storage. If many edge nodes experience a cache miss for the same chunk at once, they all query the Shield node. The Shield uses request collapsing (via mutex locks) to issue exactly one fetch request to the underlying S3 origin and serves every waiting edge from that single response.
Dynamic Segment Sharding & Multi-CDN Routing: We split traffic across multiple independent CDNs (Akamai, Cloudflare, Fastly, AWS CloudFront). If an edge POP in region A saturates, our Global Traffic Manager (GTM) dynamically swaps the manifest domain hostnames in real-time, redirecting client manifest requests to a secondary CDN provider.
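
Request collapsing at the shield can be sketched with a lock and one event per in-flight key. This is a single-process illustration only; `CollapsingFetcher` is a hypothetical name, `fetch_from_origin` stands in for the S3 read, and a production shield would also bound cache size and expire entries.

```python
import threading


class CollapsingFetcher:
    """Serve concurrent misses for the same key with one origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache = {}
        self._in_flight = {}  # key -> threading.Event set when the fetch ends

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            event = self._in_flight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._in_flight[key] = event

        if leader:
            # Only the first caller for this key reaches the origin.
            try:
                value = self._fetch(key)
                with self._lock:
                    self._cache[key] = value
                return value
            finally:
                with self._lock:
                    del self._in_flight[key]
                event.set()

        # Followers wait for the leader, then read the cached result.
        event.wait()
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        # The leader failed; retry, possibly becoming the new leader.
        return self.get(key)
```

The origin call happens outside the lock, so a slow fetch for one chunk does not block requests for other chunks.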

B. Adaptive Bitrate Streaming (ABR) Logic

ABR is client-driven: the player decides which rendition to request for each segment. We use buffer-based algorithms such as BOLA (Buffer Occupancy based Lyapunov Algorithm), combined with throughput estimates. In simplified form:

If Client Buffer > 20 seconds: Client requests the maximum available resolution (1080p or 4K) from the manifest.
If Network Bandwidth drops (< 1.5 Mbps): Client immediately requests the next 5-second chunk at 360p, preventing a loading spinner. The user experiences a transient dip in visual fidelity, but playback continuity remains unbroken.
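
The two rules above can be written as a small selection function. This is a deliberately simplified sketch and not BOLA itself: the ladder values come from the sample manifest, while the precedence of the bandwidth rule, the 0.8 safety factor, and the fallback branch are assumptions added for illustration.

```python
# (label, bitrate in bits per second), lowest first; from the sample manifest.
LADDER = [("360p", 800_000), ("720p", 2_200_000), ("1080p", 4_500_000)]

LOW_BANDWIDTH_BPS = 1_500_000
HIGH_BUFFER_SECONDS = 20
SAFETY_FACTOR = 0.8  # assumption: use only 80% of measured throughput


def choose_rendition(buffer_seconds, throughput_bps, ladder=LADDER):
    """Pick the rendition label for the next segment request."""
    if throughput_bps < LOW_BANDWIDTH_BPS:
        # Bandwidth dropped: fall to the lowest rendition to avoid a stall.
        return ladder[0][0]
    if buffer_seconds > HIGH_BUFFER_SECONDS:
        # Plenty of buffered video: request the best available rendition.
        return ladder[-1][0]
    # Otherwise take the highest rendition that fits the throughput budget.
    budget = throughput_bps * SAFETY_FACTOR
    fitting = [label for label, bps in ladder if bps <= budget]
    return fitting[-1] if fitting else ladder[0][0]


if __name__ == "__main__":
    print(choose_rendition(buffer_seconds=25, throughput_bps=6_000_000))
    print(choose_rendition(buffer_seconds=8, throughput_bps=1_000_000))
```

Checking the bandwidth rule first means a full buffer never masks a collapsing connection.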

6. Real-World Trade-offs

A. Storage vs. Computation: Pre-Transcoding vs. On-Demand

Strategy A: Complete Pre-Transcoding: Transcoding all uploaded videos immediately into all resolutions (240p up to 4K) and all codecs (H.264, VP9, AV1).
- Trade-off: High storage costs. Most uploads are long-tail videos that receive very few views (this design assumes over 90% get fewer than 10 views total). Pre-transcoding everything into 15 different profiles (5 resolutions across 3 codecs) wastes petabytes of storage.
Strategy B: Lazy On-Demand Transcoding: Transcoding chunks dynamically when requested.
- Trade-off: Unacceptable TTFF latency and extremely expensive real-time compute bills.
Our Hybrid Approach: When a video is uploaded, we immediately perform Fast-Path Transcoding, generating only H.264 at 360p and 720p, and mark the video ready. If the view count crosses 1,000 views within 24 hours, the video is pushed to the Cold-Path Deep Transcoding pipeline, which generates the high-efficiency VP9 and AV1 streams to save egress bandwidth.
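
The promotion rule can be expressed as a small predicate. The sketch below reads "within 24 hours" as a trailing 24-hour window, which is an interpretation rather than something the design specifies; `should_deep_transcode` is a hypothetical name, and a real system would use aggregated counters instead of raw timestamps.

```python
from datetime import datetime, timedelta

PROMOTION_VIEWS = 1_000
PROMOTION_WINDOW = timedelta(hours=24)


def should_deep_transcode(view_timestamps, now):
    """True when the video had at least 1,000 views in the trailing 24 hours."""
    cutoff = now - PROMOTION_WINDOW
    recent_views = sum(1 for ts in view_timestamps if cutoff <= ts <= now)
    return recent_views >= PROMOTION_VIEWS


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0, 0)
    views = [now - timedelta(seconds=30 * i) for i in range(1_200)]
    print(should_deep_transcode(views, now))
```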

B. Egress Network Cost vs. Transcoding CPU Costs

The AV1 codec reduces file sizes by roughly 30% compared to VP9 and 50% compared to H.264/AVC at similar quality, but it can require up to $10\times$ more CPU time to encode.

Trade-off analysis: Egress bandwidth is typically among the largest operating expenses of a large streaming system (this design assumes over 70%). Investing compute upfront to transcode into AV1 can save millions of dollars in monthly egress fees. For highly trafficked videos, the heavy computational spend therefore pays for itself.

7. Failure Scenarios & Fault Tolerance

A. Transcoding Worker Failures

If a worker node crashes mid-transcode of a video chunk:

Solution: Workers maintain a heartbeat with the Kafka broker and the distributed Transcoding Coordinator. If a worker goes silent for more than 10 seconds, its database reservation is revoked and the segment event is re-queued in Kafka. Because transcoding is idempotent (a retry reads the same raw segment bytes, targets the same codec, and replaces the same output chunk), recovery produces no visible anomalies.
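
The heartbeat and revocation logic can be sketched as a lease table. This is an in-memory illustration under stated assumptions: `LeaseTable` and its methods are hypothetical, time is passed in explicitly so the behavior is easy to test, and the caller is expected to re-queue whatever `reap_expired` returns.

```python
class LeaseTable:
    """Track which worker holds each segment and when it last reported in."""

    def __init__(self, timeout_seconds: float = 10.0):
        self.timeout_seconds = timeout_seconds
        self._leases = {}  # segment_id -> (worker_id, last_heartbeat)

    def claim(self, segment_id: str, worker_id: str, now: float) -> bool:
        """Reserve a segment; fails if another worker already holds it."""
        if segment_id in self._leases:
            return False
        self._leases[segment_id] = (worker_id, now)
        return True

    def heartbeat(self, segment_id: str, worker_id: str, now: float) -> bool:
        """Refresh a lease; returns False if the lease was revoked."""
        lease = self._leases.get(segment_id)
        if lease is None or lease[0] != worker_id:
            return False
        self._leases[segment_id] = (worker_id, now)
        return True

    def reap_expired(self, now: float) -> list:
        """Revoke leases silent for longer than the timeout.

        Returns the segment ids that should be re-queued.
        """
        expired = [
            segment_id
            for segment_id, (_, last_seen) in self._leases.items()
            if now - last_seen > self.timeout_seconds
        ]
        for segment_id in expired:
            del self._leases[segment_id]
        return expired
```

A worker whose heartbeat returns False should abandon its output, since another worker may already own the segment.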

B. Resumable Upload Timeout & Consistency

If a user loses cell reception mid-upload:

Solution: The upload client queries the status via GET /v1/uploads/sessions/sess_98234791823749. The server responds with next_byte_offset: 10485760. The client resumes exactly from byte offset 10,485,760, avoiding redundant uploads and reducing cellular network overhead.
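
On the client side, resuming amounts to recomputing the remaining chunk ranges from the reported offset. A minimal sketch, assuming the chunk size returned when the session was created; `remaining_ranges` is a hypothetical helper name.

```python
def remaining_ranges(next_byte_offset: int, total_bytes: int, chunk_size: int):
    """Yield Content-Range header values for the chunks still to be sent."""
    start = next_byte_offset
    while start < total_bytes:
        # The last chunk is truncated to the end of the file.
        end = min(start + chunk_size, total_bytes) - 1
        yield f"bytes {start}-{end}/{total_bytes}"
        start = end + 1


if __name__ == "__main__":
    ranges = remaining_ranges(10_485_760, 4_294_967_296, 10_485_760)
    print(next(ranges))  # first range to send after resuming
```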

8. Staff Engineer Perspective (Operational Deep Dive)

9. Candidate Verbal Script (Mock Interview Guide)

Below is a verbatim verbal response demonstrating how a senior candidate should walk through this design during a live interview session:

Candidate: "To design YouTube, I will start by clarifying the primary operational bottlenecks. Since we are dealing with 1 Billion Daily Active Users, the egress network bandwidth is our absolute highest cost and primary constraint. I've calculated that we need to serve roughly 52 Terabits per second of continuous streaming bandwidth globally. This dictates a heavy decoupling of the write upload path and the read playback path.

For the upload path, I want to avoid naive file streaming. I will implement a resumable chunked upload API using standard HTTP PATCH calls and MD5 checksum validations. Once the raw video chunk is safely written to S3, the Upload Service writes an event to a Kafka broker. This isolates ingestion traffic. The transcoding pipeline will pull these events and partition the raw footage into 5-second physical segments. A fleet of GPU-enabled worker instances will execute a DAG to transcode these segments in parallel.

For data modeling, I will keep metadata in sharded PostgreSQL, partitioned by user_id to maximize write locality. However, to track the billions of individual transcoded chunks, I'll use ScyllaDB with video_id as the partition key and clustering columns ordered by resolution, codec, and chunk index. This schema lets the Manifest Service fetch all chunk locations for a video with a single-partition query, returning the HLS master .m3u8 playlist in under 10ms.

Finally, to deal with hot viral videos, I will introduce a two-tiered caching system. A dedicated CDN shield layer will aggregate all chunk misses and collapse them into a single read command to the underlying S3 storage. This completely shields the origin database and raw object stores from melting during global trending events."