System Design: Video Streaming Platform at Netflix Scale

Video streaming accounts for a large share of global internet traffic. At peak demand, a scale-out video streaming platform must support over 250 million concurrent streams, each adapting in real time to shifting client network conditions, with segments served from edge caches less than 20 ms away from the viewer.

The architecture behind such systems must handle massive computational requirements for video transcoding, highly distributed storage strategies, smart caching hierarchies, and client-side players capable of seamless, jitter-free playback.

An important architectural principle for media platforms is the extreme asymmetry of the workload: encode once, serve billions of times. The ingestion pipeline is a highly complex, compute-heavy write path, while the delivery network is a globally replicated, read-heavy path. Keeping these two paths decoupled is the key to scaling the system cost-effectively.

Requirements and System Goals

To design a production-grade video streaming platform, we must satisfy strict functional and non-functional requirements:

Functional Requirements

Video Uploading and Ingestion: Allow creators to upload raw high-resolution video files (up to 4K resolution) in multiple file formats.
Distributed Transcoding: Transcode uploaded videos into multiple resolutions (360p, 480p, 720p, 1080p, 4K) and formats (HLS, DASH) to support various target devices.
Adaptive Bitrate Streaming (ABR): Automatically adjust the streaming video quality in real-time based on the user's current network bandwidth.
Resume Playback Across Devices: Track and persist user playback progress (timestamp offsets) across multiple client devices.
Personalized Recommendations: Provide real-time feed recommendations and next-up videos based on user viewing history.

Non-Functional Requirements

Low Startup Latency: The p99 video start latency (time between clicking "Play" and the first frame rendering) must be less than 2.0 seconds.
Buffer-Free Playback: Maintain a rebuffering rate of less than 0.5% (minimizing pauses during playback).
Cost-Efficient Storage: Implement tiered storage models to manage petabytes of historical long-tail video content economically.
Global Availability: High Availability (99.999% uptime) for metadata and playback authorization services, ensuring viewers can always initiate streams.

API Interfaces and Service Contracts

The streaming platform relies on clean service boundaries separating metadata operations from raw media transport.

graph TD
    Client[Client Video Player] -->|1. GET /api/v1/videos/{id}/manifest| Gateway[API Gateway]
    Gateway -->|2. Resolve Session & Manifest| Metadata[Metadata Service]
    Client -->|3. GET /media/{id}/{res}/segment_{seq}.ts| CDN[CDN Edge Nodes]

1. Retrieve Video Playback Manifest

When a user clicks play, the client requests the master manifest. This file acts as the index pointing the player to the available video qualities and chunk lists.

Endpoint: GET /api/v1/videos/{videoId}/manifest
Headers:
- Authorization: Bearer <JWT_TOKEN>
- X-Device-Type: SmartTV

Response Payload:

{
  "video_id": "vid_payment_reconciliation_99",
  "title": "System Design: Building a Payment Reconciliation Engine",
  "playback_session_id": "sess_883a01f92",
  "duration_seconds": 7200,
  "master_playlist_url": "https://cdn.codesprintpro.com/media/vid_99/master.m3u8",
  "subtitles": [
    { "language": "en", "url": "https://cdn.codesprintpro.com/media/vid_99/subs_en.vtt" },
    { "language": "es", "url": "https://cdn.codesprintpro.com/media/vid_99/subs_es.vtt" }
  ],
  "audio_tracks": [
    { "codec": "aac", "channels": "stereo", "url": "https://cdn.codesprintpro.com/media/vid_99/audio_stereo.m3u8" },
    { "codec": "ec-3", "channels": "5.1_surround", "url": "https://cdn.codesprintpro.com/media/vid_99/audio_surround.m3u8" }
  ]
}

2. Playback Progress Synchronization

To support resuming playback across different devices, the client player reports its current offset position periodically (e.g., every 10 seconds).

Endpoint: POST /api/v1/playback/progress
Request Payload:

{
  "video_id": "vid_payment_reconciliation_99",
  "playback_session_id": "sess_883a01f92",
  "offset_seconds": 1284.5,
  "device_id": "dev_mobile_apple_iphone15",
  "reported_at": "2026-06-05T12:45:30Z"
}

Response: 200 OK

High-Level Design and Visualizations

To scale video ingestion and delivery separately, the system is split into two asynchronous paths: the Transcoding and Storage Pipeline (write path) and the CDN Playback Pipeline (read path).

Transcoding and Storage Pipeline

This pipeline handles raw video ingestion, chunk segmentation, parallel encoding, manifest generation, and CDN registration.

sequenceDiagram
    autonumber
    participant Creator as Content Creator
    participant Ingest as Ingest & Upload Service
    participant S3Raw as S3 Raw Storage Bucket
    participant SQS as Transcoding Job Queue (SQS)
    participant Worker as Transcoding Workers Fleet (EC2 Spot)
    participant S3Pub as S3 Public CDN Origin
    participant DB as Postgres Metadata DB

    Creator->>Ingest: Upload Raw Video File (H.264, 4K, 50 GB)
    Ingest->>S3Raw: Stream multi-part upload chunks
    S3Raw-->>Ingest: Upload complete acknowledgement
    Ingest->>DB: Insert video record (Status: PENDING)
    Ingest->>SQS: Push segmentation jobs (Split video into 6s raw chunks)
    Note over Worker: Workers fetch chunks & execute parallel FFmpeg encodes
    Worker->>S3Pub: Upload encoded TS segments (360p, 720p, 1080p, 4K)
    Worker->>S3Pub: Upload Master Manifest (.m3u8 index file)
    Worker->>DB: Update video record (Status: READY, manifest_url)
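
The fan-out step in the diagram above can be sketched as a function that turns one video into independent per-segment, per-rendition jobs. This is a minimal sketch, not the pipeline's actual code: `SegmentJob` and `plan_segment_jobs` are hypothetical names, while the 6-second segment length and the four renditions come from the diagram.

```python
import math
from dataclasses import dataclass
from itertools import product

SEGMENT_SECONDS = 6  # chunk length used in the pipeline above
LADDER = ["360p", "720p", "1080p", "4K"]  # renditions shown in the diagram


@dataclass(frozen=True)
class SegmentJob:
    """One unit of work a transcoding worker can process on its own (hypothetical shape)."""
    video_id: str
    resolution: str
    sequence: int
    start_seconds: float
    duration_seconds: float


def plan_segment_jobs(video_id: str, duration_seconds: float) -> list[SegmentJob]:
    """Return one job per (segment, rendition) pair."""
    segment_count = math.ceil(duration_seconds / SEGMENT_SECONDS)
    jobs = []
    for seq, resolution in product(range(segment_count), LADDER):
        start = seq * SEGMENT_SECONDS
        # The final segment may be shorter than SEGMENT_SECONDS.
        length = min(SEGMENT_SECONDS, duration_seconds - start)
        jobs.append(SegmentJob(video_id, resolution, seq, start, length))
    return jobs


jobs = plan_segment_jobs("vid_99", 7200)
print(len(jobs))  # 1200 segments x 4 renditions = 4800
```

Because every job is independent, any worker can pick up any job, which is what lets the fleet scale horizontally and tolerate the loss of individual workers.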

Playback and Adaptive Bitrate (ABR) Loop

Adaptive Bitrate (ABR) shifts the responsibility of quality switching to the client player, preventing server bottlenecks.

sequenceDiagram
    autonumber
    participant Player as Client ABR Player
    participant Edge as CDN Edge Server
    participant Origin as S3 Origin Bucket

    Player->>Edge: Request Master Manifest (master.m3u8)
    Edge-->>Player: Return Master Playlist with available qualities
    Note over Player: Start streaming segment 0 at 360p (fast loading)
    Player->>Edge: GET /media/vid_99/360p/segment_0.ts
    Edge-->>Player: Return segment 0 bytes (render immediately)
    Note over Player: Telemetry check: Segment downloaded in 500ms (threshold is 6000ms)<br/>Bandwidth is high. Upgrade quality!
    Player->>Edge: GET /media/vid_99/1080p/segment_1.ts
    alt CDN Cache Hit
        Edge-->>Player: Return segment 1 at 1080p from edge RAM (5ms)
    else CDN Cache Miss
        Edge->>Origin: Fetch /media/vid_99/1080p/segment_1.ts from S3 (150ms)
        Origin-->>Edge: Return bytes & cache at edge
        Edge-->>Player: Return segment 1 to player
    end
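
The telemetry check in the loop above can be approximated with a simple throughput-based rule. The sketch below is illustrative only: the ladder bitrates mirror the `BANDWIDTH` values used in this article's master playlist, while `SAFETY_FACTOR`, `measured_throughput_bps`, and `pick_rendition` are assumptions introduced for the example. Production players typically combine throughput estimates with buffer occupancy.

```python
# Bitrate ladder in bits per second (values taken from this article's master playlist).
LADDER_BPS = {"360p": 500_000, "720p": 2_500_000, "1080p": 5_000_000, "4k": 25_000_000}
SAFETY_FACTOR = 0.85  # assumed headroom so a small bandwidth dip does not cause a stall


def measured_throughput_bps(segment_bytes: int, download_seconds: float) -> float:
    """Estimate bandwidth from the size and download time of the last segment."""
    return segment_bytes * 8 / download_seconds


def pick_rendition(throughput_bps: float) -> str:
    """Choose the highest rendition whose bitrate fits within the discounted throughput."""
    budget = throughput_bps * SAFETY_FACTOR
    affordable = [name for name, bps in LADDER_BPS.items() if bps <= budget]
    if not affordable:
        return min(LADDER_BPS, key=LADDER_BPS.get)  # fall back to the lowest rung
    return max(affordable, key=LADDER_BPS.get)


# A 375 KB 360p segment that arrives in 0.5 s implies about 6 Mbps.
throughput = measured_throughput_bps(375_000, 0.5)
print(pick_rendition(throughput))  # 1080p
```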

Low-Level Design and Schema Strategies

To support distributed transcoding and fast playback resume, the metadata database must serve fast point lookups while absorbing frequent playback-progress writes.

Database Schema (PostgreSQL)

-- Core Video metadata record
CREATE TABLE videos (
    video_id VARCHAR(64) PRIMARY KEY,
    title VARCHAR(512) NOT NULL,
    description TEXT,
    duration_seconds INT NOT NULL,
    raw_s3_key VARCHAR(1024) NOT NULL,
    master_manifest_url VARCHAR(1024),
    status VARCHAR(32) NOT NULL, -- 'PENDING', 'SEGMENTED', 'TRANSCODING', 'READY', 'FAILED'
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Transcoding job tracking table
CREATE TABLE transcoding_jobs (
    job_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    video_id VARCHAR(64) REFERENCES videos(video_id) ON DELETE CASCADE,
    resolution VARCHAR(16) NOT NULL, -- '360p', '480p', '720p', '1080p', '4K'
    bitrate_bps INT NOT NULL,
    status VARCHAR(32) NOT NULL, -- 'QUEUED', 'PROCESSING', 'COMPLETED', 'FAILED'
    error_log TEXT,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Active User Playback progress tracker (Highly queried)
CREATE TABLE playback_progress (
    user_id VARCHAR(64) NOT NULL,
    video_id VARCHAR(64) REFERENCES videos(video_id) ON DELETE CASCADE,
    offset_seconds NUMERIC(10, 2) NOT NULL,
    device_id VARCHAR(128) NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (user_id, video_id)
);

-- Index user progress lookups for rapid "Resume Watching" feed rendering
CREATE INDEX idx_playback_user_time ON playback_progress (user_id, updated_at DESC);
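
Because several devices may report progress for the same user and video, the write is naturally an upsert keyed on `(user_id, video_id)`. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE ... WHERE` form. The "ignore reports older than the stored one" rule and the function names are assumptions made for illustration, and `user_id` is assumed to come from the authenticated session rather than the request payload.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the Postgres metadata DB
conn.execute(
    """
    CREATE TABLE playback_progress (
        user_id TEXT NOT NULL,
        video_id TEXT NOT NULL,
        offset_seconds REAL NOT NULL,
        device_id TEXT NOT NULL,
        updated_at TEXT NOT NULL,
        PRIMARY KEY (user_id, video_id)
    )
    """
)

# Insert the first report; later reports overwrite the row only if they are newer.
UPSERT = """
INSERT INTO playback_progress (user_id, video_id, offset_seconds, device_id, updated_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (user_id, video_id) DO UPDATE SET
    offset_seconds = excluded.offset_seconds,
    device_id = excluded.device_id,
    updated_at = excluded.updated_at
WHERE excluded.updated_at > playback_progress.updated_at
"""


def report_progress(user_id, video_id, offset_seconds, device_id, reported_at):
    """Persist one progress report (timestamps are ISO-8601 UTC strings, so they sort as text)."""
    with conn:
        conn.execute(UPSERT, (user_id, video_id, offset_seconds, device_id, reported_at))


def resume_offset(user_id, video_id):
    """Return the stored offset, or 0.0 when the user has never played the video."""
    row = conn.execute(
        "SELECT offset_seconds FROM playback_progress WHERE user_id = ? AND video_id = ?",
        (user_id, video_id),
    ).fetchone()
    return row[0] if row else 0.0


report_progress("user_1", "vid_99", 1284.5, "dev_mobile", "2026-06-05T12:45:30Z")
report_progress("user_1", "vid_99", 900.0, "dev_tv", "2026-06-05T12:40:00Z")  # stale, ignored
print(resume_offset("user_1", "vid_99"))  # 1284.5
```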

Low-Level HLS Master Manifest Schema (.m3u8)

The master playlist details the available stream configurations. The player uses this index file to select the appropriate stream based on current bandwidth:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS

# 360p Low Quality
#EXT-X-STREAM-INF:BANDWIDTH=500000,AVERAGE-BANDWIDTH=450000,RESOLUTION=640x360,CODECS="avc1.4d401f,mp4a.40.2"
https://cdn.codesprintpro.com/media/vid_99/360p/playlist.m3u8

# 720p Medium Quality
#EXT-X-STREAM-INF:BANDWIDTH=2500000,AVERAGE-BANDWIDTH=2200000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
https://cdn.codesprintpro.com/media/vid_99/720p/playlist.m3u8

# 1080p High Quality
#EXT-X-STREAM-INF:BANDWIDTH=5000000,AVERAGE-BANDWIDTH=4400000,RESOLUTION=1920x1080,CODECS="avc1.64002a,mp4a.40.2"
https://cdn.codesprintpro.com/media/vid_99/1080p/playlist.m3u8

# 4K Ultra Quality (Uses HEVC)
#EXT-X-STREAM-INF:BANDWIDTH=25000000,AVERAGE-BANDWIDTH=22000000,RESOLUTION=3840x2160,CODECS="hev1.1.6.L150.B0,mp4a.40.2"
https://cdn.codesprintpro.com/media/vid_99/4k/playlist.m3u8
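
To show how a player might turn this file into a list of selectable variants, here is a small parsing sketch. It handles only `#EXT-X-STREAM-INF` entries and assumes every variant carries `BANDWIDTH`, `RESOLUTION`, and `CODECS`, as the playlist above does; `parse_master_playlist` is a hypothetical helper, not part of any player library.

```python
import re

# Attribute values are either quoted strings (which may contain commas) or bare tokens.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_master_playlist(text: str) -> list[dict]:
    """Return variant streams sorted from lowest to highest BANDWIDTH."""
    variants = []
    pending = None  # attributes of the STREAM-INF tag awaiting its URI line
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attribute_list = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in ATTRIBUTE.findall(attribute_list)}
        elif line and not line.startswith("#") and pending is not None:
            width, height = pending["RESOLUTION"].split("x")
            variants.append({
                "bandwidth": int(pending["BANDWIDTH"]),
                "width": int(width),
                "height": int(height),
                "codecs": pending["CODECS"].split(","),
                "uri": line,
            })
            pending = None
    return sorted(variants, key=lambda v: v["bandwidth"])


SAMPLE = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,AVERAGE-BANDWIDTH=4400000,RESOLUTION=1920x1080,CODECS="avc1.64002a,mp4a.40.2"
https://cdn.codesprintpro.com/media/vid_99/1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=500000,AVERAGE-BANDWIDTH=450000,RESOLUTION=640x360,CODECS="avc1.4d401f,mp4a.40.2"
https://cdn.codesprintpro.com/media/vid_99/360p/playlist.m3u8
"""

variants = parse_master_playlist(SAMPLE)
print([(v["height"], v["bandwidth"]) for v in variants])
# [(360, 500000), (1080, 5000000)]
```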

Scaling and Operational Challenges

Managing media distribution at this scale means controlling very large data egress costs and storage footprints.

Egress Bandwidth and Codec Cost Calculations

Let us calculate the network bandwidth egress requirements and cost implications when streaming 1 hour of video to 10,000,000 (10 Million) viewers simultaneously using different video codecs.

Let:

$V$ = Number of viewers = $10,000,000$ viewers.
$D$ = Video duration = $3600 \text{ seconds}$ (1 hour).
Codec Bitrate Requirements (1080p High Quality):
- H.264 (AVC): Average bitrate $B_{\text{h264}} = 5.0 \text{ Mbps}$ (Megabits per second).
- HEVC (H.265) / AV1: Average bitrate $B_{\text{hevc}} = 3.0 \text{ Mbps}$ (representing a 40% compression gain at matching visual quality).

First, calculate the total data size consumed by a single viewer for 1 hour using H.264:

$$\text{Data}_{\text{h264}} = \frac{5,000,000 \text{ bits/sec} \times 3600 \text{ sec}}{8 \text{ bits/byte}} = 2,250,000,000 \text{ bytes} \approx 2.25 \text{ GB}$$

Total egress data for 10M viewers using H.264:

$$\text{Egress}_{\text{h264}} = 10,000,000 \times 2.25 \text{ GB} = 22,500,000 \text{ GB} = 22.5 \text{ Petabytes}$$

Now, calculate the total data size consumed by a single viewer using the optimized HEVC/AV1 codec:

$$\text{Data}_{\text{hevc}} = \frac{3,000,000 \text{ bits/sec} \times 3600 \text{ sec}}{8 \text{ bits/byte}} = 1,350,000,000 \text{ bytes} \approx 1.35 \text{ GB}$$

Total egress data for 10M viewers using HEVC/AV1:

$$\text{Egress}_{\text{hevc}} = 10,000,000 \times 1.35 \text{ GB} = 13,500,000 \text{ GB} = 13.5 \text{ Petabytes}$$

Bandwidth and Cost Savings Analysis

Data Savings: $22.5 \text{ PB} - 13.5 \text{ PB} = 9.0 \text{ Petabytes}$ saved per hour.
Network Egress Savings: Assuming a bulk CDN data transfer cost of \$0.01 per GB:
- Cost using H.264: $22,500,000 \text{ GB} \times \$0.01 = \$225,000$ per hour.
- Cost using HEVC: $13,500,000 \text{ GB} \times \$0.01 = \$135,000$ per hour.
- Savings: \$90,000 per hour of high-traffic streaming.

This math demonstrates why investing heavy compute time in optimized codecs (like AV1 and HEVC) is a core economic requirement for media platforms, easily offsetting the initial CPU encoding costs.
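
The same arithmetic can be reproduced in a few lines, which makes it easy to try other viewer counts, bitrates, or prices. The function names are illustrative; the inputs are the values used in the calculation above, with 1 GB taken as $10^9$ bytes.

```python
def egress_gb(viewers: int, bitrate_mbps: float, duration_seconds: int) -> float:
    """Total egress in decimal gigabytes (1 GB = 10**9 bytes)."""
    bytes_per_viewer = bitrate_mbps * 1_000_000 * duration_seconds / 8
    return viewers * bytes_per_viewer / 1e9


def egress_cost_usd(gb: float, usd_per_gb: float = 0.01) -> float:
    """Transfer cost at a flat per-GB price (the $0.01 figure assumed above)."""
    return gb * usd_per_gb


h264 = egress_gb(10_000_000, 5.0, 3600)
hevc = egress_gb(10_000_000, 3.0, 3600)

print(f"H.264: {h264:,.0f} GB, ${egress_cost_usd(h264):,.0f}")  # H.264: 22,500,000 GB, $225,000
print(f"HEVC:  {hevc:,.0f} GB, ${egress_cost_usd(hevc):,.0f}")  # HEVC:  13,500,000 GB, $135,000
print(f"Saved: ${egress_cost_usd(h264 - hevc):,.0f} per hour")  # Saved: $90,000 per hour
```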

Trade-offs and Architectural Alternatives

When choosing video streaming formats and encoding setups, organizations must evaluate key trade-offs:

| Dimension | HTTP Live Streaming (HLS) | MPEG-DASH | Just-in-Time (On-Demand) Transcoding |
| --- | --- | --- | --- |
| Protocol Design | Developed by Apple. Uses `.m3u8` index formats and TS/fMP4 segments. | International standard. Uses XML-based `.mpd` manifests. | Transcodes video segments dynamically on-the-fly when requested. |
| Device Compatibility | Excellent (Native support on iOS, macOS, AppleTV, Safari; supported on Android). | Medium (Standard on Android, Smart TVs, Chrome/Firefox; lacks native iOS Safari support). | Same as target protocol. |
| Storage Overhead | High (Requires storing pre-encoded segments for all resolutions). | High (Requires storing pre-encoded segments for all resolutions). | Ultra-Low (Only store the high-resolution source file; no pre-encoded segments). |
| Compute Overhead | Low (Compute cost is paid once during creator upload phase). | Low (Compute cost is paid once during creator upload phase). | High (Massive CPU/GPU load during concurrent stream views). |
| Latency to Play | Medium (Typically requires pre-buffered segments). | Medium (Typically requires pre-buffered segments). | High (Initial chunk generation introduces a startup latency tax). |

Failure Modes and Fault Tolerance Strategies

1. Spot Instance Preemption during Transcoding

Using AWS EC2 Spot instances for encoding can cut compute costs by up to 90% compared with On-Demand pricing, but AWS can reclaim these instances with only a 2-minute warning.

Resolution Strategy: Use SQS visibility timeouts. When a worker receives a segment transcoding job from the queue, SQS hides the message from other workers for 10 minutes. If the worker is preempted, it never deletes the message, so once the timeout expires the message becomes visible again and another worker instance picks it up. The timeout should comfortably exceed the expected encode time for one segment, otherwise a slow but healthy worker will have its job processed twice.
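
The redelivery behavior can be illustrated with a toy in-memory queue. This is a simulation written for this article, not the SQS API: `VisibilityQueue` is a hypothetical class, and time is passed in explicitly as `now` (in seconds) so the example is deterministic.

```python
import itertools


class VisibilityQueue:
    """Toy queue that mimics visibility-timeout semantics (not a real SQS client)."""

    def __init__(self, visibility_timeout: float = 600.0):  # 10 minutes, as in the text
        self.visibility_timeout = visibility_timeout
        self._bodies = {}            # message id -> body
        self._invisible_until = {}   # message id -> time at which it becomes visible again
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        msg_id = next(self._ids)
        self._bodies[msg_id] = body
        self._invisible_until[msg_id] = 0.0

    def receive(self, now: float):
        """Return (id, body) for the first visible message and hide it, or None."""
        for msg_id, body in self._bodies.items():
            if self._invisible_until[msg_id] <= now:
                self._invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id: int) -> None:
        """Acknowledge successful processing so the message is never redelivered."""
        self._bodies.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)


queue = VisibilityQueue()
queue.send("vid_99/1080p/segment_0")

first = queue.receive(now=0)        # worker A takes the job, then gets preempted
hidden = queue.receive(now=300)     # still invisible after 5 minutes
second = queue.receive(now=600)     # timeout expired: worker B receives the same job
queue.delete(second[0])             # worker B finishes and acknowledges

print(first, hidden, second)
# (0, 'vid_99/1080p/segment_0') None (0, 'vid_99/1080p/segment_0')
```

Because a job may run more than once, segment encoding should be idempotent: writing the same output key twice must be harmless.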

2. The CDN Cache Stampede

When a highly anticipated show drops (e.g., a season premiere), millions of clients request the first segment simultaneously. If the segment is not cached yet, those edge requests all pass through to the S3 origin concurrently, which can trigger throttling and error responses at the origin and stall playback for everyone.

Resolution Strategy:
- Origin Shielding: Configure a centralized cache layer between edge nodes and the S3 origin.
- Mutex Locking (Single Flight): The edge node locks the cache key on a miss. Only the first request goes to the origin. Subsequent requests block and wait for the cache key to be populated, protecting the origin S3 bucket.
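
The single-flight idea can be sketched with a lock and a shared future per in-flight key. This is an illustrative in-process version, assuming a thread-per-request edge; `SingleFlightCache` and `slow_origin` are hypothetical names, and real CDNs expose the same behavior as request coalescing or collapsed forwarding.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SingleFlightCache:
    """Cache in which concurrent misses for one key trigger a single origin fetch."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache = {}
        self._inflight = {}  # key -> Future shared by every request waiting on that key

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._inflight[key] = future
        if not leader:
            return future.result()  # block until the leader has fetched the value
        try:
            value = self._fetch(key)
        except Exception as exc:
            with self._lock:
                del self._inflight[key]
            future.set_exception(exc)  # waiting requests see the same failure
            raise
        with self._lock:
            self._cache[key] = value
            del self._inflight[key]
        future.set_result(value)
        return value


origin_calls = []


def slow_origin(key):
    origin_calls.append(key)
    time.sleep(0.05)  # simulate origin latency
    return f"bytes-of-{key}"


edge = SingleFlightCache(slow_origin)
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(edge.get, ["/media/vid_99/360p/segment_0.ts"] * 20))

print(len(results), len(origin_calls))  # 20 1
```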

3. Client Network Drop Adaptation

If a user enters a tunnel and their bandwidth drops from 50 Mbps to 1 Mbps, the player must not freeze.

Resolution Strategy: Client-side Look-Ahead Buffering. The player buffer holds 3 to 4 segments (18-24 seconds of video) in advance. When telemetry registers a download latency spike, the player requests the following segments at a lower rendition, stepping down as far as 360p if necessary, so that downloads keep pace with playback before the local buffer empties.
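
A rough simulation shows why the downswitch has to happen before the buffer runs dry. The sketch assumes 6-second segments and the ladder bitrates used elsewhere in this article; the function names are hypothetical and the model ignores request overhead and bandwidth variance.

```python
SEGMENT_SECONDS = 6
LADDER_BPS = {"360p": 500_000, "720p": 2_500_000, "1080p": 5_000_000, "4k": 25_000_000}


def download_seconds(rendition: str, bandwidth_bps: float) -> float:
    """Time needed to fetch one segment of the given rendition."""
    return LADDER_BPS[rendition] * SEGMENT_SECONDS / bandwidth_bps


def buffer_after_download(buffer_seconds: float, rendition: str, bandwidth_bps: float):
    """Return (new_buffer_seconds, stalled) after fetching one more segment.

    Playback keeps consuming the buffer while the download is in progress.
    """
    remaining = buffer_seconds - download_seconds(rendition, bandwidth_bps)
    stalled = remaining < 0
    return max(remaining, 0.0) + SEGMENT_SECONDS, stalled


def sustainable_rendition(bandwidth_bps: float) -> str:
    """Highest rendition whose segments download faster than they play back."""
    ok = [r for r in LADDER_BPS if download_seconds(r, bandwidth_bps) < SEGMENT_SECONDS]
    if not ok:
        return min(LADDER_BPS, key=LADDER_BPS.get)
    return max(ok, key=LADDER_BPS.get)


# Bandwidth has just dropped to 1 Mbps with 24 seconds of video buffered.
print(buffer_after_download(24, "1080p", 1_000_000))  # (6.0, True): playback stalls
print(buffer_after_download(24, "360p", 1_000_000))   # (27.0, False): buffer grows
print(sustainable_rendition(1_000_000))               # 360p
```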

Staff Engineer Perspective

Per-Title Encoding Optimization

Early video platforms used a static encoding ladder: a 1080p file was always encoded at 5.0 Mbps. However, a scene showing a talking head against a static background requires far fewer bits to look perfect compared to a high-speed action scene or a football match.

Strategy: Implement Per-Title/Per-Scene Encoding. The ingestion worker runs an analysis pass on the video to estimate its visual complexity. A talking-head interview might be encoded at 1080p with only 1.8 Mbps, while an action movie gets 5.5 Mbps. This can reduce global CDN bandwidth and storage by up to 50% without lowering perceived quality.
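
As a toy illustration of the idea, the mapping from complexity to bitrate can be written as an interpolation between the two example bitrates above. Everything here is an assumption made for the sketch: the normalized complexity score in [0, 1], the linear mapping, and the function name. Real per-title systems derive the ladder from trial encodes and perceptual quality measurements rather than a single formula.

```python
MIN_1080P_BPS = 1_800_000  # talking-head example from the text
MAX_1080P_BPS = 5_500_000  # action-movie example from the text


def bitrate_for_complexity(complexity: float) -> int:
    """Map an assumed complexity score in [0, 1] to a 1080p target bitrate in bps."""
    clamped = min(max(complexity, 0.0), 1.0)
    return round(MIN_1080P_BPS + clamped * (MAX_1080P_BPS - MIN_1080P_BPS))


print(bitrate_for_complexity(0.0))  # 1800000
print(bitrate_for_complexity(0.5))  # 3650000
print(bitrate_for_complexity(1.0))  # 5500000
```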

Open-Connect Edge Caching ISP Hardware

For global operations at this volume, relying solely on commercial CDNs can become prohibitively expensive. One way to scale economically is to build custom caching appliances (similar to Netflix's Open Connect program):

Platform engineers build custom 1U/2U servers packed with high-speed flash storage that hold the most popular portion of the video catalog.
These appliances are provided to Internet Service Providers (ISPs) at no charge and colocated directly inside their datacenters worldwide.
When a customer requests a segment, the traffic never leaves the ISP's own network, cutting transport latency to less than 10 ms and reducing public transit network fees.

Verbal Script

Interviewer: "How would you design a scalable video streaming platform like Netflix, and how do you optimize startup latency?"

Candidate:

"To design a video streaming platform at this scale, I would decouple the write-heavy ingestion and transcoding pipeline from the read-heavy delivery network.

On the ingestion path, when a creator uploads a raw video, the ingest service streams it directly to an S3 raw bucket. An orchestrator splits the video into 6-second segments. We send transcoding jobs to an SQS queue, where a fleet of worker processes running on cost-efficient EC2 Spot instances transcode the segments in parallel. The segments are encoded into multiple resolutions (360p, 720p, 1080p, 1080p HEVC, and 4K) and formats (HLS/DASH), and stored in a public CDN origin bucket. The worker then generates the Master Manifest index file and updates the metadata database.

To optimize startup latency and achieve a p99 startup time of less than 2 seconds, I would implement several layered techniques:

Low-Quality Initial Segment: The player is configured to always request the lowest quality segment (360p) first. This segment is small (typically less than 400 KB) and can download in roughly 100 ms on a fast connection, rendering the first frame almost immediately while the ABR algorithm evaluates bandwidth and upgrades subsequent segments.
Manifest Prefetching: In the client app, when a user hovers over a video thumbnail, the player initiates a background prefetch of the master manifest. This removes the manifest HTTP round-trip latency from the playback initiation sequence.
ISP Edge Caching: We push popular content segments to edge nodes. For large-scale operations, we place custom hardware caching appliances directly inside local ISP datacenters. When the user plays a video, the segments are served directly from within their ISP's own network, keeping packet transit latency under 15 milliseconds."