System Design

Project Case Study: Designing YouTube (Video Streaming at Global Scale)

How does YouTube handle millions of video uploads and billions of views? A technical deep dive into Video Transcoding, CDNs, Adaptive Bitrate Streaming, and storage optimization.

Sachin Sarawgi·April 20, 2026·5 min read
#system-design#youtube#video-streaming#cdn#transcoding#scalability#architecture

Project Case Study: Designing YouTube

YouTube is one of the world's largest distributed systems, managing exabytes of data and serving billions of views every day. The technical challenge isn't just "storing a video"—it's the orchestration of a massive transcoding pipeline, a global content delivery network (CDN), and a highly available metadata layer that stays consistent under extreme load.

1. Requirements

Functional Requirements:

  • Video Upload: Users can upload videos up to 50GB.
  • Video Playback: Sub-second "Time to First Frame" for users globally.
  • Search: Instant full-text search across billions of titles and descriptions.
  • Interactions: Real-time view counts, likes, and comments.

Non-Functional Requirements (The Physics of the System):

  • Scale: 1 Billion Daily Active Users (DAU).
  • Throughput: 5 million new videos uploaded daily (~2.5 PB/day storage growth).
  • Availability: 99.99% (highly resilient to regional outages).
  • Read/Write Ratio: 100:1 (heavily read-optimized).
  • Latency: Minimal buffering even on high-latency mobile networks.
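The numbers above can be sanity-checked with quick back-of-envelope math. The figures below are taken directly from the stated requirements; the average upload size and ingest bandwidth are derived from them, not measured:

```python
# Back-of-envelope capacity estimate from the stated requirements.
uploads_per_day = 5_000_000
storage_growth_per_day_pb = 2.5

# Average raw size per upload implied by the numbers above.
bytes_per_pb = 10**15
avg_video_bytes = storage_growth_per_day_pb * bytes_per_pb / uploads_per_day
print(f"Average upload size: {avg_video_bytes / 10**6:.0f} MB")  # 500 MB

# Sustained upload bandwidth, assuming uniform traffic over the day.
seconds_per_day = 86_400
ingest_gbps = storage_growth_per_day_pb * bytes_per_pb * 8 / seconds_per_day / 10**9
print(f"Sustained ingest: {ingest_gbps:.0f} Gbps")
```

In practice traffic is bursty, so the ingestion plane must be provisioned well above this sustained average.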

2. API Design

We use gRPC for internal service-to-service calls to minimize overhead, and REST for external client communication.

Upload API (Initiate)

POST /v1/videos/upload

  • Request: { title, description, category, filename, size }
  • Response: { upload_url, video_id } — Returns a Pre-signed S3 URL to allow the client to upload directly to storage, bypassing the application server.

Playback API

GET /v1/videos/{video_id}/stream?resolution=1080p&protocol=hls

  • Response: { manifest_url } — Returns a link to an .m3u8 or .mpd file containing chunk locations on the CDN.
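The manifest itself is plain text. An illustrative HLS master playlist (bitrates and paths invented for this example) lists the available renditions, each pointing at its own chunk playlist:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```

The player downloads this once, then fetches 5-second segments from whichever rendition fits its current bandwidth.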

3. High-Level Architecture (HLD)

The architecture is divided into three primary planes: the Ingestion Plane (upload and transcoding), the Data Plane (metadata, search, and counts), and the Delivery Plane (the CDN edge).

  1. Load Balancer (GSLB): Routes traffic to the nearest regional data center using Anycast DNS.
  2. API Gateway: Handles authentication, rate limiting, and request routing.
  3. Metadata Service: Manages video info, user profiles, and comments.
  4. Transcoding Cluster: An asynchronous fleet of workers that converts raw uploads into multiple formats.
  5. CDN (Content Delivery Network): A distributed network of edge caches (PoPs) that store video segments close to users.

![YouTube HLD Diagram Placeholder]

4. Data Storage Design

  • Metadata Store: MySQL with Vitess (Sharding). We shard by video_id. Relational integrity is required for comments and user permissions.
  • Video Blob Store: Amazon S3 or Google Cloud Storage. We use a "Cold Storage" tier (Glacier) for the original raw upload and "Hot Storage" for transcoded chunks.
  • Search Index: Elasticsearch. Video titles and descriptions are indexed to support fuzzy matching and ranking.
  • View Counts: Redis. To handle thousands of writes/sec on a viral video, we increment counts in-memory and flush to the DB in batches.
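The view-count pattern can be sketched without a real Redis. In this minimal sketch (class and method names are illustrative, not from any real codebase), increments accumulate in memory and are flushed to the durable store in batches:

```python
from collections import defaultdict

class BatchedViewCounter:
    """Accumulate view increments in memory; flush to the DB in batches."""

    def __init__(self, db, flush_threshold=1000):
        self.db = db                      # any object with increment(video_id, n)
        self.flush_threshold = flush_threshold
        self.pending = defaultdict(int)   # stand-in for Redis INCR counters
        self.pending_total = 0

    def record_view(self, video_id):
        self.pending[video_id] += 1
        self.pending_total += 1
        if self.pending_total >= self.flush_threshold:
            self.flush()

    def flush(self):
        for video_id, count in self.pending.items():
            self.db.increment(video_id, count)  # one write per video, not per view
        self.pending.clear()
        self.pending_total = 0

# Usage: a dict-backed fake DB makes the batching effect visible.
class FakeDB:
    def __init__(self):
        self.counts = defaultdict(int)
        self.writes = 0
    def increment(self, video_id, n):
        self.counts[video_id] += n
        self.writes += 1

db = FakeDB()
counter = BatchedViewCounter(db, flush_threshold=100)
for _ in range(250):
    counter.record_view("viral-video")
counter.flush()
print(db.counts["viral-video"], db.writes)  # 250 views recorded in 3 DB writes
```

The trade-off is durability: increments held in memory between flushes can be lost on a crash, which is acceptable for view counts but not for likes tied to a user action.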

5. Deep Dive: The Engineering "Secret Sauce"

A. The Transcoding Pipeline (Detailed)

When a video is uploaded, it must be converted into multiple resolutions (144p to 8K) and codecs (H.264, VP9, AV1).

  1. Chunking: The raw file is split into 5-second GOP (Group of Pictures) chunks.
  2. Parallel Workers: Chunks are distributed across thousands of workers.
  3. DAG Execution: We use a Directed Acyclic Graph (DAG) to manage dependencies (e.g., "Don't start 4K transcoding until the 360p version is ready for instant preview").
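The dependency ordering in step 3 is exactly a topological sort. A minimal sketch using Python's standard-library `graphlib` (the task names and edges are illustrative, not YouTube's actual pipeline):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Illustrative transcode DAG: each task maps to the tasks that must
# finish first. Per the pipeline above, the 360p instant-preview
# rendition gates the heavier ones.
dag = {
    "chunk":   [],                       # split upload into 5s GOP chunks
    "360p":    ["chunk"],
    "720p":    ["360p"],
    "1080p":   ["360p"],
    "4k":      ["360p"],
    "publish": ["720p", "1080p", "4k"],
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # e.g. ['chunk', '360p', '720p', '1080p', '4k', 'publish']
```

In a real scheduler the independent renditions (720p, 1080p, 4K) would run in parallel on separate worker pools; the topological order only constrains which tasks are eligible to start.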

B. Adaptive Bitrate Streaming (HLS/DASH)

YouTube doesn't send you one big file. It sends a Manifest File.

  • The client's player monitors network speed every 5 seconds.
  • If bandwidth drops, the player requests the next 5-second chunk in 480p instead of 1080p. This prevents the "Loading" spinner.
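The player's rendition choice can be sketched in a few lines: pick the highest-bitrate rendition that fits the measured bandwidth, with a safety margin so a small dip doesn't stall playback (the bitrates and the 0.8 margin below are illustrative assumptions, not real player defaults):

```python
# Renditions sorted high to low: (label, required bits per second).
RENDITIONS = [
    ("1080p", 5_000_000),
    ("720p", 2_800_000),
    ("480p", 1_400_000),
    ("360p", 800_000),
]

def pick_rendition(measured_bps, safety=0.8):
    """Pick the best rendition that fits within a fraction of measured bandwidth."""
    budget = measured_bps * safety
    for label, required in RENDITIONS:
        if required <= budget:
            return label
    return RENDITIONS[-1][0]  # floor: always serve something

print(pick_rendition(8_000_000))  # 1080p
print(pick_rendition(2_000_000))  # 480p
print(pick_rendition(500_000))    # 360p
```

Because each chunk is fetched independently, the player can change this decision every 5 seconds without any server-side coordination.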

C. CDN Caching Strategy

  • Long Tail vs. Trending: 80% of traffic goes to 20% of videos.
  • Push-to-Edge: Viral videos are proactively pushed to all CDN PoPs.
  • LRU Eviction: Niche videos are only cached if requested, and evicted if not watched for 24 hours.
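The edge-cache policy above combines LRU eviction with a 24-hour idle timeout. A minimal in-memory sketch (class name and structure are illustrative; real CDN caches are far more sophisticated):

```python
import time
from collections import OrderedDict

class EdgeCache:
    """LRU cache with an idle TTL, mimicking the eviction policy above."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = OrderedDict()  # video_id -> (segment_bytes, last_access)

    def get(self, video_id):
        entry = self.entries.get(video_id)
        if entry is None:
            return None  # cache miss: caller fetches from origin
        segment, last_access = entry
        if self.clock() - last_access > self.ttl:
            del self.entries[video_id]  # not watched recently: evict
            return None
        self.entries[video_id] = (segment, self.clock())
        self.entries.move_to_end(video_id)  # mark most recently used
        return segment

    def put(self, video_id, segment):
        if video_id in self.entries:
            self.entries.move_to_end(video_id)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[video_id] = (segment, self.clock())

# Usage: an injected fake clock shows the 24-hour idle eviction.
fake_now = [0.0]
cache = EdgeCache(capacity=2, ttl_seconds=24 * 3600, clock=lambda: fake_now[0])
cache.put("niche-video", b"segment-bytes")
fake_now[0] = 25 * 3600.0           # more than 24h of no views later
print(cache.get("niche-video"))     # None: expired, refetch from origin
```

"Push-to-Edge" for viral videos is simply calling `put` proactively from the origin rather than waiting for the first cache miss.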

6. Scaling Challenges

  • The "Hot Video" Problem: When a video goes viral, the single database shard holding its metadata becomes a bottleneck.
    • Solution: Use Consistent Hashing and read-replicas. Cache metadata in a global Redis cluster with a very high TTL for viral IDs.
  • Upload Resiliency: If a 50GB upload fails at 99%, it's a disaster.
    • Solution: Multipart Uploads. The client uploads 5MB parts. Each part is acknowledged. On failure, the client only retries the missing parts.
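The multipart-retry logic can be sketched concisely: split the file into fixed-size parts, track which part numbers the server has acknowledged, and on retry send only the gaps. (Function names and the acknowledgement set are illustrative; a real client would use the storage provider's multipart API.)

```python
# Multipart upload sketch: only unacknowledged parts are retried.
PART_SIZE = 5 * 1024 * 1024  # 5 MB parts, as in the text

def split_into_parts(total_size, part_size=PART_SIZE):
    """Return (part_number, offset, length) tuples covering the file."""
    parts = []
    offset, number = 0, 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts

def upload(parts, acked, send_part):
    """Send only the parts the server has not acknowledged yet."""
    for number, offset, length in parts:
        if number in acked:
            continue  # already durable server-side: skip on retry
        send_part(number, offset, length)
        acked.add(number)

# Usage: a 12 MB upload fails after 2 of 3 parts; the retry sends one part.
sent = []
parts = split_into_parts(12 * 1024 * 1024)
acked = {1, 2}  # parts acknowledged before the simulated failure
upload(parts, acked, lambda n, off, ln: sent.append(n))
print(sent)  # [3]
```

Because each part is independently durable once acknowledged, a failure at 99% of a 50GB upload costs at most one 5MB part, not the whole file.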

7. Trade-offs

  • Cost vs. Latency: We store 10+ versions of every video. This is expensive (storage) but critical for user retention (latency).
  • Consistency vs. Performance: View counts are Eventually Consistent. If you refresh the page and see 1,005 views instead of 1,010, the system is working as intended to prioritize write throughput.

8. Real-world Insights

YouTube uses a custom filesystem called Colossus (successor to GFS) and a specialized database layer called Vitess to scale MySQL. Netflix, on the other hand, uses Open Connect, their own hardware appliances placed directly inside ISP data centers to bypass the public internet entirely.

Summary

The engineering of YouTube is a battle against the physics of the internet. By leveraging a massive transcoding pipeline and a global network of edge caches, you can build a system that delivers 4K video to millions of users simultaneously with minimal buffering.




