
System Design: Designing a Distributed File System (HDFS/GCS Style)

How do you store petabytes of data across thousands of commodity nodes? A deep dive into data chunking, NameNode/DataNode architecture, and replication.

Sachin Sarawgi · April 20, 2026 · 2 min read


A Distributed File System (like HDFS or GFS) is designed to store massive datasets across a cluster of commodity servers. It handles the complexity of breaking large files into chunks, replicating those chunks, and managing hardware failures without interrupting access.

1. Core Requirements

  • Massive Storage: Storing petabytes of data.
  • Reliability: Data must survive disk/node failures.
  • Sequential Access: Optimized for large, sequential file reads.
  • Throughput: High bandwidth for batch processing (MapReduce/Spark).
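Some back-of-envelope math makes these requirements concrete. The sketch below uses hypothetical figures (10 PB of logical data, 128 MB chunks, 3x replication) to show how replication multiplies raw disk needs and how chunk size drives the metadata count:

```python
# Back-of-envelope capacity math for a hypothetical 10 PB cluster.
# With 3x replication, raw disk consumed is 3x the logical data size.

CHUNK_SIZE = 128 * 1024**2      # 128 MB chunks
REPLICATION = 3
LOGICAL_DATA = 10 * 1024**5     # 10 PB of user data

raw_storage = LOGICAL_DATA * REPLICATION    # disk actually consumed
chunk_count = LOGICAL_DATA // CHUNK_SIZE    # metadata entries to track

print(f"Raw storage needed: {raw_storage / 1024**5:.0f} PB")   # 30 PB
print(f"Chunks to track:    {chunk_count:,}")                  # ~84 million
```

Even at petabyte scale, 128 MB chunks keep the metadata set in the tens of millions of entries, which is why large chunk sizes matter so much.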

2. High-Level Architecture

  • NameNode (Metadata Server): Keeps track of file structure (namespaces), which chunks make up each file, and where those chunks reside. It's the "brain."
  • DataNodes (Storage Servers): The physical machines where the actual file chunks are stored. They handle read/write requests from clients.
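A minimal sketch of the split between the two roles, assuming a toy in-memory metadata model (real HDFS also tracks permissions, leases, and block generation stamps):

```python
# Illustrative NameNode metadata: which chunks make up each file, and
# which DataNodes hold a replica of each chunk. Names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class NameNode:
    files: dict = field(default_factory=dict)            # path -> ordered chunk IDs
    chunk_locations: dict = field(default_factory=dict)  # chunk ID -> set of DataNodes

    def create_file(self, path, chunk_ids):
        self.files[path] = list(chunk_ids)

    def register_replica(self, chunk_id, datanode):
        self.chunk_locations.setdefault(chunk_id, set()).add(datanode)

    def locate(self, path):
        """Return (chunk_id, replicas) pairs a client needs to read the file."""
        return [(c, self.chunk_locations.get(c, set())) for c in self.files[path]]

nn = NameNode()
nn.create_file("/logs/2026-04-20.log", ["chunk-1", "chunk-2"])
nn.register_replica("chunk-1", "dn-a:9866")
nn.register_replica("chunk-1", "dn-b:9866")
print(nn.locate("/logs/2026-04-20.log"))
```

Note that the NameNode never touches file bytes: a client asks it where chunks live, then streams data directly from the DataNodes.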

3. Data Chunking & Replication

  • Chunking: Files are broken into fixed-size chunks (e.g., 128MB). This size is intentionally large to reduce metadata overhead and optimize sequential reads.
  • Replication: Each chunk is replicated (usually 3 times). To ensure fault tolerance, replicas are placed in different racks/AZs so that a single rack failure doesn't take out every replica of a chunk.
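The two ideas above can be sketched in a few lines. The placement policy here is a simplification (one replica per rack); HDFS's actual default puts the first replica on the writer's node, the second on a different rack, and the third on the same rack as the second. Node and rack names are made up:

```python
# Sketch: split a file into fixed-size chunks, then pick replica targets
# so that no two replicas share a rack. Simplified placement policy.

CHUNK_SIZE = 128 * 1024**2

def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) per chunk; the last chunk may be short."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]

def place_replicas(nodes_by_rack, replication=3):
    """Pick one node per rack until we have enough replicas."""
    targets = []
    for rack, nodes in nodes_by_rack.items():
        if len(targets) == replication:
            break
        targets.append((rack, nodes[0]))
    if len(targets) < replication:
        raise RuntimeError("not enough racks for rack-diverse placement")
    return targets

chunks = split_into_chunks(300 * 1024**2)   # a 300 MB file -> 3 chunks
print(chunks)                               # last chunk is only 44 MB
```

A 300 MB file yields two full 128 MB chunks plus a 44 MB tail; chunks are a maximum size, not a padding unit.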

4. The NameNode Bottleneck

The NameNode stores metadata in memory to keep lookups fast. But what if we have billions of files?

  • Federation: Split the file system into multiple namespaces, each managed by its own NameNode.
  • Caching: Frequently accessed metadata is cached, but the NameNode remains the absolute source of truth.
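To see why billions of files are a problem, run the numbers. A commonly cited HDFS rule of thumb is roughly 150 bytes of NameNode heap per namespace object (file, directory, or block); treat the figure as an estimate, not a spec:

```python
# Rough memory math for the NameNode bottleneck. The ~150 bytes/object
# figure is an often-quoted HDFS estimate, not an exact constant.

BYTES_PER_OBJECT = 150

def heap_needed_gb(files, blocks_per_file=1):
    objects = files * (1 + blocks_per_file)   # one inode + its blocks
    return objects * BYTES_PER_OBJECT / 1024**3

print(f"{heap_needed_gb(1_000_000_000):.0f} GB")   # ~280 GB of heap
```

A billion single-block files already demands hundreds of gigabytes of heap on one machine, which is exactly the pressure that pushes designs toward federation.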

5. Heartbeats and Failure Recovery

  • Heartbeats: DataNodes send heartbeats to the NameNode every few seconds (3 seconds by default in HDFS).
  • Recovery: If the NameNode stops receiving heartbeats from a DataNode, it eventually declares the node dead (HDFS waits roughly 10 minutes by default, to avoid churn from transient network blips), marks every chunk on that node as "under-replicated," and schedules healthy nodes to create new replicas.
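The failure-detection loop above can be sketched as follows; the timeout values mirror HDFS defaults but are illustrative here:

```python
# Sketch of the NameNode's failure detector: if a DataNode's last heartbeat
# is older than the dead-node timeout, every chunk it held becomes
# under-replicated and is queued for re-replication.
import time

HEARTBEAT_INTERVAL = 3      # seconds between DataNode heartbeats
DEAD_TIMEOUT = 10 * 60      # ~10 minutes before declaring a node dead

class FailureDetector:
    def __init__(self):
        self.last_seen = {}   # datanode -> last heartbeat timestamp
        self.chunks_on = {}   # datanode -> set of chunk IDs it stores

    def heartbeat(self, datanode, now=None):
        self.last_seen[datanode] = now if now is not None else time.time()

    def dead_nodes(self, now):
        return [dn for dn, t in self.last_seen.items() if now - t > DEAD_TIMEOUT]

    def under_replicated(self, now):
        """Chunks that need new replicas because their host looks dead."""
        chunks = set()
        for dn in self.dead_nodes(now):
            chunks |= self.chunks_on.get(dn, set())
        return chunks

fd = FailureDetector()
fd.chunks_on["dn-a"] = {"chunk-1", "chunk-7"}
fd.heartbeat("dn-a", now=0)
print(fd.under_replicated(now=DEAD_TIMEOUT + 1))   # both chunks need new homes
```

The long timeout is a deliberate trade-off: re-replicating terabytes because of a 30-second network hiccup would be far more expensive than waiting out the blip.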

Summary

Building a distributed file system is ultimately an exercise in abstraction. By separating metadata management (NameNode) from raw storage (DataNodes) and using large block sizes, you can build a storage engine that scales to thousands of nodes and petabytes of data while appearing to the user as a single, simple directory structure.


Written by Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.
