System DesignAdvancedarticle

System Design: Designing an Online Judge (LeetCode/HackerRank)

How does LeetCode run thousands of user code submissions safely and fast? A technical deep dive into Code Isolation, Sandboxing, and Distributed Judgement.

Sachin SarawgiApril 20, 20263 min read3 minute lesson

System Design: Designing an Online Judge

An Online Judge system like LeetCode, Codeforces, or HackerRank allows users to submit code, which is then executed and verified against hidden test cases. The core technical challenge is Security (running untrusted code) and Scalability (handling bursts of submissions during contests).

1. Core Requirements

  • Submission: Users can submit code in various languages (Java, Python, C++, etc.).
  • Execution: Running the code against multiple test cases.
  • Verification: Comparing the output with the expected result.
  • Limits: Enforcing Time Limits (TLE) and Memory Limits (MLE).
  • Security: Preventing the user code from accessing the host file system or network.

2. High-Level Architecture

  • Web API: Receives code and queues the submission.
  • Message Queue (Kafka): Buffers submissions to handle traffic spikes.
  • Judge Workers: Pull submissions from the queue and execute them in a secure sandbox.
  • Result Store (Postgres/Redis): Stores submission results and real-time status.

3. Code Isolation (The Security Heart)

Running untrusted user code on your server is dangerous. You must isolate it completely.

  • Option A: Docker Containers. A good starting point, but "Root" in a container can sometimes escape to the host.
  • Option B: gVisor (Google's choice). A user-space kernel that intercepts system calls, providing a much stronger security boundary than standard Docker.
  • Option C: Firecracker (AWS Lambda style). MicroVMs that provide hardware-level isolation with the speed of containers.

4. Enforcing Limits (Resource Management)

  • Time Limits: Use OS-level timers or cgroups to kill the process if it exceeds 1-2 seconds.
  • Memory Limits: Use cgroups to set a hard RAM limit (e.g., 256MB). If exceeded, the process is killed (MLE).
  • Fork Bombs: Limit the number of processes/threads the user code can create to prevent while(true) { fork(); } attacks.

5. Handling Test Cases

For a single problem, there might be 100+ test cases.

  • Input/Output Store: Store large test cases in Amazon S3 and cache them on the Judge Workers to avoid network latency for every submission.
  • Sequential vs. Parallel: To save time, run different test cases for the same submission in parallel across multiple threads/containers.

6. Real-time Status (WebSockets)

Users want to see "Running Test Case 5/100..." in real-time.

  • The Flow: The Judge Worker publishes progress to a Redis Pub/Sub channel. The Web API listens and pushes updates to the client via WebSockets.

Summary

The engineering of an Online Judge is about Defensive Programming. By using advanced sandboxing technologies like gVisor and robust resource management via cgroups, you can build a platform that allows the world to code on your servers with absolute safety and massive scale.

📚

Recommended Resources

Designing Data-Intensive ApplicationsBest Seller

The definitive guide to building scalable, reliable distributed systems by Martin Kleppmann.

View on Amazon
Kafka: The Definitive GuideEditor's Pick

Real-time data and stream processing by Confluent engineers.

View on Amazon
Apache Kafka Series on Udemy

Hands-on Kafka course covering producers, consumers, Kafka Streams, and Connect.

View Course

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Keep Learning

Move through the archive without losing the thread.

Related Articles

More deep dives chosen from shared tags, category overlap, and reading difficulty.

More in System Design

Category-based suggestions if you want to stay in the same domain.