Case Study: Design Tic Tac Toe

Designing Tic Tac Toe is a classic Machine Coding problem. While a naive solution with raw nested loops is simple, architecting it to be highly scalable, extensible (supporting $N \times N$ grids and varying win conditions), thread-safe, and optimizing the win-condition check to $O(1)$ execution time is what distinguishes a senior staff engineer from a junior developer.

In this comprehensive guide, we will design a production-ready, SOLID-compliant Tic Tac Toe game system. We will walk through the core entities, the mathematics behind the $O(1)$ win checking optimization, concurrency handling, sparse memory configurations, and distributed server scaling patterns.

System Requirements and Goals

To build a professional board game engine, we must define strict functional boundaries and performance expectations.

1. Functional Requirements

Scalable Grid Size: Support any arbitrary $N \times N$ board size (e.g., $3 \times 3$, $10 \times 10$, or $1000 \times 1000$).
Flexible Player Count: Support $M$ players (default is $2$, but extensible to multiplayer variants) taking turns to place their unique markers.
Dynamic Win-Condition Verification: Instantly announce a winner when any player places a marker that occupies a full row, column, or diagonal.
Extensible Winning Rules: Allow different winning strategies (e.g., standard line win vs. corner-only win vs. custom shape matches) using standard design patterns.
Turn Orchestration: Enforce strict sequence constraints, preventing players from making out-of-turn moves or overriding already occupied cells.

2. Non-Functional Constraints

Low Latency ($O(1)$ Moves): Placing a move and verifying the win status must execute in constant $O(1)$ time, regardless of the size of the board ($N$).
Thread-Safety: Ensure that in a concurrent environment, no two requests can place markers on the same grid cell simultaneously.
Minimal Memory Footprint: Bounded memory allocations. Avoid allocating massive empty 2D matrices when players only occupy a fraction of the board.
Loose Coupling: The core game engine must be decoupled from the rendering layer (CLI, HTTP REST, or WebSocket streaming).

3. Sizing and Capacity Math

Let's analyze the memory consumption of a massive, scalable board size where $N = 1,000$:

Naive 2D Array Board: Storing characters in a standard primitive 2D matrix: $$\text{Memory Overhead} = 1,000 \times 1,000 \times 2 \text{ Bytes (char)} = 2 \text{ MB per board instance}$$ While 2MB seems tiny, if a backend server handles $100,000$ active concurrent matches, the raw board states alone will consume: $$\text{Total Memory} = 100,000 \times 2 \text{ MB} = 200 \text{ GB of RAM}$$ This massive memory overhead will choke JVM heaps, causing garbage collection pauses and OOM failures.
Sparse HashMap Board: By storing only occupied cells inside a hash map Map<String, Cell>: If a typical game terminates after an average of $200$ moves: $$\text{Occupied Cells} = 200$$ $$\text{RAM Cost/Cell Object} = 32 \text{ Bytes}$$ $$\text{Total Board Memory} = 200 \text{ moves} \times 32 \text{ Bytes} \approx 6.4 \text{ KB}$$ $$\text{Total Memory for 100K active matches} = 100,000 \times 6.4 \text{ KB} \approx 640 \text{ MB of RAM}$$ By adopting a sparse layout, we reduce heap memory consumption by 99.68%, demonstrating the power of smart Low-Level Design under enterprise scale.

API Design and Interface Contracts

The Tic Tac Toe LLD engine is controlled via structured method contracts and REST APIs. Below is the API payload definition for game execution.

1. Create a Game Session

POST /v1/games

Request Payload:

{
  "board_size": 3,
  "players": [
    { "player_id": "usr-111", "name": "Alice", "symbol": "X" },
    { "player_id": "usr-222", "name": "Bob", "symbol": "O" }
  ],
  "winning_strategy": "STANDARD"
}

Response Payload (201 Created):

{
  "game_id": "game-9988-xyz",
  "status": "IN_PROGRESS",
  "current_turn_player_id": "usr-111",
  "board_view_uri": "/v1/games/game-9988-xyz/board"
}

2. Execute a Player Move

POST /v1/games/game-9988-xyz/moves

Request Payload:

{
  "player_id": "usr-111",
  "row": 1,
  "col": 2
}

Response Payload (200 OK):

{
  "move_id": "mv-54321",
  "game_status": "IN_PROGRESS",
  "next_player_id": "usr-222",
  "has_winner": false,
  "winner_id": null
}

High-Level Design Architecture

To ensure strict separation of concerns, our LLD isolates turn processing, board state, winning strategies, and game representation.

1. Object-Oriented Component Mesh

The class relations follow solid OOP principles. The Game class acts as the central coordinator, delegating grid state tracking to Board, user representation to Player, and winning verification to WinningStrategy (implementing the Strategy Design Pattern).

classDiagram
    class Game {
        -String gameId
        -Board board
        -List~Player~ players
        -int currentTurnIndex
        -GameStatus status
        -WinningStrategy winStrategy
        +makeMove(Player p, int row, int col) MoveResult
    }
    class Board {
        -int size
        -Map~String, Cell~ grid
        +isCellEmpty(int r, int c) bool
        +placeMarker(int r, int c, Symbol s)
    }
    class Player {
        -String id
        -String name
        -Symbol symbol
    }
    class WinningStrategy {
        <<interface>>
        +checkWin(Board b, Move m) bool
    }
    class StandardWinStrategy {
        -int[] rows
        -int[] cols
        -int diagonal
        -int antiDiagonal
        +checkWin(Board b, Move m) bool
    }
    
    Game *-- Board
    Game *-- Player
    Game *-- WinningStrategy
    WinningStrategy <|.. StandardWinStrategy

2. Move Validation & Win-Check Flow

The sequence diagram below shows how a move request propagates through the model, updating running counters and checking the win state in constant $O(1)$ time.

sequenceDiagram
    autonumber
    actor Client as Player Client
    participant Controller as GameController
    participant Engine as GameEngine
    participant Board as BoardState
    participant Strategy as StandardWinStrategy

    Client->>Controller: POST MakeMove(usr-111, r=1, c=2)
    Controller->>Engine: processMove(playerId, 1, 2)
    Engine->>Engine: Validate Player Turn
    Engine->>Board: isCellEmpty(1, 2)
    Board-->>Engine: true
    Engine->>Board: placeMarker(1, 2, Symbol_X)
    Engine->>Strategy: checkWin(Board, Move_1_2)
    Note over Strategy: Update rows[1], cols[2], diag, anti-diag
    Strategy-->>Engine: false (No win yet)
    Engine-->>Controller: MoveResult(IN_PROGRESS)
    Controller-->>Client: 200 OK (Next Turn: usr-222)

Low-Level Design & Component Mechanics

To build a robust Machine Coding solution, we translate our architecture into a production-grade, thread-safe, and compilable Java model.

1. Java Low-Level Class Structure

package com.codesprintpro.tictactoe;

import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// 1. Enum representing the current state of the match
enum GameStatus {
    NOT_STARTED,
    IN_PROGRESS,
    WON,
    DRAW
}

// 2. Class representing Player attributes
class Player {
    private final String id;
    private final String name;
    private final char symbol;

    public Player(String id, String name, char symbol) {
        this.id = id;
        this.name = name;
        this.symbol = symbol;
    }

    public String getId() { return id; }
    public String getName() { return name; }
    public char getSymbol() { return symbol; }
}

// 3. Thread-Safe Sparse Board representation
class Board {
    private final int size;
    private final Map<String, Character> grid = new ConcurrentHashMap<>();

    public Board(int size) {
        this.size = size;
    }

    public int getSize() { return size; }

    public boolean isValidMove(int r, int c) {
        return r >= 0 && r < size && c >= 0 && c < size && !grid.containsKey(r + "," + c);
    }

    public void placeMarker(int r, int c, char symbol) {
        grid.put(r + "," + c, symbol);
    }

    public boolean isFull() {
        return grid.size() == size * size;
    }
}

// 4. Strategy Pattern interface for win verification
interface WinningStrategy {
    boolean checkWin(Board board, int row, int col, int playerVal);
}

// 5. O(1) Running Totals standard win implementation
class StandardWinningStrategy implements WinningStrategy {
    private final int[] rows;
    private final int[] cols;
    private int diagonal = 0;
    private int antiDiagonal = 0;
    private final int n;

    public StandardWinningStrategy(int size) {
        this.n = size;
        this.rows = new int[size];
        this.cols = new int[size];
    }

    @Override
    public synchronized boolean checkWin(Board board, int row, int col, int playerVal) {
        rows[row] += playerVal;
        cols[col] += playerVal;

        if (row == col) {
            diagonal += playerVal;
        }
        if (col == (n - row - 1)) {
            antiDiagonal += playerVal;
        }

        // Return true if any total matches the target board dimension
        return Math.abs(rows[row]) == n || 
               Math.abs(cols[col]) == n || 
               Math.abs(diagonal) == n || 
               Math.abs(antiDiagonal) == n;
    }
}

// 6. Primary Game Coordinator
class Game {
    private final String gameId;
    private final Board board;
    private final List<Player> players;
    private final WinningStrategy winningStrategy;
    private final AtomicInteger currentTurnIndex = new AtomicInteger(0);
    private GameStatus status = GameStatus.NOT_STARTED;
    private Player winner = null;

    public Game(String gameId, int boardSize, List<Player> players) {
        this.gameId = gameId;
        this.board = new Board(boardSize);
        this.players = players;
        this.winningStrategy = new StandardWinningStrategy(boardSize);
        this.status = GameStatus.IN_PROGRESS;
    }

    // Thread-safe method to execute a turn
    public synchronized String makeMove(String playerId, int row, int col) {
        if (status != GameStatus.IN_PROGRESS) {
            throw new IllegalStateException("Game is not in progress.");
        }

        Player activePlayer = players.get(currentTurnIndex.get());
        if (!activePlayer.getId().equals(playerId)) {
            return "OUT_OF_TURN_MOVE";
        }

        if (!board.isValidMove(row, col)) {
            return "INVALID_CELL_OR_OCCUPIED";
        }

        // Place marker on board
        board.placeMarker(row, col, activePlayer.getSymbol());

        // Player 1 contributes +1, Player 2 contributes -1
        int playerValue = (currentTurnIndex.get() == 0) ? 1 : -1;

        // Verify win status in O(1)
        if (winningStrategy.checkWin(board, row, col, playerValue)) {
            this.status = GameStatus.WON;
            this.winner = activePlayer;
            return "PLAYER_WON";
        }

        if (board.isFull()) {
            this.status = GameStatus.DRAW;
            return "GAME_DRAWN";
        }

        // Transition to next player turn
        currentTurnIndex.set((currentTurnIndex.get() + 1) % players.size());
        return "MOVE_ACCEPTED";
    }

    public GameStatus getStatus() { return status; }
    public Player getWinner() { return winner; }
}

2. Win Check Optimization Mechanics

The naive check scans the entire row, column, and diagonals every time a player makes a move, resulting in a time complexity of $O(N)$ per turn. On large grids (e.g. $10,000 \times 10,000$), this approach leads to severe processing stalls. By assigning Player 1 a mathematical value of +1 and Player 2 a value of -1, we maintain running sums for rows, columns, and diagonals.

Whenever a move is played, we perform simple arithmetic additions: $rows[row] += value$.
Checking if someone has won is simplified to testing: $Math.abs(rows[row]) == N$.
This calculation takes absolute $O(1)$ constant time and uses $O(N)$ space, ensuring lightning-fast execution on massive grids.

Scaling Challenges & Production Bottlenecks

Transitioning this local model to a massive distributed gaming service exposes complex physical bottlenecks:

1. Concurrent Cell Collisions (Race Conditions)

In high-velocity online lobbies, two players might attempt to tap the exact same empty board cell at the exact same millisecond. If we rely on stateless backend nodes, both moves could be validated concurrently, overriding a marker and creating game desynchronization.

Mitigation (Distributed Optimistic Concurrency Control):

Distributed Locks: Before allowing a move to execute, the API gateway attempts to acquire a distributed lock in Redis keyed by the board cell coordinate:
```
SET game:9988:cell:1,2 lock_val NX PX 1000
```
If the cell is already locked, the request is immediately rejected with an INVALID_CELL_OR_OCCUPIED error.
Database Versioning: The game state version is incremented on every valid move. If a concurrent update with an outdated version is received, it is immediately discarded.

2. Large Sparse Grid Memory Consumption

On vast boards ($N = 100,000$), allocating primitive arrays for running totals ($rows$ and $cols$) is memory-inefficient when most rows remain completely empty.

Mitigation (Hash-Map Sparse Counters): Instead of pre-allocating large integer arrays, we implement sparse running total maps: Map<Integer, Integer> rowSums = new ConcurrentHashMap<>(); We only write keys to this map when a player places a marker on that row, capping the space complexity to $O(\text{Moves})$ rather than $O(N)$, ensuring optimal memory utilization.

Technical Trade-offs & Strategic Compromises

LLD requires making deliberate compromises between architectural patterns and raw computational efficiency.

Implementation Option	Time Complexity	Space Complexity	Extensibility	Thread-Safety Complexity
Naive Matrix Scan	$O(N)$	$O(N^2)$ (High Memory)	Low	High
Local Vector Scan	$O(N)$	$O(N^2)$	Low	High
Running Totals (Arrays)	$O(1)$ (Optimal)	$O(N)$ (Medium)	High	Medium
Sparse Hash Counters	$O(1)$	$O(\text{Moves})$ (Optimal)	High	Low

Running Totals Array vs. Sparse Hash Counters

Using primitive arrays for running totals gives the fastest possible execution since array lookups are handled directly by CPU cache lines. However, this pre-allocates $O(N)$ memory upfront. Using Sparse Hash Maps saves massive amounts of memory on very large boards, but introduces minor hashing collision overhead and garbage collection pressure due to boxing integers. For typical online matches (where $N < 100$), primitive arrays are superior due to zero allocation churn, while sparse maps are chosen for massive, custom custom sandbox grids.

Failure Scenarios and Fault Tolerance

Stateful multiplayer systems must ensure continuous recovery across network and host failures.

1. Transient Client Disconnection

If a player's internet drops out mid-match, WebSockets will disconnect, losing the local UI state.

Fault Tolerance Strategy:

The core game state is persisted in a fast, in-memory Redis Hash.
When the client reconnects, it initiates a handshake including the last_seen_move_id sequence number.
The server replays the missed moves from the Redis log, restoring the client UI to the exact active frame.

2. Stateful Backend Pod Failover

If a backend game server hosting active matches crashes, the in-memory board state is lost, frustrating users.

Fault Tolerance Strategy:

We operate completely stateless application nodes.
Every move request is transactionally written to Redis before broadcasting events.
If a backend pod crashes, any other active node in the Kubernetes cluster can pick up subsequent moves, retrieve the game state from Redis, and process the turn with zero disruption.

Staff Engineer Perspective

Verbal Script & Mock Interview

Mock Interview Dialogue

Interviewer: "Welcome! Today, we want to design the low-level architecture for a scalable Tic Tac Toe game system. The game should be able to run on an $N \times N$ board size and support concurrent matching. How would you model this using clean object-oriented principles?"

Candidate: *"To design a production-grade Tic Tac Toe game, I would focus on three pillars: strict adherence to SOLID design principles, thread-safe turn execution, and optimizing the win-condition verification to constant $O(1)$ time complexity.

I will model the system using several core entities:

Player: Encapsulates user identification and their designated symbol ('X' or 'O').
Board: Manages the board state. To optimize memory, instead of pre-allocating a massive 2D array which scales at $O(N^2)$, I will implement a sparse board representation using a thread-safe ConcurrentHashMap of coordinate strings to character markers. This caps our board state memory to $O(\text{Moves})$ rather than $O(N^2)$.
WinningStrategy: An interface defining win-condition logic. By decoupling this using the Strategy Design Pattern, we can easily plug in standard line wins, corner-only wins, or custom Gomoku rules without modifying our core game orchestrator.
Game: The central controller that manages player lists, state transitions, turn validation, and notifies clients of outcomes."*

Interviewer: "That is a very solid modular structure. Now, tell me about the $O(1)$ win-condition check. How do you implement that?"

Candidate: *"The naive approach is to scan the row, column, and diagonals of the placed move. This takes $O(N)$ time per turn, which stalls under large board dimensions.

To achieve $O(1)$ constant time execution, I introduce a mathematical optimization. We assign the first player a value of +1 and the second player a value of -1. Inside our StandardWinningStrategy class, we maintain two integer arrays of size $N$—rows and cols—and two integers for the diagonal and antiDiagonal.

When a player places a marker, we simply increment the running sums at the respective row, column, and diagonal indices by the player's value. We then evaluate if the absolute value of the updated row, column, or diagonal count equals $N$.

Because this check relies on simple array indexing and additions, it executes in absolute constant $O(1)$ time, completely independent of the board size $N$."*

Interviewer: "Excellent. How do you handle concurrency? What if two players try to make a move on the same cell at the exact same millisecond?"

Candidate: *"In a highly concurrent multi-threaded environment, we must prevent double-booking cells.

First, within the JVM, I make the makeMove execution block synchronized or utilize a fine-grained reentrant lock. The isValidMove check verifies that the cell is empty before writing to the map.

However, to scale this to a distributed backend mesh running across multiple nodes, JVM-level synchronization is not enough. I introduce a distributed lock in Redis using the cell coordinates as the key. Before executing a move, the backend thread attempts to acquire the lock. If another thread got there first, the lock acquisition fails, and the move is rejected safely without creating data corruption.

Additionally, the game state version is incremented on every valid move, ensuring that if a delayed out-of-order move reaches the server, it is discarded by optimistic concurrency control."*

Interviewer: "Outstanding. Your understanding of design patterns, algorithmic complexity, and concurrent scaling is exceptionally strong."