DSA

Suffix Automaton in Java: The Ultimate String Data Structure

Master Suffix Automaton (SAM) in Java. Learn how to build this powerful structure in linear time to solve complex string problems like substring counting and pattern matching.

Sachin Sarawgi·April 21, 2026·3 min read
#dsa#java#string#suffix automaton#sam#interview preparation#algorithms

When it comes to string processing, the Suffix Automaton (SAM) is often considered the most powerful data structure. It is a minimal Deterministic Finite Automaton (DFA) that recognizes all suffixes of a given string.

While Suffix Trees and Suffix Arrays are more common, SAM can be built in linear time and space and is often easier to implement for complex substring queries.

The Core Concept: End-Equivalent Classes

The SAM groups substrings into "end-equivalent" classes. Two substrings are equivalent if they appear at the same set of end positions in the original string.

  1. States: Each state in the SAM represents a set of substrings that share the same end positions.
  2. Transitions: Edges between states represent adding a character to the substrings.
  3. Suffix Links: Links that point to the longest suffix of a state's substrings that belongs to a different end-equivalent class.

Suffix Automaton Implementation in Java

import java.util.*;

public class SuffixAutomaton {
    static class State {
        int len, link;
        Map<Character, Integer> next = new HashMap<>();

        State(int len, int link) {
            this.len = len;
            this.link = link;
        }
    }

    private List<State> st = new ArrayList<>();
    private int last;

    public SuffixAutomaton(String s) {
        st.add(new State(0, -1));
        last = 0;
        for (char c : s.toCharArray()) {
            extend(c);
        }
    }

    private void extend(char c) {
        int cur = st.size();
        st.add(new State(st.get(last).len + 1, 0));
        int p = last;
        while (p != -1 && !st.get(p).next.containsKey(c)) {
            st.get(p).next.put(c, cur);
            p = st.get(p).link;
        }
        if (p == -1) {
            st.get(cur).link = 0;
        } else {
            int q = st.get(p).next.get(c);
            if (st.get(p).len + 1 == st.get(q).len) {
                st.get(cur).link = q;
            } else {
                int clone = st.size();
                State qState = st.get(q);
                State cloneState = new State(st.get(p).len + 1, qState.link);
                cloneState.next.putAll(qState.next);
                st.add(cloneState);
                while (p != -1 && st.get(p).next.get(c) == q) {
                    st.get(p).next.put(c, clone);
                    p = st.get(p).link;
                }
                st.get(q).link = st.get(cur).link = clone;
            }
        }
        last = cur;
    }

    public boolean contains(String pattern) {
        int cur = 0;
        for (char c : pattern.toCharArray()) {
            if (!st.get(cur).next.containsKey(c)) return false;
            cur = st.get(cur).next.get(c);
        }
        return true;
    }
}

Why use Suffix Automaton?

Feature Suffix Automaton Suffix Tree Suffix Array
Construction $O(N)$ $O(N)$ $O(N \log N)$ or $O(N)$
Space $O(N \times \Sigma)$ $O(N \times \Sigma)$ $O(N)$
Substring Search $O(M)$ $O(M)$ $O(M \log N)$
Implementation Concise Complex Moderate

Real-World Applications

  1. Bioinformatics: Efficiently finding long common substrings in genomic sequences.
  2. Data Compression: Used in algorithms like LZ77 to find the longest previous match.
  3. Plagiarism Detection: Identifying overlapping segments between large documents.

Summary

The Suffix Automaton is a masterclass in efficient string representation. By collapsing all suffixes into a minimal DFA, it provides a robust foundation for solving almost any substring-related problem in linear time. While the construction logic is subtle, its power and versatility make it an essential tool for advanced algorithmic challenges.

📚

Recommended Resources

Java Masterclass — UdemyBest Seller

Comprehensive Java course covering Java 17+, OOP, concurrency, and modern APIs.

View Course
Effective Java, 3rd EditionMust Read

Joshua Bloch's classic guide to writing clear, correct, and efficient Java code.

View on Amazon
Java Concurrency in Practice

The authoritative book on writing thread-safe, concurrent Java programs.

View on Amazon

Practical engineering notes

Get the next backend guide in your inbox

One useful note when a new deep dive is published: system design tradeoffs, Java production lessons, Kafka debugging, database patterns, and AI infrastructure.

No spam. Just practical notes you can use at work.

Sachin Sarawgi

Written by

Sachin Sarawgi

Engineering Manager and backend engineer with 10+ years building distributed systems across fintech, enterprise SaaS, and startups. CodeSprintPro is where I write practical guides on system design, Java, Kafka, databases, AI infrastructure, and production reliability.

Found this useful? Share it: