Lesson 28 of 47 4 min

MANG Problem #39: Design Search Autocomplete System (Hard)

Learn how to design an efficient search autocomplete system using a Trie combined with caching of top historical results.

1. Problem Statement

Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#').

You are given a string array sentences and an integer array times representing the historical frequencies.

Implement the AutocompleteSystem class:

  • AutocompleteSystem(String[] sentences, int[] times) Initializes the object with the historical data.
  • List<String> input(char c) The user types a character c. If c == '#', the user has finished typing. Otherwise, return the top 3 historical hot sentences that have the same prefix as the part of the sentence already typed.
  • Return the top 3 sentences sorted by frequency (descending), breaking ties by ASCII-code (ascending).

2. Approach: Trie + Precomputed Top 3

If we just use a standard Trie, finding the top 3 sentences requires a full DFS of the subtree at every keystroke, taking $O(V)$ time where $V$ is all nodes in the subtree. This is too slow for real-time autocomplete.

The "Aha!" Moment

Instead of calculating the top 3 at query time, pre-calculate and store them in the TrieNode itself. Every node in the Trie will hold a small list of the top 3 sentences that pass through it.

  1. State:
    • Trie: Stores characters. Each node has a List<String> top3.
    • Map<String, Integer> counts: Stores the global frequency of all sentences.
    • StringBuilder currentPrefix: Tracks what the user is currently typing.
    • TrieNode currNode: Tracks the current position in the Trie to avoid searching from the root on every keystroke.
  2. Input('#'): Save the currentPrefix to the Map, increment its count, and re-insert it into the Trie to update the top3 lists along its path. Reset currentPrefix and currNode.

3. Java Implementation

class AutocompleteSystem {
    class TrieNode {
        Map<Character, TrieNode> children = new HashMap<>();
        List<String> top3 = new ArrayList<>();
    }

    private TrieNode root;
    private TrieNode currNode;
    private StringBuilder currSentence;
    private Map<String, Integer> counts;

    public AutocompleteSystem(String[] sentences, int[] times) {
        root = new TrieNode();
        counts = new HashMap<>();
        for (int i = 0; i < sentences.length; i++) {
            counts.put(sentences[i], times[i]);
            insert(sentences[i]);
        }
        currNode = root;
        currSentence = new StringBuilder();
    }

    private void insert(String sentence) {
        TrieNode node = root;
        for (char c : sentence.toCharArray()) {
            node.children.putIfAbsent(c, new TrieNode());
            node = node.children.get(c);
            updateTop3(node, sentence);
        }
    }

    private void updateTop3(TrieNode node, String sentence) {
        if (!node.top3.contains(sentence)) {
            node.top3.add(sentence);
        }
        node.top3.sort((a, b) -> {
            int countA = counts.get(a);
            int countB = counts.get(b);
            if (countA == countB) return a.compareTo(b);
            return countB - countA;
        });
        if (node.top3.size() > 3) {
            node.top3.remove(3);
        }
    }

    public List<String> input(char c) {
        if (c == '#') {
            String sentence = currSentence.toString();
            counts.put(sentence, counts.getOrDefault(sentence, 0) + 1);
            insert(sentence);
            currSentence = new StringBuilder();
            currNode = root;
            return new ArrayList<>();
        }

        currSentence.append(c);
        if (currNode != null) {
            currNode = currNode.children.get(c);
        }
        
        return currNode == null ? new ArrayList<>() : currNode.top3;
    }
}

4. 5-Minute "Video-Style" Walkthrough

  1. The Read/Write Tradeoff: In Autocomplete, reads (typing a character) happen 100x more often than writes (submitting a sentence). By doing the heavy lifting (sorting and maintaining the top 3) during insert(), our input() method becomes $O(1)$!
  2. The Pointer Trick: Notice the currNode variable. When a user types "a", we go to the 'a' node. When they type "p", we don't start from the root to find "ap". We just look at currNode.children.get('p'). This stateful tracking is crucial for latency.

5. Interview Discussion

  • Interviewer: "What happens if a sentence gets very popular and its rank changes?"
  • You: "Our updateTop3 logic handles this because when # is typed, we re-run insert(), which re-sorts the top3 list at every node along the path."
  • Interviewer: "How would you scale this for Google?"
  • You: "We would shard the Trie based on the first character (or prefix) and use a distributed cache (Redis). The Trie would be built offline via MapReduce and pushed to read-replicas."

Want to track your progress?

Sign in to save your progress, track completed lessons, and pick up where you left off.