1. Problem Statement
Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#').
You are given a string array sentences and an integer array times representing the historical frequencies.
Implement the AutocompleteSystem class:
AutocompleteSystem(String[] sentences, int[] times)Initializes the object with the historical data.List<String> input(char c)The user types a characterc. Ifc == '#', the user has finished typing. Otherwise, return the top 3 historical hot sentences that have the same prefix as the part of the sentence already typed.- Return the top 3 sentences sorted by frequency (descending), breaking ties by ASCII-code (ascending).
2. Approach: Trie + Precomputed Top 3
If we just use a standard Trie, finding the top 3 sentences requires a full DFS of the subtree at every keystroke, taking $O(V)$ time where $V$ is all nodes in the subtree. This is too slow for real-time autocomplete.
The "Aha!" Moment
Instead of calculating the top 3 at query time, pre-calculate and store them in the TrieNode itself. Every node in the Trie will hold a small list of the top 3 sentences that pass through it.
- State:
Trie: Stores characters. Each node has aList<String> top3.Map<String, Integer> counts: Stores the global frequency of all sentences.StringBuilder currentPrefix: Tracks what the user is currently typing.TrieNode currNode: Tracks the current position in the Trie to avoid searching from the root on every keystroke.
- Input('#'): Save the
currentPrefixto the Map, increment its count, and re-insert it into the Trie to update thetop3lists along its path. ResetcurrentPrefixandcurrNode.
3. Java Implementation
class AutocompleteSystem {
class TrieNode {
Map<Character, TrieNode> children = new HashMap<>();
List<String> top3 = new ArrayList<>();
}
private TrieNode root;
private TrieNode currNode;
private StringBuilder currSentence;
private Map<String, Integer> counts;
public AutocompleteSystem(String[] sentences, int[] times) {
root = new TrieNode();
counts = new HashMap<>();
for (int i = 0; i < sentences.length; i++) {
counts.put(sentences[i], times[i]);
insert(sentences[i]);
}
currNode = root;
currSentence = new StringBuilder();
}
private void insert(String sentence) {
TrieNode node = root;
for (char c : sentence.toCharArray()) {
node.children.putIfAbsent(c, new TrieNode());
node = node.children.get(c);
updateTop3(node, sentence);
}
}
private void updateTop3(TrieNode node, String sentence) {
if (!node.top3.contains(sentence)) {
node.top3.add(sentence);
}
node.top3.sort((a, b) -> {
int countA = counts.get(a);
int countB = counts.get(b);
if (countA == countB) return a.compareTo(b);
return countB - countA;
});
if (node.top3.size() > 3) {
node.top3.remove(3);
}
}
public List<String> input(char c) {
if (c == '#') {
String sentence = currSentence.toString();
counts.put(sentence, counts.getOrDefault(sentence, 0) + 1);
insert(sentence);
currSentence = new StringBuilder();
currNode = root;
return new ArrayList<>();
}
currSentence.append(c);
if (currNode != null) {
currNode = currNode.children.get(c);
}
return currNode == null ? new ArrayList<>() : currNode.top3;
}
}
4. 5-Minute "Video-Style" Walkthrough
- The Read/Write Tradeoff: In Autocomplete, reads (typing a character) happen 100x more often than writes (submitting a sentence). By doing the heavy lifting (sorting and maintaining the top 3) during
insert(), ourinput()method becomes $O(1)$! - The Pointer Trick: Notice the
currNodevariable. When a user types "a", we go to the 'a' node. When they type "p", we don't start from the root to find "ap". We just look atcurrNode.children.get('p'). This stateful tracking is crucial for latency.
5. Interview Discussion
- Interviewer: "What happens if a sentence gets very popular and its rank changes?"
- You: "Our
updateTop3logic handles this because when#is typed, we re-runinsert(), which re-sorts thetop3list at every node along the path." - Interviewer: "How would you scale this for Google?"
- You: "We would shard the Trie based on the first character (or prefix) and use a distributed cache (Redis). The Trie would be built offline via MapReduce and pushed to read-replicas."