Deep Dive: Trie Implementation
When you type a query into Google, how does it instantly suggest the rest of your sentence? When you type a misspelled word, how does your phone know it's wrong in milliseconds? The answer is often a Trie (pronounced "try").
A Trie, also known as a Prefix Tree, is a specialized tree-like data structure used to efficiently store and retrieve keys in a dataset of strings. Unlike binary search trees, nodes in a Trie do not store their associated key. Instead, a node's position in the tree defines the key with which it is associated.
In top tech interviews, Tries are the secret weapon for solving complex string problems that would otherwise timeout using standard HashMaps or Arrays.
When Should You Think About a Trie?
Consider using a Trie when:
- You are dealing with a dictionary of words or a large set of strings.
- The problem involves prefix matching (e.g., "find all words starting with 'auto'").
- You need to implement autocomplete or a spell checker.
- You are solving problems related to word search on a grid (like Boggle).
- You need to find the longest common prefix among a set of strings.
Core Concept of a Trie
Imagine storing the words "cat", "car", "cap", and "dog".
In a Trie, the root node represents an empty string. The children of the root represent the first letters of all stored words ('c' and 'd'). The path from the root down to a specific node represents a prefix or a complete word.
Key properties:
- Nodes: Each node typically contains an array of pointers (or a HashMap) to its children, representing the alphabet (e.g., an array of size 26 for lowercase English letters).
- End of Word Marker: Because "car" is a word, the node representing 'r' must have a boolean flag
isEndOfWord = true. This distinguishes a complete word from a mere prefix (like "ca").
Implementing a Trie in Java
Let's build a standard Trie that supports inserting a word, searching for a complete word, and checking if a prefix exists.
Step 1: The TrieNode Class
class TrieNode {
// Array to hold children. Size 26 for 'a' to 'z'
TrieNode[] children;
// Flag to mark the end of a valid word
boolean isEndOfWord;
public TrieNode() {
children = new TrieNode[26];
isEndOfWord = false;
}
}
Step 2: The Trie Class
public class Trie {
private TrieNode root;
public Trie() {
root = new TrieNode();
}
// Inserts a word into the trie.
public void insert(String word) {
TrieNode current = root;
for (int i = 0; i < word.length(); i++) {
char ch = word.charAt(i);
int index = ch - 'a'; // Calculate index (0 to 25)
// If the character doesn't exist, create a new node
if (current.children[index] == null) {
current.children[index] = new TrieNode();
}
// Move down the tree
current = current.children[index];
}
// Mark the final node as the end of a word
current.isEndOfWord = true;
}
// Returns true if the word is in the trie.
public boolean search(String word) {
TrieNode current = root;
for (int i = 0; i < word.length(); i++) {
char ch = word.charAt(i);
int index = ch - 'a';
// If the path breaks, the word doesn't exist
if (current.children[index] == null) {
return false;
}
current = current.children[index];
}
// It's only a valid word if the final node is marked
return current.isEndOfWord;
}
// Returns true if there is any word in the trie that starts with the given prefix.
public boolean startsWith(String prefix) {
TrieNode current = root;
for (int i = 0; i < prefix.length(); i++) {
char ch = prefix.charAt(i);
int index = ch - 'a';
if (current.children[index] == null) {
return false;
}
current = current.children[index];
}
// If we successfully navigated the prefix, it exists
return true;
}
}
Complexity Analysis
- Time Complexity:
- Insert: $O(L)$, where $L$ is the length of the word. We do exactly $L$ operations.
- Search/StartsWith: $O(L)$, where $L$ is the length of the word/prefix. This is blazingly fast and independent of how many words are stored in the Trie!
- Space Complexity:
- $O(N \times L \times 26)$ in the absolute worst case (where $N$ is the number of words, and there are no shared prefixes). However, Tries are highly space-efficient when words share common prefixes (which is typical for dictionaries).
Dry Run: Building a Trie
Let's insert "app" and "apple".
- Initialize: Create empty
root. - Insert "app":
- 'a':
root.children[0]is null. Create new node. Move to it. - 'p':
current.children[15]is null. Create new node. Move to it. - 'p':
current.children[15]is null. Create new node. Move to it. - End of string. Set
current.isEndOfWord = true.
- 'a':
- Insert "apple":
- 'a':
root.children[0]exists! Move to it. (Shared prefix) - 'p':
current.children[15]exists! Move to it. (Shared prefix) - 'p':
current.children[15]exists! Move to it. (Shared prefix) - 'l':
current.children[11]is null. Create new node. Move to it. - 'e':
current.children[4]is null. Create new node. Move to it. - End of string. Set
current.isEndOfWord = true.
- 'a':
Notice how "app" and "apple" share the first three nodes. This is the magic of the Prefix Tree.
Trie vs. HashMap
Why use a Trie when a HashMap<String, Boolean> offers $O(1)$ lookups?
- Prefix Searching: A HashMap cannot easily find all words starting with "auto-". You would have to iterate through all keys. A Trie does this in $O(L)$ time.
- Memory Efficiency for Prefixes: If you store 10,000 words starting with "pre", a HashMap stores the string "pre" 10,000 times in memory. A Trie stores the nodes 'p' $\rightarrow$ 'r' $\rightarrow$ 'e' exactly once.
- No Hash Collisions: Tries don't suffer from hash collisions, resulting in consistent worst-case performance.
Advanced Pattern: Trie + Backtracking (Word Search II)
The true power of Tries in interviews is combining them with Backtracking (DFS).
In problems like Word Search II (given a 2D board of characters and a list of words, find all words on the board), checking every path against a HashMap is too slow.
The Trie Optimization:
- Insert all the target words into a Trie.
- Perform a DFS from every cell on the board.
- Instead of passing a string down the DFS, pass the
TrieNode. - If at any point
node.children[board[row][col] - 'a'] == null, you can immediately prune the DFS branch! This massive optimization is what gets you the "Hire" recommendation.
Common Mistakes
Mistake 1: Forgetting isEndOfWord
If you search for "cap", but you only inserted "capital", the traversal will successfully reach 'p'. If you don't check current.isEndOfWord, you will incorrectly return true.
Mistake 2: Hardcoding array size to 26
While a size 26 array works for lowercase English letters, some problems include uppercase, digits, or ASCII characters. In those cases, use TrieNode[] children = new TrieNode[128] (or 256), or use a HashMap<Character, TrieNode> children.
Example 2: Design Add and Search Words Data Structure
Support adding new words and searching for a string that may contain dots . where a dot can match any letter.
public boolean search(String word) {
return searchInNode(word, 0, root);
}
private boolean searchInNode(String word, int index, TrieNode node) {
if (index == word.length()) return node.isEndOfWord;
char ch = word.charAt(index);
if (ch != '.') {
TrieNode child = node.children[ch - 'a'];
return child != null && searchInNode(word, index + 1, child);
} else {
for (TrieNode child : node.children) {
if (child != null && searchInNode(word, index + 1, child)) return true;
}
}
return false;
}
Example 3: Longest Common Prefix
public String longestCommonPrefix(String[] strs) {
if (strs == null || strs.length == 0) return "";
Trie trie = new Trie();
for (String s : strs) trie.insert(s);
// Traverse trie until a node has >1 child or isEndOfWord
}
Practice Problems
- Implement Trie (Prefix Tree): (Medium)
- Design Add and Search Words Data Structure: (Medium)
- Word Search II: (Hard)
- Replace Words: (Medium)
- Maximum XOR of Two Numbers in an Array: (Medium)
- Longest Common Prefix: (Easy)
- Search Suggestions System: (Medium)
- Map Sum Pairs: (Medium)
- Stream of Characters: (Hard)
- Palindrome Pairs: (Hard)
Interview Script You Can Reuse
"Since we need to perform efficient prefix matching and lookups against a large dictionary of words, a Trie (Prefix Tree) is the ideal data structure. I'll build a Trie where each node contains an array of child pointers and a boolean flag to mark the end of valid words. This allows us to insert and search for words in O(L) time, where L is the length of the word, completely independent of the total number of words in our dictionary. It also saves space by sharing common prefixes."
Final Takeaways
- Tries are trees specialized for strings and prefixes.
- They excel at autocomplete, spell checking, and dictionary lookups.
- Insertion and Search are strictly $O(L)$ time.
- Combining a Trie with DFS/Backtracking is a master-tier optimization for matrix search problems.