Java Heap Dump Analysis: Finding the Silent Killer
An OutOfMemoryError (OOME) is the nightmare of every backend engineer. But the real problem is rarely the error itself; it is usually a memory leak that has been growing unnoticed for days. To fix it, you need to master heap dump analysis.
1. What is a Heap Dump?
A heap dump is a snapshot of all the objects in the Java Virtual Machine (JVM) heap at a specific moment. It contains information about the class, fields, and references for every object.
2. How to Capture a Heap Dump
In production, you rarely want to capture a dump manually. You want the JVM to write one automatically at the moment it throws an OutOfMemoryError.
Automatic Capture
Add these flags to your JVM startup script:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps/oom.hprof
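A startup script is easy to get wrong, so it can be worth confirming that the running JVM actually picked the flags up. The sketch below is one way to do that from inside the process, assuming a HotSpot-based JVM where the com.sun.management.HotSpotDiagnosticMXBean interface is available. The class name HeapDumpFlagCheck and the helper vmOption are illustrative, not part of any library.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper (hypothetical name): reads HotSpot flags from the running JVM.
class HeapDumpFlagCheck {

    /** Returns the current value of a HotSpot VM option as a string. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the JVM does not know the option.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Logging these two values once at application startup is a cheap way to catch a missing flag before the first OOME rather than after it.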
Manual Capture (jmap)
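If you notice memory usage is rising but haven't hit OOME yet, capture a dump by hand. The live option dumps only reachable objects, which forces a full GC first, so expect a pause on large heaps: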
jmap -dump:live,format=b,file=heap_dump.hprof <pid>
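When jmap is not available on the host (for example in a slimmed-down container image), the same kind of dump can be triggered from inside the application. This is a minimal sketch, assuming a HotSpot-based JVM. HeapDumper and its dumpHeap helper are hypothetical names, while HotSpotDiagnosticMXBean.dumpHeap(String, boolean) is the JDK call doing the work.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (hypothetical name): writes a heap dump of the current JVM.
class HeapDumper {

    /**
     * Dumps the heap to the given file. The file name should end in ".hprof".
     * liveOnly = true keeps only reachable objects, like jmap's "live" option.
     */
    static Path dumpHeap(Path target, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpHeap refuses to overwrite an existing file, so remove a stale one first.
            Files.deleteIfExists(target);
            bean.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        Path out = Paths.get(args.length > 0 ? args[0] : "heap_dump.hprof");
        System.out.println("Heap dump written to " + dumpHeap(out, true).toAbsolutePath());
    }
}
```

A helper like this is usually exposed behind an authenticated admin endpoint or a JMX operation, since a dump pauses the application and the resulting file can contain sensitive data.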
3. The Analysis Tools
Don't try to read a .hprof file in a text editor. You need specialized tools:
- Eclipse MAT (Memory Analyzer): The industry standard. Its "Leak Suspects" report often points straight at the structures retaining the most memory.
- VisualVM: Great for real-time monitoring and quick snapshots.
- JProfiler / YourKit: Premium tools with deep integration and advanced features.
4. The Step-by-Step Analysis Workflow
Step 1: Look at the Histogram
Start by looking at which classes are consuming the most memory. Is it byte[], String, or a custom class like OrderProcessingTask?
Step 2: Identify the GC Roots
An object stays in memory as long as it's reachable from a GC Root (e.g., a thread stack, a static variable, or a JNI reference). Use MAT's "Path to GC Roots" view to see why an object isn't being collected.
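To make the reachability rule concrete, here is a small sketch of the kind of reference chain that "Path to GC Roots" reveals. GcRootDemo, REGISTRY and register are made-up names for illustration. The WeakReference is only used as a probe: it does not keep the array alive by itself, so it shows whether something else still does.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Illustrative demo (hypothetical names): a static list keeps its elements reachable.
class GcRootDemo {

    // Anything referenced from this static field stays reachable
    // for as long as the class itself is loaded.
    static final List<byte[]> REGISTRY = new ArrayList<>();

    /** Stores a payload in the static list and returns a weak probe to it. */
    static WeakReference<byte[]> register(int size) {
        byte[] payload = new byte[size];
        REGISTRY.add(payload);
        return new WeakReference<>(payload);
    }

    public static void main(String[] args) {
        WeakReference<byte[]> probe = register(1024 * 1024);

        System.gc(); // the array survives: it is still reachable via REGISTRY
        System.out.println("held by static list: " + (probe.get() != null));

        REGISTRY.clear(); // cut the only strong reference chain
        System.gc();      // only a hint; collection is likely but not guaranteed
        System.out.println("after clear: "
                + (probe.get() != null ? "still present" : "collected"));
    }
}
```

In a dump taken before the clear() call, the path for the byte[] would run through the ArrayList's backing array up to the static field of GcRootDemo, which is exactly the chain you need to break to fix a leak.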
Step 3: Check for "Fat" Objects
Look for a single object that is holding references to millions of smaller objects. This is often a HashMap or a List that is never cleared.
5. Common Memory Leak Culprits
- Static Collections: A static List that only ever grows.
- ThreadLocals: Forgetting to call .remove() on a ThreadLocal, especially in a pooled thread environment.
- Unclosed Resources: Database connections or file handles that hold onto memory until they are closed.
- Caching without TTL: Using a simple HashMap as a cache instead of a proper tool like Caffeine or Guava with eviction policies.
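Two of these culprits have fixes that fit in a few lines. The sketch below shows a ThreadLocal that is always cleaned up in a finally block, and a size-bounded cache. All names (LeakFixes, CURRENT_USER, handleRequest, boundedCache) are invented for the example. To stay dependency-free, the cache uses the JDK's LinkedHashMap with removeEldestEntry rather than Caffeine or Guava, so it evicts by size and access order only, has no TTL, and is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative fixes (hypothetical names) for two common leak patterns.
class LeakFixes {

    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Sets per-request state and always removes it, so pooled threads stay clean. */
    static String handleRequest(String user) {
        CURRENT_USER.set(user);
        try {
            return "handled for " + CURRENT_USER.get();
        } finally {
            // Without this, the value would stay attached to the pooled thread.
            CURRENT_USER.remove();
        }
    }

    /** Returns what the calling thread currently holds (null once cleaned up). */
    static String currentUser() {
        return CURRENT_USER.get();
    }

    /** A least-recently-used cache that never grows beyond maxEntries. */
    static <K, V> Map<K, V> boundedCache(final int maxEntries) {
        // accessOrder = true makes iteration order least-recently-used first.
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("alice"));
        Map<String, Integer> cache = boundedCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3); // evicts "a", the least recently used entry
        System.out.println(cache.keySet());
    }
}
```

For a shared cache in production, wrapping the map with Collections.synchronizedMap or switching to a dedicated caching library is the safer choice; the point here is only that an upper bound turns an unbounded leak into a fixed memory cost.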
Summary
Heap dump analysis is a diagnostic superpower. By automating the capture and using tools like Eclipse MAT to trace GC roots, you can move from "guessing" to "fixing" within minutes.
