For the past decade, Java developers dealing with high-concurrency IO-bound workloads faced an uncomfortable choice: write readable sequential code that does not scale, or write reactive/async code that scales but is notoriously difficult to debug and maintain. Project Loom, delivered in Java 21, eliminates this false dichotomy.
Virtual threads let you write blocking sequential code that scales to millions of concurrent operations — no reactive frameworks, no callback chains, no CompletableFuture composition hell.
Why Platform Threads Don't Scale
A traditional Java platform thread maps 1:1 to an OS thread:
- Stack size: 512KB to 1MB per thread
- Practical OS limit: ~10,000 threads before scheduling overhead dominates
- Blocking cost: A platform thread waiting on IO holds an OS thread — doing nothing
At 10,000 concurrent requests each waiting 50ms for a database query, you need 10,000 OS threads. That's 5-10 GB of stack memory. This is why servlet containers default to 200 threads — not because engineers are lazy, but because threads are expensive.
The reactive model addresses this by making IO non-blocking (callbacks, Mono/Flux). The thread releases while IO waits, then a callback runs when IO completes. High throughput, low thread count — but at serious complexity cost. To understand why that complexity matters, consider what a real concurrent fetch looks like once you leave the happy path. The following example demonstrates a parallel user profile build using CompletableFuture — notice how much ceremony you need just to run three calls concurrently and handle a failure:
// CompletableFuture: parallel user profile fetch — readable, but complex
public CompletableFuture<UserProfile> buildProfile(String userId) {
return CompletableFuture
.supplyAsync(() -> userService.fetch(userId))
.thenCombine(
CompletableFuture.supplyAsync(() -> subscriptionService.fetch(userId)),
(user, sub) -> {
try {
List<Product> recs = recommendationService.fetch(user.getPreferences());
return new UserProfile(user, sub, recs);
} catch (Exception e) {
throw new CompletionException(e);
}
}
)
.exceptionally(e -> {
log.error("Failed to build profile", e);
return UserProfile.empty();
});
}
Stack traces in reactive code show only the current stage. Thread-local context (MDC, security) breaks. The mental model is fundamentally different from sequential code.
How Virtual Threads Work
Before writing any code, it helps to understand the architectural difference between platform threads and virtual threads. Think of carrier threads as a small team of workers (one per CPU core), and virtual threads as a vast queue of tasks those workers pick up whenever they are free. When a virtual thread blocks on IO, the worker sets it aside and immediately picks up the next waiting task — nothing is wasted.
Platform Thread Model (1:1 with OS):
VT1 ──► OS Thread 1 (blocked on DB, holding OS thread)
VT2 ──► OS Thread 2 (blocked on HTTP call, holding OS thread)
...
N limited by OS (typical max: 10K)
Virtual Thread Model (M:N with OS):
VT1 ─┐
VT2 ─┤ Scheduled onto ─► Carrier Thread 1 (= 1 OS thread)
VT3 ─┤ ─► Carrier Thread 2 (= 1 OS thread)
VTM ─┘ ─► Carrier Thread N (N = CPU cores)
M can be millions — each VT is ~1KB on heap (not stack)
When a virtual thread hits a blocking operation (JDBC query, HTTP call, Thread.sleep()), the JVM unmounts the virtual thread from the carrier: saves its stack to the heap and lets the carrier take another virtual thread. When the IO completes, the virtual thread is rescheduled — mounted onto an available carrier. The OS never blocks.
This is why virtual threads can handle 100K concurrent connections with only 8 carrier threads (one per CPU core).
Creating Virtual Threads
Java 21 gives you three ways to create virtual threads, each suited to a different context. Choose the one that fits how your application is structured — the newVirtualThreadPerTaskExecutor is the most practical choice for service-layer code because it integrates cleanly with ExecutorService and supports Future-based result collection:
// Method 1: Thread.ofVirtual()
Thread vt = Thread.ofVirtual()
.name("handler-", 0) // Named: handler-0, handler-1, ...
.start(() -> handleRequest(request));
// Method 2: Virtual thread per task executor (most common in services)
try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
Future<UserProfile> profileFuture = exec.submit(() -> buildProfile(userId));
Future<List<Order>> ordersFuture = exec.submit(() -> fetchOrders(userId));
// Both run on virtual threads concurrently
// .get() blocks the calling virtual thread — unmounts it while waiting
UserProfile profile = profileFuture.get();
List<Order> orders = ordersFuture.get();
}
// Method 3: Thread factory (for integrating with existing APIs)
ThreadFactory vtFactory = Thread.ofVirtual().factory();
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(0, vtFactory);
Notice that .get() in Method 2 blocks the calling virtual thread, not the carrier — so even your result-collection code is non-blocking at the OS level.
Structured Concurrency: The Right Way to Fan Out
Virtual threads solve the scalability problem, but running multiple concurrent tasks still requires coordination. Java 21 also introduces StructuredTaskScope — a cleaner model for running concurrent subtasks that makes the relationship between parent and child tasks explicit and ensures no subtask can outlive its enclosing scope:
import java.util.concurrent.StructuredTaskScope;
// Old way: CompletableFuture fan-out
public UserDashboard buildDashboard(String userId) throws InterruptedException {
try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
// All three tasks start immediately on virtual threads
var userTask = scope.fork(() -> userService.fetch(userId));
var ordersTask = scope.fork(() -> orderService.fetchRecent(userId, 10));
var notifTask = scope.fork(() -> notificationService.getUnread(userId));
// Wait for all — or cancel all if any fails (ShutdownOnFailure)
scope.join();
scope.throwIfFailed(e -> new DashboardBuildException("Failed to build dashboard", e));
// All subtasks complete here — results are available
return new UserDashboard(
userTask.get(),
ordersTask.get(),
notifTask.get()
);
}
// Scope exit guarantees: no subtask outlives this block
// Any exception = all subtasks cancelled
}
ShutdownOnFailure is one scope policy. ShutdownOnSuccess cancels remaining tasks as soon as one succeeds — useful for "first result wins" patterns. This is a natural fit for scenarios like reading from multiple replicas where you care only about getting the fastest response, not all of them:
// Race multiple read replicas — use whichever responds first
public String readFromFastestReplica(String key) throws Exception {
try (var scope = new StructuredTaskScope.ShutdownOnSuccess<String>()) {
scope.fork(() -> replica1.get(key));
scope.fork(() -> replica2.get(key));
scope.fork(() -> replica3.get(key));
scope.join();
return scope.result(); // First successful result
}
}
The key insight here is that ShutdownOnSuccess automatically cancels the two slower replicas once the first result arrives, preventing wasted work and keeping resource consumption predictable.
Spring Boot 3.2 Configuration
Now that you understand how virtual threads work, enabling them in your Spring Boot application is deliberately straightforward. Virtual threads in Spring Boot 3.2+ require one line:
# application.yml
spring:
threads:
virtual:
enabled: true
This switches Tomcat's thread pool to virtual threads. Each incoming HTTP request gets its own virtual thread. JDBC calls, Redis operations, and HTTP client calls block that virtual thread (not the carrier), freeing the carrier for other requests.
Once you've enabled virtual threads, you can verify they are actually being used at runtime by adding a small diagnostic endpoint. This is especially useful when first rolling out the change to catch any misconfiguration early:
@RestController
public class DiagnosticsController {
@GetMapping("/thread-info")
public Map<String, Object> threadInfo() {
Thread t = Thread.currentThread();
return Map.of(
"name", t.getName(),
"isVirtual", t.isVirtual(),
"isDaemon", t.isDaemon()
);
}
}
// Returns: {"name":"tomcat-handler-7","isVirtual":true,"isDaemon":true}
Benchmark: Virtual Threads vs Platform Threads
The following benchmark results illustrate the real-world impact of virtual threads. The test simulates a realistic IO-bound endpoint — something like any database-backed REST API — where threads spend most of their time waiting rather than computing:
Environment: EC2 c5.2xlarge (8 vCPUs, 16GB RAM), JDK 21.0.2
Concurrency │ Platform Threads (200 pool) │ Virtual Threads
────────────┼─────────────────────────────┼─────────────────
200 │ 3,960 rps, p99: 52ms │ 3,980 rps, p99: 51ms
1,000 │ 980 rps, p99: 1.02s │ 19,600 rps, p99: 52ms
5,000 │ timeout │ 98,000 rps, p99: 53ms
10,000 │ OOM / GC pressure │ 195,000 rps, p99: 55ms
Memory at 10,000 concurrent requests:
Platform Threads: OOM (200 thread pool creates massive queue backlog)
Virtual Threads: ~2.1 GB heap (10K stack frames at ~200KB each)
Below 200 concurrent requests, performance is identical. Above 200, platform threads queue up while virtual threads scale linearly with IO wait time.
Pitfall 1: Pinning
With the performance benefits clear, it's equally important to understand the one failure mode that can silently erase those gains. A virtual thread pins to its carrier when inside a synchronized block. While pinned, the carrier cannot take other virtual threads — blocking the OS thread and recreating the exact problem virtual threads were designed to solve.
// PROBLEM: synchronized + IO = carrier thread blocks
public synchronized String fetchData(String key) {
return database.query("SELECT value FROM cache WHERE key = ?", key);
// Database call blocks inside synchronized → pins carrier thread
}
// SOLUTION: Replace synchronized with ReentrantLock
private final ReentrantLock lock = new ReentrantLock();
public String fetchData(String key) {
lock.lock(); // Virtual thread parks here (unmountable), not carrier
try {
return database.query("SELECT value FROM cache WHERE key = ?", key);
} finally {
lock.unlock();
}
}
// BETTER SOLUTION: Minimize lock scope — lock only for in-memory state
public String fetchData(String key) {
// IO outside the lock
String result = database.query("SELECT value FROM cache WHERE key = ?", key);
// Lock only for the in-memory cache update
synchronized (this) {
localCache.put(key, result);
}
return result;
}
The "better solution" above illustrates a broader principle: separate IO from in-memory coordination. By doing the database call outside the lock, you ensure the virtual thread can unmount during the slow operation, and only hold the lock for the fast, in-memory cache update. Use the JVM's built-in diagnostics to detect any remaining pinning in your application:
# JVM flag: print stack trace when virtual thread pins for >20ms
java -Djdk.tracePinnedThreads=full -jar app.jar
# During load test, watch for:
# VirtualThread[#48]/runnable@ForkJoinPool-1-worker-3
# java.base/.../Unsafe.park (pinned due to monitor hold)
Common libraries with pinning issues (as of 2025):
- HikariCP < 5.1.0: Uses
synchronizedinternally. Workaround: set carrier thread pool size to match HikariCP max pool size - Some JDBC drivers: Oracle, older MySQL connectors
- Legacy code with
synchronizedon IO paths
Pitfall 2: Thread-Local State
The second common pitfall involves how context is passed through the call stack. ThreadLocal works with virtual threads — each virtual thread has its own ThreadLocal map. The subtle problem is that ThreadLocal values set in one request can leak to subsequent requests if not cleaned up.
// LEAK: ThreadLocal not cleared
public void processRequest(Request req) {
MDC.put("requestId", req.getId()); // Sets ThreadLocal
handleRequest(req);
// Missing: MDC.clear()
// If this virtual thread is reused, next request inherits old requestId
}
// SAFE: Always clean up in finally
public void processRequest(Request req) {
try {
MDC.put("requestId", req.getId());
handleRequest(req);
} finally {
MDC.clear();
}
}
// BEST for Java 21+: Use ScopedValue (replaces ThreadLocal for shared context)
static final ScopedValue<RequestContext> REQUEST_CTX = ScopedValue.newInstance();
public void processRequest(Request req) {
ScopedValue.runWhere(REQUEST_CTX, new RequestContext(req), () -> {
handleRequest(req);
}); // ScopedValue automatically cleaned up when runWhere exits
}
private void deepInCallStack() {
// Access from anywhere without passing as parameter
RequestContext ctx = REQUEST_CTX.get();
MDC.put("requestId", ctx.getRequestId());
}
ScopedValue is the long-term solution: it is immutable, automatically scoped to the runWhere block, and eliminates the cleanup burden entirely. Prefer it over ThreadLocal for any new code written on Java 21+.
When NOT to Use Virtual Threads
Virtual threads are a powerful tool, but they are not a universal replacement for every concurrency approach. Knowing the boundaries of their usefulness is as important as knowing how to enable them.
-
CPU-bound work: Virtual threads only help when threads park (wait for IO). CPU-bound tasks never park — they keep the carrier busy. Use
ForkJoinPoolfor CPU-intensive work. -
Very high-frequency, very-short tasks: Nanosecond-duration tasks where virtual thread scheduling overhead dominates. Prefer
CompletableFuturewith a bounded pool. -
Libraries with extensive native pinning: If critical paths run through JNI code that holds monitors, virtual threads cannot help.
For CPU-bound workloads, parallel streams backed by ForkJoinPool remain the right tool — they keep all cores busy with actual computation rather than waiting on IO:
// For CPU-bound parallel work, still use ForkJoinPool or parallel streams:
List<Result> results = items.parallelStream()
.map(this::expensiveComputation) // ForkJoinPool, not virtual threads
.collect(Collectors.toList());
Migration Checklist
With the concepts and pitfalls covered, here is a practical, step-by-step checklist for migrating an existing Spring Boot application to virtual threads. Follow the steps in order — enabling virtual threads first and then diagnosing pinning issues is far more productive than auditing every synchronized block upfront:
# 1. Upgrade
# Java: 21+
# Spring Boot: 3.2+
# Maven/Gradle: latest
# 2. Enable virtual threads (one line)
echo "spring.threads.virtual.enabled=true" >> application.yml
# 3. Detect pinning
java -Djdk.tracePinnedThreads=full -jar app.jar
# Run load test, watch logs for pinning events
# 4. Replace synchronized+IO with ReentrantLock
grep -r "synchronized" src/main/java/ | grep -i "repository\|service\|dao"
# 5. Upgrade HikariCP to 5.1.0+ (reduced synchronized blocks)
# Or set: maximumPoolSize = expected concurrent DB operations (not threads)
# 6. Remove artificial thread pool limits
# Remove from application.properties:
# server.tomcat.threads.max=200
# spring.task.execution.pool.max-size=50
# (Virtual threads don't need these — they scale automatically)
# 7. Load test and compare p99 latency at high concurrency
Virtual threads represent a paradigm shift for Java. For the first time, high-concurrency IO-bound applications can be written with simple, sequential, debuggable code — and still achieve throughput that previously required reactive frameworks. The migration cost is near-zero for Spring Boot applications. The remaining challenge is library adoption, and that is improving rapidly with each JDK release.