Spring Boot Production Readiness Checklist: Timeouts, Pools, Health Checks, and Observability

Spring Boot makes it easy to start a service. Production makes it clear whether the service is actually ready.

A production-ready service is not just one that passes unit tests. It has bounded timeouts, sane thread pools, useful health checks, structured logs, metrics, graceful shutdown, safe configuration, and predictable behavior when dependencies fail.

This checklist focuses on the things that prevent real outages.

1. Set Timeouts Everywhere

The default timeout is often too high, missing, or hidden in a library. Every outbound call should have a connect timeout and a read/response timeout.

For WebClient:

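import io.netty.channel.ChannelOption;
import java.time.Duration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;

HttpClient httpClient = HttpClient.create()
    .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 1000) // give up if no TCP connection within 1 s
    .responseTimeout(Duration.ofSeconds(2));            // max idle time between reads while awaiting the response

WebClient client = WebClient.builder()
    .clientConnector(new ReactorClientHttpConnector(httpClient))
    .baseUrl("https://payment-service")
    .build();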

For RestTemplate:

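import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
factory.setConnectTimeout(1000); // milliseconds
factory.setReadTimeout(2000);    // milliseconds
RestTemplate restTemplate = new RestTemplate(factory);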

Timeouts should be lower than the upstream caller's timeout. If your load balancer times out at 30 seconds, your service should fail dependency calls much earlier and return a controlled error.
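The same budget can be written down with the JDK's built-in java.net.http client, which is handy for checking that retries still fit inside the caller's timeout. This is a minimal sketch: the 30-second caller timeout comes from the example above, while the class name, the /charges path, and the budget helper are assumptions made for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

final class OutboundTimeouts {
    // Assumed budget: the upstream caller (for example a load balancer) gives up after 30 s.
    static final Duration CALLER_TIMEOUT = Duration.ofSeconds(30);
    static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(1);
    static final Duration RESPONSE_TIMEOUT = Duration.ofSeconds(2);

    // Client-wide connect timeout.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(CONNECT_TIMEOUT)
                .build();
    }

    // Per-request timeout; the /charges path is a hypothetical endpoint.
    static HttpRequest chargeRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/charges"))
                .timeout(RESPONSE_TIMEOUT)
                .GET()
                .build();
    }

    // Conservative upper bound: every attempt may spend the full connect and response timeout.
    static boolean fitsWithinCallerBudget(int attempts) {
        Duration worstCase = CONNECT_TIMEOUT.plus(RESPONSE_TIMEOUT).multipliedBy(attempts);
        return worstCase.compareTo(CALLER_TIMEOUT) < 0;
    }
}
```

With these numbers two attempts cost at most 6 seconds, well inside the 30-second budget, while ten attempts would consume all of it and leave no room to return a controlled error.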

2. Tune Database Pooling

HikariCP is fast, but it cannot guess your production topology. Set pool size based on database capacity and pod count:

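spring:
  datasource:
    hikari:
      maximum-pool-size: 15
      minimum-idle: 5
      connection-timeout: 1000         # ms to wait for a connection from the pool
      max-lifetime: 1800000            # ms (30 minutes)
      leak-detection-threshold: 30000  # ms a connection may be held before a leak warning is logged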
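One rough way to sanity-check maximum-pool-size is to divide the connections the database can spare by the number of pods that may exist at once, including the extra pods a rolling update creates. The sketch below is plain arithmetic, not a HikariCP API; the class name, the parameters, and the sample numbers in the usage note are assumptions.

```java
final class PoolSizing {
    /**
     * Upper bound for maximum-pool-size per pod.
     *
     * @param dbMaxConnections    connection limit configured on the database
     * @param reservedConnections connections kept free for admin tools, migrations, other services
     * @param podCount            steady-state number of pods
     * @param surgePods           extra pods that can exist during a rolling update
     */
    static int maxPoolSizePerPod(int dbMaxConnections, int reservedConnections,
                                 int podCount, int surgePods) {
        if (podCount < 1 || surgePods < 0) {
            throw new IllegalArgumentException("podCount must be >= 1 and surgePods >= 0");
        }
        int usable = dbMaxConnections - reservedConnections;
        if (usable < podCount + surgePods) {
            throw new IllegalArgumentException("database cannot give every pod a connection");
        }
        // Integer division rounds down, so the total never exceeds the usable connections.
        return usable / (podCount + surgePods);
    }
}
```

For example, a database limit of 200 with 20 reserved, 10 pods, and 2 surge pods gives (200 - 20) / 12 = 15 connections per pod.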

Alert on:

hikaricp.connections.pending
hikaricp.connections.timeout
hikaricp.connections.acquire
hikaricp.connections.usage

If pending rises, do not blindly increase pool size. Check slow queries and transaction scope first.

3. Keep Transactions Short

Do not wrap HTTP calls inside database transactions:

@Transactional
public void badCheckout(Order order) {
    orderRepository.save(order);
    paymentClient.charge(order); // holds DB transaction while waiting
}

Prefer:

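// orderPersistence (a hypothetical name) is a separate Spring bean. Calling an
// @Transactional method on the same class bypasses the proxy, so the annotation is ignored.
public void checkout(Order order) {
    PaymentResult payment = paymentClient.charge(order); // no DB transaction is open here
    orderPersistence.persistOrder(order, payment);
}

// In the separate bean:
@Transactional
public void persistOrder(Order order, PaymentResult payment) {
    orderRepository.save(order.withPayment(payment));
}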

Transactions should protect data consistency, not the whole workflow.

4. Expose Useful Health Checks

Enable Actuator:

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      probes:
        enabled: true

Use separate liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080

Liveness means "restart me if I am dead." Readiness means "do not send me traffic right now." Do not make liveness depend on the database, or a database outage can cause every pod to restart repeatedly.
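The split can be modeled in a few lines of plain Java. This is a conceptual sketch rather than the Actuator API: the class, the enum, and the databaseReachable check are all hypothetical names chosen for illustration.

```java
import java.util.function.BooleanSupplier;

final class ProbeModel {
    enum Status { UP, DOWN, OUT_OF_SERVICE }

    private final BooleanSupplier databaseReachable; // hypothetical dependency check
    private volatile boolean internalStateBroken = false;

    ProbeModel(BooleanSupplier databaseReachable) {
        this.databaseReachable = databaseReachable;
    }

    // Called when the process itself is in a state only a restart can fix.
    void markBroken() {
        internalStateBroken = true;
    }

    // Liveness looks only at the process, never at external dependencies.
    Status liveness() {
        return internalStateBroken ? Status.DOWN : Status.UP;
    }

    // Readiness reflects whether the instance can usefully serve traffic right now.
    Status readiness() {
        return databaseReachable.getAsBoolean() ? Status.UP : Status.OUT_OF_SERVICE;
    }
}
```

During a database outage this model reports liveness UP and readiness OUT_OF_SERVICE, so the pod is taken out of rotation but not restarted.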

5. Graceful Shutdown

When Kubernetes terminates a pod, the service needs time to stop accepting traffic and finish in-flight requests.

server:
  shutdown: graceful

spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s

Kubernetes:

terminationGracePeriodSeconds: 45

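Keep terminationGracePeriodSeconds larger than the shutdown phase timeout (45s versus 30s here); otherwise Kubernetes sends SIGKILL while requests are still draining. Together, these settings reduce connection resets during deployments and node drains.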
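Thread pools that the application creates itself need the same treatment: stop taking new work, wait a bounded time for in-flight tasks, then force the rest. The sketch below uses only java.util.concurrent and assumes a pool the application owns; the class and method names are illustrative.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class GracefulStop {
    /** Returns true if every submitted task finished within the grace period. */
    static boolean drain(ExecutorService pool, Duration grace) {
        pool.shutdown(); // reject new tasks; queued and running tasks continue
        try {
            if (pool.awaitTermination(grace.toMillis(), TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        pool.shutdownNow(); // grace period over: interrupt whatever is still running
        return false;
    }
}
```

A call such as GracefulStop.drain(pool, Duration.ofSeconds(20)) would typically run from a shutdown callback, with a grace period shorter than the shutdown phase timeout so the whole sequence finishes before the pod is killed.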

6. Structured Logging

Logs should answer operational questions quickly. Include a request or trace ID, user/tenant where safe, route, status, duration, and error type.

{
  "event": "http_request",
  "trace_id": "abc123",
  "route": "/orders",
  "status": 201,
  "duration_ms": 84,
  "tenant_id": "t_42"
}

Never log secrets, tokens, full card numbers, or raw PII. Add masking at the logging boundary.
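A masking step at the logging boundary can be as small as a pair of regular expressions applied before a message is written. The patterns below are assumptions for illustration (13 to 19 digit runs treated as card numbers, and bearer tokens); a real service would need a list that matches its own data.

```java
import java.util.regex.Pattern;

final class LogMasker {
    // A run of 13 to 19 digits is treated as a card number; only the last four are kept.
    private static final Pattern CARD = Pattern.compile("\\b\\d{9,15}(\\d{4})\\b");
    // "Bearer <token>", case-insensitive; the token itself is dropped.
    private static final Pattern BEARER =
            Pattern.compile("(?i)(bearer\\s+)[A-Za-z0-9._~+/=-]+");

    static String mask(String message) {
        String masked = CARD.matcher(message).replaceAll("****$1");
        return BEARER.matcher(masked).replaceAll("$1[REDACTED]");
    }
}
```

Under these patterns, "card=4111111111111111 ok" becomes "card=****1111 ok" and "Authorization: Bearer abc.def-123" becomes "Authorization: Bearer [REDACTED]", while ordinary fields such as route and status pass through unchanged.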

7. Metrics That Matter

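Expose Prometheus metrics with Micrometer (the micrometer-registry-prometheus dependency must be on the classpath), and tag every metric with the application name: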

management:
  metrics:
    tags:
      application: checkout-api

Alert on symptoms:

request p95/p99 latency
error rate by route
dependency latency
Hikari pending connections
JVM GC pauses
executor queue size
Kafka consumer lag if applicable

Avoid alerting only on CPU. CPU can be high while the service is healthy, and low while every request is stuck waiting on a dependency.
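Tail percentiles are worth alerting on because a median hides slow outliers. The sketch below computes a nearest-rank percentile over raw samples to show the effect; in a running service Micrometer would publish these values, and the class name and sample data here are assumptions.

```java
import java.util.Arrays;

final class LatencyPercentiles {
    /** Nearest-rank percentile: the smallest sample that covers at least percent% of the data. */
    static long percentile(long[] samplesMs, int percent) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // Integer form of ceil(percent * n / 100), which avoids floating-point rounding.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[rank - 1];
    }
}
```

With nineteen requests at 80 ms and one at 2000 ms, the p50 and p95 are both 80 ms and only the p99 shows the 2000 ms request.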

8. Resilience Defaults

Use circuit breakers for slow dependencies:

resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        slidingWindowSize: 50
        failureRateThreshold: 50
        slowCallDurationThreshold: 2s
        slowCallRateThreshold: 50

Retries should be limited and jittered:

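resilience4j:
  retry:
    instances:
      paymentService:
        maxAttempts: 2              # total attempts, including the first call
        waitDuration: 100ms
        enableRandomizedWait: true  # jitter, so clients do not retry in lockstep
        randomizedWaitFactor: 0.5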

Do not retry non-idempotent operations unless the downstream API supports idempotency keys.
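To make "limited and jittered" concrete, here is a hand-rolled sketch of the same policy in plain Java. It is not the Resilience4j API: the class, the method names, and the fixed 0.5 jitter factor are assumptions, and the operation passed in is assumed to be idempotent.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

final class BoundedRetry {
    // Spreads the wait uniformly over [base * (1 - factor), base * (1 + factor)).
    // random01 is a value in [0, 1), passed in so the function stays deterministic.
    static Duration jittered(Duration base, double factor, double random01) {
        double offset = (random01 * 2 - 1) * factor;
        return Duration.ofMillis(Math.round(base.toMillis() * (1 + offset)));
    }

    // Runs op at most maxAttempts times in total, then rethrows the last failure.
    static <T> T call(Supplier<T> op, int maxAttempts, Duration baseWait) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleep(jittered(baseWait, 0.5, ThreadLocalRandom.current().nextDouble()));
                }
            }
        }
        throw last;
    }

    private static void sleep(Duration wait) {
        try {
            Thread.sleep(wait.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

With a 100 ms base wait and a factor of 0.5, each retry waits somewhere between 50 and 150 ms, which keeps a fleet of clients from retrying at the same instant.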

9. Deployment Safety

Production deployments should have:

readiness checks
rolling updates
rollback path
feature flags for risky behavior
database migrations compatible with old and new code
canary metrics for error rate and latency

For database changes, follow expand-contract:

Add nullable column
Deploy code that writes both old and new
Backfill
Deploy code that reads new
Remove old column later

Final Checklist

Timeouts on every outbound call
HikariCP sized by database capacity
Short transactions
Separate liveness and readiness probes
Graceful shutdown enabled
Structured logs with trace IDs
Prometheus metrics exposed
Circuit breakers and bounded retries
Safe deployment and rollback strategy
Alerts tied to user impact

Spring Boot gives you strong defaults for development. Production readiness comes from making every important failure mode explicit.


Sachin Sarawgi
