AWS Lambda: Cold Starts, Memory Tuning, and Cost Optimization

Lambda functions are the easiest compute to get started with and the hardest to tune well. A 2-second cold start is a dealbreaker for a payment API. A 512MB function that runs in 1 second might be cheaper than a 128MB function that runs in 4 seconds. Understanding the internals turns Lambda from a frustrating black box into a predictable, cost-effective platform.

Cold Start Anatomy

To fix cold starts, you first need to understand what causes them. A cold start happens when Lambda needs to create a brand new execution environment for your function — this means provisioning infrastructure, downloading your code, and initializing your runtime before your handler ever runs. The breakdown below shows where the time actually goes, and which phases you can control.

Cold start sequence (each phase adds latency):

1. Find capacity (100-300ms)
   AWS provisions a new execution environment

2. Download deployment package (50-500ms)
   Scales with package size — 10MB vs 100MB matters

3. Initialize runtime (JVM: 400-2000ms, Node: 50-200ms, Python: 50-150ms)
   JVM cold starts are the worst — JVM initialization + class loading

4. Run INIT code (your code: 50-5000ms)
   Static initializers, Spring context startup, SDK client creation

5. Handle request (your function: varies)

Total cold start: 600ms (Node.js) to 10+ seconds (Spring Boot on JVM)

Warm invocations: Only step 5. Typically 1-50ms.

Cold start frequency:
  Low-traffic functions: most invocations are cold
  High-traffic functions: < 1% cold (execution environments reused)

The critical insight is that phases 1-4 are "cold start tax" and phase 5 is your actual work. Your optimization strategies target different phases: SnapStart eliminates phase 3, GraalVM eliminates phases 2 and 3, and provisioned concurrency eliminates phases 1-3 entirely by pre-running them. Which strategy you choose depends on your cold start budget and cost sensitivity.

Strategy 1: Java SnapStart (Lambda + Firecracker)

AWS Lambda SnapStart (2022) is the biggest improvement for Java cold starts. It takes a snapshot of the initialized JVM after INIT and restores it for cold starts — bypassing JVM initialization entirely.

SnapStart works by running your INIT phase once at deployment time and taking a memory snapshot of the fully initialized JVM. When a cold start happens, Lambda restores from that snapshot instead of initializing from scratch. The catch is that stateful connections — like database connection pools — survive in the snapshot but point to stale network connections after restore. The CRaC interface below is how you tell Lambda what to close before the snapshot and what to re-initialize after restore.

// build.gradle
plugins {
    id 'com.github.johnrengelman.shadow' version '8.1.1'
}

// Required: implement CRaC's Resource interface for SnapStart lifecycle hooks
import org.crac.*;

@Component
public class DatabaseConnectionPool implements Resource {

    private HikariDataSource dataSource;

    @PostConstruct
    public void init() {
        Core.getGlobalContext().register(this);  // Register for SnapStart hooks
        dataSource = createDataSource();
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Called before snapshot — close connections (they won't survive restore)
        dataSource.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Called after restore from snapshot — re-initialize
        dataSource = createDataSource();
    }
}

Enabling SnapStart in your SAM template requires two things: the SnapStart property and an AutoPublishAlias. SnapStart only works on published Lambda versions — it cannot operate on the $LATEST unpublished version because snapshots are immutable and tied to specific code.

# SAM template
Resources:
  OrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      SnapStart:
        ApplyOn: PublishedVersions   # Enable SnapStart
      AutoPublishAlias: live          # Required for SnapStart

The performance difference is dramatic enough that SnapStart should be your first move for any Java Lambda with cold start concerns — it is free and requires only a few lines of configuration change.

Cold start comparison (Spring Boot Lambda, typical):

Without SnapStart:  8-12 seconds
With SnapStart:     200-600ms
GraalVM native:     80-200ms
Node.js:            200-500ms

Strategy 2: GraalVM Native Image

Compile your Java app to native binary — no JVM startup, minimal memory, sub-200ms cold starts.

GraalVM native image performs ahead-of-time (AOT) compilation, converting your entire Java application — including all the classes and libraries it uses — into a single native binary at build time. There is no JVM at runtime: the binary starts in milliseconds. The tradeoff is that Java's dynamic features (reflection, dynamic class loading) need to be declared explicitly at build time via hints.

# Install GraalVM
sdk install java 21.0.2-graal

# Build native image
./mvnw native:compile -Pnative

# The result: a single binary, no JVM needed
# Size: 40-80MB (vs 150MB JAR)
# Startup: 50ms (vs 8 seconds for Spring Boot)

Spring Boot 3 ships with built-in GraalVM support, but you need to help it discover classes that are accessed via reflection at runtime. The RuntimeHintsRegistrar below tells the GraalVM compiler to keep those classes accessible — without this, you will get ClassNotFoundException at runtime even though the class is present in your binary.

// Spring Boot 3 + Spring Native (GraalVM AOT compilation)
// Most Spring features work — @RestController, @Service, JPA, etc.
// Exception: heavy use of reflection needs hints

// Register reflection hints for classes that GraalVM can't discover automatically
@Configuration
@ImportRuntimeHints(OrderService.OrderHints.class)
public class OrderService {

    public static class OrderHints implements RuntimeHintsRegistrar {
        @Override
        public void registerHints(RuntimeHints hints, ClassLoader classLoader) {
            // Tell GraalVM to keep these classes accessible at runtime
            hints.reflection()
                .registerType(OrderEvent.class, MemberCategory.INVOKE_DECLARED_METHODS)
                .registerType(OrderCreatedEvent.class, MemberCategory.INVOKE_DECLARED_METHODS);

            // Keep resources in the native image
            hints.resources().registerPattern("db/migration/*.sql");
        }
    }
}

The Dockerfile below packages the native binary into a Lambda container image. Lambda's custom runtime interface requires the binary to be at /var/runtime/bootstrap — that is the entry point Lambda calls instead of a Java main method. The multi-stage build keeps the final image small by leaving the GraalVM build toolchain in the builder stage.

# Dockerfile for Lambda native image
FROM public.ecr.aws/amazonlinux/amazonlinux:2023 as builder
RUN yum install -y gcc zlib-devel

COPY target/native/order-function /function/bootstrap

FROM public.ecr.aws/amazonlinux/amazonlinux:2023
COPY --from=builder /function/bootstrap /var/runtime/bootstrap
RUN chmod +x /var/runtime/bootstrap

CMD ["/var/runtime/bootstrap"]

Strategy 3: Provisioned Concurrency

For latency-critical paths, keep Lambda warm by pre-initializing execution environments.

Provisioned Concurrency tells Lambda to keep a set number of execution environments permanently initialized and ready to handle requests — they will never cold start. You pay for this warmth even when no requests are coming in, so this is a deliberate cost-for-latency tradeoff. Use it for paths where cold start latency would breach your SLA.

# SAM template — provision 10 warm instances
Resources:
  PaymentFunction:
    Type: AWS::Serverless::Function
    Properties:
      AutoPublishAlias: live

  PaymentFunctionAlias:
    Type: AWS::Lambda::Alias
    Properties:
      FunctionName: !Ref PaymentFunction
      Name: live

  ProvisionedConcurrency:
    Type: AWS::Lambda::ProvisionedConcurrencyConfig
    Properties:
      FunctionName: !Ref PaymentFunction
      Qualifier: !GetAtt PaymentFunctionAlias.FunctionVersion
      ProvisionedConcurrentExecutions: 10

  # Auto-scale provisioned concurrency based on schedule
  ScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 100
      MinCapacity: 5
      ResourceId: !Sub function:${PaymentFunction}:live
      ServiceNamespace: lambda
      ScalableDimension: lambda:function:ProvisionedConcurrency

The cost calculation below is what should drive your decision. Provisioned Concurrency is not always expensive — for a payment API that has strict SLAs and moderate traffic, the few dollars per day is far cheaper than the engineering time spent debugging cold-start-related timeout errors in production.

Cost of provisioned concurrency:
  Standard Lambda: $0.00001667 per GB-second (only when running)
  Provisioned: $0.0000097 per GB-second (always, + $0.000004646 per request)

10 provisioned @ 512MB, 24 hours:
  = 10 × 0.5GB × 86400s × $0.0000097 = $4.18/day

Worth it for: Payment APIs, auth flows, anything user-facing with SLA < 100ms
Not worth it for: Batch jobs, event processors, internal background tasks

Memory Tuning: Lambda Power Tuning

Lambda pricing = duration × memory. More memory = faster execution (more CPU allocated proportionally) = possibly lower cost.

This is one of the least understood aspects of Lambda economics. AWS allocates CPU proportionally to memory — a 2048MB function gets roughly 4× the CPU of a 512MB function. For CPU-bound workloads like JSON serialization, database query processing, or JVM warmup, the extra CPU can cut execution time dramatically, and a shorter duration at higher memory often costs less than a longer duration at lower memory.

Counter-intuitive truth:
  512MB function taking 4 seconds: 4s × 0.5GB = 2 GB-seconds
  2048MB function taking 800ms:    0.8s × 2GB = 1.6 GB-seconds ← cheaper!

More memory = more CPU = faster = cheaper AND faster.
Sweet spot is not always the minimum memory.

Use AWS Lambda Power Tuning (open source tool):

Lambda Power Tuning is an AWS Step Functions state machine that invokes your function at multiple memory configurations and measures duration and cost at each setting. Rather than guessing, you run it once and get a data-driven recommendation. The strategy: "cost" parameter optimizes purely for cost — you can also use "balanced" to optimize for both cost and speed.

# Deploy and run Lambda Power Tuning
aws cloudformation deploy \
  --template-file template.yml \
  --stack-name lambda-power-tuning

# Run against your function
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789:stateMachine:powerTuningStateMachine \
  --input '{
    "lambdaARN": "arn:aws:lambda:us-east-1:123:function:order-service",
    "powerValues": [128, 256, 512, 1024, 2048, 3008],
    "num": 50,
    "payload": {"orderId": "test-123"},
    "parallelInvocation": true,
    "strategy": "cost"
  }'

The results below show a typical Java function — 1024MB is both the cheapest AND 10× faster than 128MB. Without running this tool, most teams would default to 512MB or 128MB and pay more for worse performance.

Power Tuning results (typical Java function):

Memory  │ Duration │ Cost/req   │ Relative cost
128MB   │ 8,200ms  │ $0.0000137 │ 100%
256MB   │ 4,200ms  │ $0.0000140 │ 102%
512MB   │ 2,100ms  │ $0.0000140 │ 102%
1024MB  │ 800ms    │ $0.0000107 │ 78%   ← 22% cheaper AND 10x faster
2048MB  │ 420ms    │ $0.0000112 │ 82%
3008MB  │ 310ms    │ $0.0000121 │ 88%

Winner: 1024MB — best cost AND acceptable latency

Packaging Optimization

Lambda download time scales with package size. Smaller = faster cold starts.

Package size affects the second phase of the cold start sequence — every megabyte you remove from your deployment artifact directly reduces cold start time for new execution environments. Lambda layers are the key tool here: dependencies that rarely change can be packaged into a shared layer that Lambda caches at the infrastructure level, separate from your frequently-updated application code.

# Java: Use Lambda layers for dependencies (cache between deployments)
# Layer 1: AWS SDK + Spring Boot (rarely changes)
# Layer 2: Your dependencies (changes occasionally)
# Deployment zip: Just your code (changes every deploy)

# Measure: what's taking space?
unzip -l target/function.zip | sort -k1 -rn | head -20

# Common culprits:
# - AWS SDK v1 (huge) → switch to SDK v2
# - Duplicate transitive dependencies
# - Test libraries included in runtime

# Exclude test deps from final jar
configurations {
    runtimeClasspath {
        exclude group: 'junit'
        exclude group: 'mockito'
    }
}

Initialization Code Best Practices

Where you put initialization code is one of the highest-leverage changes you can make to Lambda performance. Code in your handler runs on every invocation — even on warm instances. Code in static blocks or constructors runs exactly once during the INIT phase and is then reused across all warm invocations. The contrast below is stark, and this mistake is common in early Lambda implementations.

// BAD: Initialize SDK clients inside handler (runs every invocation)
public class BadHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
        // WRONG: This creates a new client every invocation = 200ms overhead each time
        DynamoDbClient dynamoDb = DynamoDbClient.create();
        S3Client s3 = S3Client.create();
        // ...
    }
}

// GOOD: Initialize once in static block or constructor (runs once in INIT phase)
public class GoodHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    // These are initialized once and reused across warm invocations
    private static final DynamoDbClient DYNAMO = DynamoDbClient.builder()
        .region(Region.US_EAST_1)
        .httpClient(UrlConnectionHttpClient.create())  // Lighter than Netty for Lambda
        .build();

    private static final S3Client S3 = S3Client.builder()
        .region(Region.US_EAST_1)
        .build();

    private static final ObjectMapper MAPPER = new ObjectMapper()
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
        // Just use pre-initialized clients — fast
    }
}

The choice of UrlConnectionHttpClient over the default Netty HTTP client is also deliberate — Netty is asynchronous and better for high-throughput scenarios, but it carries significant initialization overhead. For Lambda, where each environment handles one request at a time, the synchronous UrlConnectionHttpClient initializes faster and is the better default.

Concurrency and Throttling

Understanding Lambda concurrency is critical for production reliability. Without reserved concurrency, a runaway batch function can consume your entire regional concurrency limit and starve your user-facing payment function — a silent failure that looks like throttling with no obvious cause.

Lambda concurrency model:
  Concurrent executions = requests being processed simultaneously
  Default limit: 1000 per region (all functions combined)
  Reserve concurrency: guarantee a function gets capacity
  Throttle concurrency: prevent a function from using too much

# Reserve 200 concurrency for payment-critical path
aws lambda put-function-concurrency \
  --function-name payment-service \
  --reserved-concurrent-executions 200

# Throttle non-critical batch function to protect payment function
aws lambda put-function-concurrency \
  --function-name report-generator \
  --reserved-concurrent-executions 10

Think of reserved concurrency as both a floor and a ceiling. For your payment function, it is a guarantee that 200 execution environments are always available. For your report generator, it is a cap that prevents it from starving more important functions. Both are needed in a production system with multiple functions sharing a region.

Production Checklist

With the optimization strategies in place, the checklist below is a rapid-scan of the changes with the highest return on investment. Most teams can implement the top half of this list in a single sprint and see measurable improvements in both cold start time and monthly cost.

Cold start optimization:
  ✓ Java: Use SnapStart (free, 10x improvement)
  ✓ Java: Consider GraalVM native for sub-100ms target
  ✓ All runtimes: Move SDK initialization to INIT phase (static/constructor)
  ✓ Reduce package size: remove unused deps, use layers

Latency:
  ✓ Run Lambda Power Tuning to find optimal memory
  ✓ Enable provisioned concurrency for user-facing functions
  ✓ Use ARM64 (Graviton2) — same cost, ~20% better price-performance
  ✓ Place Lambda in same region as dependencies (RDS, DynamoDB)

Cost:
  ✓ Set function timeout correctly (don't set 15min for 5s functions)
  ✓ Use ARM64 (Graviton2) architecture — 20% cheaper per GB-second
  ✓ Use tiered pricing: functions over 6B GB-seconds/month get 20% discount
  ✓ SQS trigger: use batch size 10 (10 messages per invocation = 10x cheaper)

Reliability:
  ✓ Always set DLQ (Dead Letter Queue) for async invocations
  ✓ Set reserved concurrency to prevent throttle cascades
  ✓ Enable X-Ray tracing for production debugging (or OTel)
  ✓ Export Lambda metrics to CloudWatch: duration, errors, throttles, ConcurrentExecutions

The 80/20 of Lambda optimization: use SnapStart for Java (free, instant win), run Lambda Power Tuning once (10 minutes, find optimal memory), and move all initialization out of the handler. These three changes alone cut cold start time by 80% and often reduce cost simultaneously.

AWS Lambda: Cold Starts, Memory Tuning, and Cost Optimization

Cold Start Anatomy

Strategy 1: Java SnapStart (Lambda + Firecracker)

Strategy 2: GraalVM Native Image

Strategy 3: Provisioned Concurrency

Memory Tuning: Lambda Power Tuning

Packaging Optimization

Initialization Code Best Practices

Concurrency and Throttling

Production Checklist

Recommended Resources

Sachin Sarawgi

Related Articles

AWS Lambda in Production: Cold Starts, Concurrency, and Cost Optimization

Kubernetes in Production: Patterns Every Backend Engineer Must Know

Terraform Infrastructure as Code: Production Patterns and Pitfalls

AWS Lambda: Cold Starts, Memory Tuning, and Cost Optimization

Cold Start Anatomy

Strategy 1: Java SnapStart (Lambda + Firecracker)

Strategy 2: GraalVM Native Image

Strategy 3: Provisioned Concurrency

Memory Tuning: Lambda Power Tuning

Packaging Optimization

Initialization Code Best Practices

Concurrency and Throttling

Production Checklist

Recommended Resources

Get the next backend guide in your inbox

Sachin Sarawgi

Related Articles

AWS Lambda in Production: Cold Starts, Concurrency, and Cost Optimization

Kubernetes in Production: Patterns Every Backend Engineer Must Know

Terraform Infrastructure as Code: Production Patterns and Pitfalls