Lesson 15 of 15 · 4 min · Advanced Track

Capstone: Zero-Downtime Blue-Green Infrastructure

Build a fully automated blue-green infrastructure deployment using Terraform, Application Load Balancers, and target group swapping.


Key Takeaways

  • Blue-Green deployments eliminate downtime by running two identical environments side-by-side.
  • Use the create_before_destroy lifecycle rule in Terraform so a replacement resource is created before the old one is destroyed.
  • Shift active traffic seamlessly at the Application Load Balancer layer using Target Group weight adjustments.
Recommended Prerequisites
terraform-aws-14-drift-detection-state-surgery

Premium outcome

Provision, secure, and automate production-grade cloud infrastructure at scale.

Backend and platform engineers who want to design, deploy, and automate robust production environments on AWS.

You leave with

  • A secure, modular, multi-environment AWS landing zone designed from scratch
  • A fully integrated GitOps deployment pipeline using GitHub Actions and Terraform S3 Backend
  • Hands-on expertise deploying containerized microservices (ECS Fargate + RDS) with secure IAM gating

Congratulations on reaching the final capstone project of the Terraform & AWS DevOps Mastery track!

Throughout this course, you have designed networks, secured credentials, provisioned clusters, and automated delivery pipelines. In this capstone, we pull all these skills together to solve one of the most difficult challenges in systems engineering: executing zero-downtime infrastructure upgrades.

Specifically, we will design and build a Blue-Green Deployment Infrastructure that provisions a new application environment (Green) alongside our running environment (Blue) and swaps traffic seamlessly at the load-balancer layer.


The Blue-Green Architecture Layout

  • Blue Environment: The current active production environment running version 1.0.0 of our application.
  • Green Environment: The new environment running version 2.0.0 of our application.
  • Application Load Balancer (ALB): The traffic router. It sends all public traffic to the Blue environment until the Green environment passes its health checks; the pipeline then shifts traffic to Green by changing the listener's target group weights.

                  [ Public Internet traffic ]
                              │
                              ▼
                [ Application Load Balancer ]
                 /                         \
  (Weight: 100%) /                           \ (Weight: 0%)
                ▼                             ▼
       [ Target Group Blue ]         [ Target Group Green ]
       (App Version: 1.0.0)          (App Version: 2.0.0)

Steady state before cutover: all traffic goes to Blue. The rollout in Step 4 inverts these weights.

Step 1: Enforcing Order of Operations with Lifecycle Rules

By default, when you change an argument that forces a resource to be replaced (for example, editing the container definitions of an ECS task definition or renaming an SQS queue), Terraform destroys the existing resource first and only then creates its replacement. Anything that depends on that resource can fail in the gap between the two operations, which is a direct source of downtime.

To reverse that order, set the create_before_destroy lifecycle argument:

# modules/app/main.tf

# Task definition for the stateless application container
resource "aws_ecs_task_definition" "app" {
  family                   = "${var.environment}-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  # ...

  lifecycle {
    # Register the new task definition revision BEFORE deregistering the old one
    create_before_destroy = true
  }
}
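A task definition only describes containers; an ECS service is what actually runs tasks from it. The sketch below shows a service that consumes the task definition above. It assumes hypothetical module variables (var.cluster_arn, var.subnet_ids, var.security_group_ids, var.target_group_arn, var.desired_count) and a container named "app" listening on port 8080 in the elided container definitions. Setting the minimum healthy percent to 100 and the maximum percent to 200 asks ECS to start replacement tasks before stopping old ones, which applies the same create-before-destroy idea at the task level.

```hcl
# modules/app/service.tf (hypothetical file name)

# ECS service that runs tasks from aws_ecs_task_definition.app.
# All var.* inputs below are hypothetical module variables.
resource "aws_ecs_service" "app" {
  name            = "${var.environment}-app"
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  # Keep 100% of the desired tasks running and allow up to 200% during a
  # deployment, so new tasks start before old ones are stopped.
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = var.security_group_ids
    assign_public_ip = false
  }

  # Register tasks with whichever target group (Blue or Green) the caller
  # passes in. Assumes the elided container definitions declare a
  # container named "app" that listens on port 8080.
  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "app"
    container_port   = 8080
  }
}
```

In this sketch the root module would instantiate the module once per color, passing the Blue or Green target group ARN as var.target_group_arn.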

Step 2: Provisioning Twin Target Groups

We provision two separate, identical target groups for our Load Balancer to hook up our Blue and Green tasks:

# modules/alb/blue-green.tf

# 1. Blue Target Group
resource "aws_lb_target_group" "blue" {
  name        = "${var.environment}-tg-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
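  target_type = "ip"

  # Allow in-flight requests up to 30 seconds to finish when a target is
  # deregistered (the provider default is 300 seconds)
  deregistration_delay = 30

  health_check {
    path = "/health"
  }
}

# 2. Green Target Group
resource "aws_lb_target_group" "green" {
  name        = "${var.environment}-tg-green"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  # Same draining window as Blue
  deregistration_delay = 30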

  health_check {
    path = "/health"
  }
}
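These target groups use fixed names, which works as long as they are only updated in place. If a change forces replacement (for example, changing port, protocol, or target_type) and you add create_before_destroy so the listener is repointed before the old group disappears, the new group cannot reuse the old group's name while both exist, because target group names must be unique per account and region. One common workaround, sketched here for the Green group only, is name_prefix, which lets the provider generate a unique name. The prefix is limited to 6 characters, so the environment name moves into a tag; the tag key is an illustrative choice.

```hcl
# Alternative Green target group that tolerates replacement.
# Any other arguments from the Green group carry over unchanged.
resource "aws_lb_target_group" "green" {
  # Terraform generates a unique name starting with this prefix
  # (maximum 6 characters), so old and new groups can coexist briefly.
  name_prefix = "green-"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path = "/health"
  }

  # The environment no longer fits in the name, so record it as a tag.
  tags = {
    Environment = var.environment
  }

  lifecycle {
    # Create the replacement group (and update the listener to point at
    # it) before destroying the old one.
    create_before_destroy = true
  }
}
```

This block would replace the Green definition above rather than sit alongside it; the Blue group would get the same treatment with its own prefix, such as "blue-".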

Step 3: Implementing Weighted Traffic Splitting

Now, we configure our Application Load Balancer listener. Instead of routing traffic to a single target group, we use a forward action with target_group weights.

By parameterizing these weights, our GitOps pipeline can shift traffic progressively (e.g. 10% canary traffic to Green -> 50% -> 100%):

# modules/alb/variables.tf

variable "blue_weight" {
  description = "Percentage of traffic to route to the Blue target group (0-100)"
  type        = number
  default     = 100
}

variable "green_weight" {
  description = "Percentage of traffic to route to the Green target group (0-100)"
  type        = number
  default     = 0
}
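Nothing in these declarations stops a typo such as green_weight = 150 from reaching the plan. A minimal hardening sketch adds a validation block to each variable so out-of-range values fail before any API call is made. These declarations would replace the two above rather than be added alongside them.

```hcl
# modules/alb/variables.tf (hardened version)

variable "blue_weight" {
  description = "Percentage of traffic to route to the Blue target group (0-100)"
  type        = number
  default     = 100

  validation {
    condition     = var.blue_weight >= 0 && var.blue_weight <= 100
    error_message = "blue_weight must be between 0 and 100."
  }
}

variable "green_weight" {
  description = "Percentage of traffic to route to the Green target group (0-100)"
  type        = number
  default     = 0

  validation {
    condition     = var.green_weight >= 0 && var.green_weight <= 100
    error_message = "green_weight must be between 0 and 100."
  }
}
```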

# modules/alb/blue-green.tf (continued)

# Assumes an aws_lb resource named "this" is defined elsewhere in this module
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = var.blue_weight # e.g. 100 during normal ops
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = var.green_weight # e.g. 0 during normal ops
      }
    }
  }
}
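ALB target group weights are relative: each group receives its weight divided by the sum of all weights. The variables therefore read as percentages only when they add up to 100. A sketch of one way to enforce that, assuming Terraform 1.2 or later (which introduced lifecycle preconditions), is a lifecycle block placed inside the aws_lb_listener.http resource:

```hcl
  # Fragment: place inside resource "aws_lb_listener" "http" { ... }
  lifecycle {
    precondition {
      # Keep the weights summing to 100 so each one reads as a percentage.
      condition     = var.blue_weight + var.green_weight == 100
      error_message = "blue_weight and green_weight must add up to 100."
    }
  }
```

With this in place, a plan that sets only one of the two weights (for example, green_weight = 100 while blue_weight stays at its default of 100) fails instead of silently producing a 50/50 split.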

Step 4: The Zero-Downtime Rollout Execution

When a new code release is merged:

  1. The CI pipeline triggers a Terraform run that creates the Green container tasks.
  2. The Green tasks register with aws_lb_target_group.green, and the ALB starts health-checking them.
  3. Once every Green target reports healthy, the pipeline runs a second apply that changes only the listener weights:
    terraform apply -var="blue_weight=0" -var="green_weight=100" -auto-approve
  4. The Load Balancer routes all new requests to the Green tasks, while requests already in flight on Blue are allowed to complete. When the Blue tasks are later stopped and deregistered, the Blue target group's deregistration_delay (300 seconds by default, unless set lower on the target group) gives any remaining connections time to drain, so clients should not see an interruption.
  5. Once traffic has fully migrated, the old Blue containers are decommissioned.
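One caveat with passing the weights only as -var flags: the next plain terraform apply falls back to the defaults (blue_weight = 100, green_weight = 0) and sends traffic back to Blue. A GitOps-friendly alternative, sketched here with a hypothetical file name, is to commit the desired weights in a variable definitions file and have the pipeline run terraform apply -var-file=cutover.tfvars on every apply.

```hcl
# cutover.tfvars (hypothetical file name)
# Committed to Git so every later apply keeps traffic on Green
# until the next release changes these values.
blue_weight  = 0
green_weight = 100
```

For a progressive rollout, the same file can step through intermediate values such as 90/10 and 50/50, with one reviewed commit per step.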

Congratulations!

You have designed, implemented, and executed a fully automated, production-grade Blue-Green deployment infrastructure. You now have the knowledge needed to provision and manage modern, secure, and resilient cloud architectures at scale.
