Capstone: Zero-Downtime Blue-Green Infrastructure

Congratulations on reaching the final capstone project of the Terraform & AWS DevOps Mastery track!

Throughout this course, you have designed networks, secured credentials, provisioned clusters, and automated delivery pipelines. In this capstone, we pull all these skills together to solve one of the most difficult challenges in systems engineering: executing zero-downtime infrastructure upgrades.

Specifically, we will design and build a Blue-Green Deployment Infrastructure that provisions a new application environment (Green) alongside our running environment (Blue) and swaps traffic seamlessly at the load-balancer layer.

The Blue-Green Architecture Layout

Blue Environment: The current active production environment running version 1.0.0 of our application.
Green Environment: The new environment running version 2.0.0 of our application.
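Application Load Balancer (ALB): The traffic router. It sends all public traffic to the Blue environment until the Green environment passes its health checks. The pipeline then shifts traffic to Green by changing the listener's target group weights; the ALB does not do this on its own. The diagram below shows the state after cutover.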

                  [ Public Internet traffic ]
                              │
                              ▼
                [ Application Load Balancer ]
                 /                         \
    (Weight: 0%) /                           \ (Weight: 100%)
                ▼                             ▼
       [ Target Group Blue ]         [ Target Group Green ]
       (App Version: 1.0.0)          (App Version: 2.0.0)

Step 1: Enforcing Order of Operations with Lifecycle Rules

By default, when you change an argument that forces replacement (for example, a target group's name or port, or most arguments of an ECS task definition), Terraform destroys the existing resource first and then creates the new one. For a resource that is serving traffic, the gap between those two steps can mean downtime.

To reverse that order, we set create_before_destroy in the resource's lifecycle block:

# modules/app/main.tf

# Define stateless container task
resource "aws_ecs_task_definition" "app" {
  family                   = "${var.environment}-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  # ...

  lifecycle {
    # Register the new task definition revision BEFORE deregistering the old one
    create_before_destroy = true
  }
}
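
One practical wrinkle with create_before_destroy is that the replacement has to coexist briefly with the resource it replaces, which fails when a fixed name must be unique. The sketch below is an illustration only, not part of the capstone modules: it assumes a hypothetical target group called example and uses name_prefix so that each replacement gets a freshly generated name.

```hcl
# Hypothetical example resource; not one of the capstone's target groups.
resource "aws_lb_target_group" "example" {
  # name_prefix (at most 6 characters) lets Terraform append a unique suffix,
  # so the replacement can exist alongside the target group it replaces.
  name_prefix = "app-"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  lifecycle {
    create_before_destroy = true
  }
}
```

With a fixed name instead of name_prefix, the create step would likely fail with a duplicate-name error before the old target group is destroyed.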

Step 2: Provisioning Twin Target Groups

We provision two separate, identical target groups for our Load Balancer to hook up our Blue and Green tasks:

# modules/alb/blue-green.tf

# 1. Blue Target Group
resource "aws_lb_target_group" "blue" {
  name        = "${var.environment}-tg-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path = "/health"
  }
}

# 2. Green Target Group
resource "aws_lb_target_group" "green" {
  name        = "${var.environment}-tg-green"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path = "/health"
  }
}
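
The target groups above rely on provider defaults for everything except the health check path. As a sketch of how the draining and health-check behaviour could be made explicit, here is a variant of the Blue target group. The threshold, interval, and matcher values are assumptions chosen for illustration, not values taken from the course.

```hcl
# Variant of the Blue target group with explicit draining and health-check settings.
resource "aws_lb_target_group" "blue" {
  name        = "${var.environment}-tg-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  # Seconds the ALB keeps draining in-flight requests after a target
  # is deregistered. The default is 300 when this is omitted.
  deregistration_delay = 30

  health_check {
    path                = "/health"
    matcher             = "200" # assumed: the app returns HTTP 200 when healthy
    interval            = 15    # seconds between checks (assumed value)
    timeout             = 5     # must be shorter than the interval
    healthy_threshold   = 2     # consecutive passes before a target is healthy
    unhealthy_threshold = 3     # consecutive failures before it is unhealthy
  }
}
```

The same settings would apply to the Green target group so that both environments are judged by identical criteria.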

Step 3: Implementing Weighted Traffic Splitting

Now, we configure our Application Load Balancer listener. Instead of routing traffic to a single target group, we use a forward action with target_group weights.

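By parameterizing these weights, our GitOps pipeline can shift traffic progressively (for example, 10% canary traffic to Green, then 50%, then 100%). ALB weights are relative values from 0 to 999, so keeping the two weights summing to 100 lets them read as percentages: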

# modules/alb/variables.tf

variable "blue_weight" {
  description = "Percentage of traffic to route to the Blue target group (0-100)"
  type        = number
  default     = 100
}

variable "green_weight" {
  description = "Percentage of traffic to route to the Green target group (0-100)"
  type        = number
  default     = 0
}
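
Two independent variables can drift out of step (for example, 100 and 100, which splits traffic evenly). One possible alternative, sketched here under the assumption of a single hypothetical variable named green_traffic_percent, is to validate one input and derive both weights from it.

```hcl
# Hypothetical alternative to the two variables above.
variable "green_traffic_percent" {
  description = "Percentage of traffic to route to Green (0-100); Blue receives the remainder"
  type        = number
  default     = 0

  validation {
    # Accept only whole numbers from 0 to 100.
    condition = (
      var.green_traffic_percent >= 0 &&
      var.green_traffic_percent <= 100 &&
      floor(var.green_traffic_percent) == var.green_traffic_percent
    )
    error_message = "green_traffic_percent must be a whole number between 0 and 100."
  }
}

locals {
  # The two weights always sum to 100 by construction.
  green_weight = var.green_traffic_percent
  blue_weight  = 100 - var.green_traffic_percent
}
```

With this approach, the listener's target_group blocks would reference local.blue_weight and local.green_weight rather than the two variables.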

# modules/alb/blue-green.tf (continued)

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"
    
    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = var.blue_weight # e.g. 100 during normal ops
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = var.green_weight # e.g. 0 during normal ops
      }
    }
  }
}
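
The progressive shift described above can be captured as one variable file per rollout stage. This is a minimal sketch; the rollout/ directory and file names are hypothetical, and only the two weight variables come from the module.

```hcl
# rollout/10-canary.tfvars (hypothetical file name)
# Send 10% of traffic to Green and keep 90% on Blue.
blue_weight  = 90
green_weight = 10

# Later stages would follow the same pattern in their own files:
#   rollout/20-half.tfvars  sets blue_weight = 50, green_weight = 50
#   rollout/30-full.tfvars  sets blue_weight = 0,  green_weight = 100
```

A pipeline stage could then run terraform apply with -var-file pointing at the file for that stage, which keeps each traffic shift reviewable in version control.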

Step 4: The Zero-Downtime Rollout Execution

When a new code release is merged:

The CI pipeline triggers a Terraform run, creating the Green container tasks.
The Green tasks register with aws_lb_target_group.green and begin executing health checks.
Once Green is fully healthy, the pipeline runs a second apply that changes only the ALB listener weights:
```
terraform apply -var="blue_weight=0" -var="green_weight=100" -auto-approve
```
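The Load Balancer routes new requests to the Green tasks, while requests already in flight on Blue are allowed to complete. When Blue targets are later deregistered, they are drained for the target group's deregistration_delay, which defaults to 300 seconds unless set explicitly (for example, to 30). Done this way, the cutover should not drop requests.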
Once traffic has migrated fully, the old Blue containers are cleanly decommissioned.
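
The rollout steps above assume something that launches the Green tasks and registers them with the Green target group. A minimal sketch of that piece is an ECS service like the one below. Every variable here, the container name "app", and the service name are assumptions introduced for illustration; only aws_ecs_task_definition.app and port 8080 come from the earlier snippets.

```hcl
# Hypothetical inputs; in the course layout these would be passed into the app module.
variable "ecs_cluster_arn" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "app_security_group_id" { type = string }
variable "green_target_group_arn" { type = string }

variable "green_desired_count" {
  description = "Number of Green tasks to run; 0 decommissions the environment"
  type        = number
  default     = 2
}

resource "aws_ecs_service" "green" {
  name            = "${var.environment}-app-green"
  cluster         = var.ecs_cluster_arn
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.green_desired_count
  launch_type     = "FARGATE"

  # awsvpc network mode requires an explicit network configuration.
  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.app_security_group_id]
  }

  # Registers each task's IP with the Green target group on port 8080.
  load_balancer {
    target_group_arn = var.green_target_group_arn
    container_name   = "app" # assumed container name in the task definition
    container_port   = 8080
  }
}
```

A matching Blue service with its own desired-count variable would let the pipeline decommission Blue by applying a count of 0 once traffic has fully moved.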

Congratulations!

You have designed, implemented, and executed an automated, production-grade Blue-Green deployment infrastructure. You now have the skills to provision and manage modern, secure, and resilient cloud architectures at scale.
