Capstone: Zero-Downtime Blue-Green Infrastructure
Congratulations on reaching the final capstone project of the Terraform & AWS DevOps Mastery track!
Throughout this course, you have designed networks, secured credentials, provisioned clusters, and automated delivery pipelines. In this capstone, we pull all these skills together to solve one of the most difficult challenges in systems engineering: executing zero-downtime infrastructure upgrades.
Specifically, we will design and build a Blue-Green Deployment Infrastructure that provisions a new application environment (Green) alongside our running environment (Blue) and swaps traffic seamlessly at the load-balancer layer.
The Blue-Green Architecture Layout
- Blue Environment: The currently active production environment, running version 1.0.0 of our application.
- Green Environment: The new environment, running version 2.0.0 of our application.
- Application Load Balancer (ALB): The traffic router. It sends all public traffic to the Blue environment until the Green environment passes its health checks. The pipeline then shifts traffic to Green by changing the listener weights.
              [ Public Internet traffic ]
                          │
                          ▼
             [ Application Load Balancer ]
                   /              \
    (Weight: 0%)  /                \  (Weight: 100%)
                 ▼                  ▼
  [ Target Group Blue ]      [ Target Group Green ]
  (App Version: 1.0.0)       (App Version: 2.0.0)
Step 1: Enforcing Order of Operations with Lifecycle Rules
By default, when a change forces a resource to be replaced (for example, changing a container port in an ECS task definition or renaming an SQS queue), Terraform destroys the existing resource first and then creates its replacement. For a resource that is serving traffic, the gap between those two steps can mean downtime.
To change this, we must enforce a strict create_before_destroy lifecycle policy:
# modules/app/main.tf
# Define the stateless container task
resource "aws_ecs_task_definition" "app" {
  family                   = "${var.environment}-app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  # ...

  lifecycle {
    # Register the replacement task definition revision BEFORE the old revision is deregistered
    create_before_destroy = true
  }
}
Step 2: Provisioning Twin Target Groups
We provision two separate, identical target groups for our Load Balancer to hook up our Blue and Green tasks:
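# modules/alb/blue-green.tf
# 1. Blue Target Group
resource "aws_lb_target_group" "blue" {
  name        = "${var.environment}-tg-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Fargate tasks in awsvpc mode register by IP address

  # Drain window referenced in Step 4 (the AWS default is 300 seconds)
  deregistration_delay = 30

  health_check {
    path = "/health"
  }
}

# 2. Green Target Group
resource "aws_lb_target_group" "green" {
  name        = "${var.environment}-tg-green"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Fargate tasks in awsvpc mode register by IP address

  # Drain window referenced in Step 4 (the AWS default is 300 seconds)
  deregistration_delay = 30

  health_check {
    path = "/health"
  }
}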
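To make the link between the tasks and a target group concrete, here is a minimal sketch of how a Green ECS service might attach itself to the Green target group. It is an illustration under assumptions, not the course's reference implementation: the variables ecs_cluster_arn, private_subnet_ids, app_security_group_id and green_target_group_arn are hypothetical names, the container name "app" is assumed to match whatever the elided container definitions declare, and the desired count of 2 is arbitrary.

```hcl
# modules/app/service-green.tf (hypothetical file name)

# Hypothetical inputs; none of these variables appear in the original module.
variable "ecs_cluster_arn" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "app_security_group_id" { type = string }
variable "green_target_group_arn" { type = string } # e.g. passed in from the ALB module

resource "aws_ecs_service" "green" {
  name            = "${var.environment}-app-green"
  cluster         = var.ecs_cluster_arn
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2 # assumed value
  launch_type     = "FARGATE"

  # awsvpc network mode requires an explicit network configuration
  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [var.app_security_group_id]
  }

  # Each task's IP is registered with the Green target group on port 8080
  load_balancer {
    target_group_arn = var.green_target_group_arn
    container_name   = "app" # assumed; must match the container definition
    container_port   = 8080
  }
}
```

A Blue service would look the same apart from its name and target group ARN. Because the target groups use target_type = "ip", the service registers task IP addresses rather than instance IDs.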
Step 3: Implementing Weighted Traffic Splitting
Now, we configure our Application Load Balancer listener. Instead of routing traffic to a single target group, we use a forward action with target_group weights.
By parameterizing these weights, our GitOps pipeline can shift traffic progressively (e.g. 10% canary traffic to Green -> 50% -> 100%):
# modules/alb/variables.tf
variable "blue_weight" {
  description = "Percentage of traffic to route to the Blue target group (0-100)"
  type        = number
  default     = 100
}

variable "green_weight" {
  description = "Percentage of traffic to route to the Green target group (0-100)"
  type        = number
  default     = 0
}
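Because the two weights are set independently, nothing stops a pipeline from passing values that do not add up to 100. ALB weights are relative, so such values would still apply, but they would no longer match the percentages the variable descriptions promise. One possible guard, sketched here on the assumption that Terraform 1.2 or later is in use (output preconditions are not available before that), is a precondition on a hypothetical traffic_split output:

```hcl
# modules/alb/outputs.tf (hypothetical file name)

# Exposes the current split and fails the plan if the weights are inconsistent.
output "traffic_split" {
  description = "Current Blue/Green traffic split, in percent"
  value = {
    blue  = var.blue_weight
    green = var.green_weight
  }

  precondition {
    condition     = var.blue_weight + var.green_weight == 100
    error_message = "The blue_weight and green_weight values must add up to 100."
  }
}
```

With this in place, a run such as terraform plan -var="blue_weight=50" -var="green_weight=100" should stop with the error message instead of producing a plan.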
# modules/alb/blue-green.tf (continued)
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = var.blue_weight # e.g. 100 during normal ops
      }

      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = var.green_weight # e.g. 0 during normal ops
      }
    }
  }
}
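The progressive shift described above (10%, then 50%, then 100%) could also be driven by a single named stage rather than two numbers. The following is a sketch of that alternative, not part of the original module: the rollout_stage variable, the stage names and the rollout_stages map are all hypothetical.

```hcl
# Hypothetical alternative to passing blue_weight and green_weight separately.
variable "rollout_stage" {
  description = "Named traffic split to apply (hypothetical variable)"
  type        = string
  default     = "blue"

  validation {
    condition     = contains(["blue", "canary", "half", "green"], var.rollout_stage)
    error_message = "The rollout_stage value must be one of: blue, canary, half, green."
  }
}

locals {
  # Each stage fixes both weights, so they always add up to 100.
  rollout_stages = {
    blue   = { blue = 100, green = 0 }
    canary = { blue = 90, green = 10 }
    half   = { blue = 50, green = 50 }
    green  = { blue = 0, green = 100 }
  }

  # Weights for the selected stage. In the listener these would replace the
  # two variables: weight = local.weights.blue and weight = local.weights.green
  weights = local.rollout_stages[var.rollout_stage]
}
```

Under this design the pipeline would advance the rollout with terraform apply -var="rollout_stage=canary", then "half", then "green". The trade-off is less flexibility (only the predefined splits are available) in exchange for weights that cannot drift out of sync.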
Step 4: The Zero-Downtime Rollout Execution
When a new code release is merged:
- The CI pipeline triggers a Terraform run, creating the Green container tasks.
- The Green tasks register with aws_lb_target_group.green, and the load balancer begins health-checking them.
- Once Green is fully healthy, the pipeline runs an apply that overrides only the two weight variables:
  terraform apply -var="blue_weight=0" -var="green_weight=100" -auto-approve
- The load balancer sends all new requests to the Green tasks, while requests already in flight on Blue are allowed to complete. When the Blue tasks are later deregistered, their target group keeps draining open connections for up to deregistration_delay seconds (the AWS default is 300; the 30-second drain assumed in this design requires deregistration_delay = 30 on the target groups), so clients should see no interruption.
- Once traffic has fully migrated, the old Blue containers are decommissioned.
Congratulations!
You have designed, implemented, and executed a fully automated, production-grade Blue-Green deployment infrastructure. You are now equipped with the advanced technical systems knowledge needed to provision and manage modern, secure, and resilient cloud architectures at scale.