High-Availability Load Balancing & Auto-Scaling
In our previous lesson, we provisioned stateless AWS ECS Fargate tasks running our backend application inside private subnets. Because these tasks have no public IP addresses and sit in isolated subnets, clients on the internet cannot reach them directly.
To route traffic to our application safely and handle traffic spikes dynamically, we must:
- Deploy an Application Load Balancer (ALB) in our public subnets.
- Link our ECS containers to an ALB Target Group with automated HTTP health checks.
- Configure dynamic Target Tracking Auto-Scaling Policies to adjust container counts on demand.
Step 1: Provisioning the Application Load Balancer (ALB)
An ALB operates at Layer 7 of the OSI model, so it can make routing decisions based on request attributes such as host headers, URL paths, HTTP headers, and query strings. We place it in our public subnets:
# modules/alb/main.tf
# 1. The Application Load Balancer
resource "aws_lb" "this" {
  name                       = "${var.environment}-app-alb"
  internal                   = false # Internet-facing
  load_balancer_type         = "application"
  security_groups            = [var.alb_security_group_id]
  subnets                    = var.public_subnet_ids # Public subnets across multiple AZs
  enable_deletion_protection = var.environment == "prod" # Only protect production
  tags = {
    Environment = var.environment
  }
}
# 2. ALB Target Group representing container destinations
resource "aws_lb_target_group" "app" {
  name        = "${var.environment}-app-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Required for ECS Fargate (awsvpc networking)
  # Health check configuration
  health_check {
    enabled             = true
    path                = "/health" # Endpoint to hit on containers
    protocol            = "HTTP"
    port                = "traffic-port" # Same port that receives traffic
    interval            = 30             # Check every 30s
    timeout             = 5              # Give the container 5s to respond
    healthy_threshold   = 3              # Mark healthy after 3 consecutive successes
    unhealthy_threshold = 3              # Mark unhealthy after 3 consecutive failures
    matcher             = "200"          # Expect HTTP 200
  }
  tags = {
    Environment = var.environment
  }
}
# 3. HTTP listener forwarding traffic to the target group
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn
  port              = 80
  protocol          = "HTTP"
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
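The resources above reference several input variables, and Step 2 expects a target group ARN to come out of this module. A minimal sketch of the matching declarations might look like the following. The variable names are taken from the configuration above; the file names, descriptions, and output names (`target_group_arn`, `alb_dns_name`) are assumptions for illustration, not names confirmed by the lesson.

```hcl
# modules/alb/variables.tf (assumed file name)
variable "environment" {
  description = "Deployment environment name, e.g. dev or prod"
  type        = string
}

variable "vpc_id" {
  description = "ID of the VPC that contains the target group"
  type        = string
}

variable "public_subnet_ids" {
  description = "Public subnet IDs for the ALB (an ALB needs subnets in at least two AZs)"
  type        = list(string)
}

variable "alb_security_group_id" {
  description = "ID of the security group attached to the ALB"
  type        = string
}

# modules/alb/outputs.tf (assumed file name; output names are illustrative)
output "target_group_arn" {
  description = "ARN passed to the ECS service's load_balancer block"
  value       = aws_lb_target_group.app.arn

  # Assumption: delay consumers until the listener exists, because ECS
  # rejects a target group that is not yet attached to a load balancer.
  depends_on = [aws_lb_listener.http]
}

output "alb_dns_name" {
  description = "Public DNS name clients use to reach the ALB"
  value       = aws_lb.this.dns_name
}
```

In a root module, an output like this would typically be wired into the ECS module's `target_group_arn` input as `module.alb.target_group_arn`, assuming the ALB module is instantiated under the name `alb`.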
Step 2: Associating ALB with ECS Service
Now, we update our ECS Service configuration in modules/ecs/main.tf to register containers with the ALB Target Group upon launch:
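# modules/ecs/main.tf (updated block)
resource "aws_ecs_service" "app" {
  # ... (previous config)
  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "app" # Must match the container name in the task definition
    container_port   = 8080  # Container port that receives traffic from the ALB
  }
}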
When a task starts, ECS registers the task's private IP address and container port with the target group. The ALB routes traffic to the task only after it passes 3 consecutive /health checks (the healthy_threshold configured in Step 1).
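To get a feel for how long a new task may wait before receiving traffic, the health-check settings from Step 1 can be turned into rough estimates. The block below is illustrative arithmetic only: the local names are hypothetical, and real timing also depends on when the first check fires, on the 5-second timeout, and on how long the container takes to boot.

```hcl
# Illustrative arithmetic only; these local names are hypothetical.
locals {
  health_check_interval = 30 # seconds, mirrors health_check.interval
  healthy_threshold     = 3  # mirrors health_check.healthy_threshold
  unhealthy_threshold   = 3  # mirrors health_check.unhealthy_threshold

  # Rough estimate: 3 consecutive passes spaced 30s apart = 90 seconds.
  approx_seconds_to_healthy = local.health_check_interval * local.healthy_threshold

  # Rough estimate: 3 consecutive failures spaced 30s apart = 90 seconds.
  approx_seconds_to_unhealthy = local.health_check_interval * local.unhealthy_threshold
}
```

If the application is slow to start, a figure in this range is a plausible starting point for the `health_check_grace_period_seconds` argument of `aws_ecs_service`, which tells ECS to ignore failing load balancer health checks for that long after a task launches.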
Step 3: Dynamic Auto-Scaling (Target Tracking)
Instead of manually scaling container counts or using complex, rigid step-scaling policies, we configure Target Tracking Auto-Scaling.
This functions like a home thermostat: you specify a target value for a metric (e.g. keep average CPU utilization across the service at 70%), and Application Auto Scaling adds or removes tasks to hold the metric near that value.
# modules/ecs/autoscaling.tf
# 1. Define the ECS service as a scalable target and set its bounds
resource "aws_appautoscaling_target" "ecs" {
  max_capacity       = 10 # Scale out to at most 10 tasks
  min_capacity       = 2  # Never fall below 2 tasks
  resource_id        = "service/${var.ecs_cluster_name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
# 2. CPU Target Tracking Policy
resource "aws_appautoscaling_policy" "cpu" {
  name               = "ecs-cpu-scaling-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace
  target_tracking_scaling_policy_configuration {
    target_value       = 70.0  # Keep average CPU utilization near 70%
    disable_scale_in   = false # Allow the policy to remove tasks
    scale_in_cooldown  = 300   # Wait 5 mins between scale-in actions
    scale_out_cooldown = 60    # Scale out rapidly (1 min)
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
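CPU is not always the resource that saturates first. As a sketch, a second target tracking policy can be attached to the same scaling target, here tracking average memory utilization. The resource name `memory`, the policy name, and the 75% target are assumptions chosen for illustration, not values from the lesson.

```hcl
# modules/ecs/autoscaling.tf (hypothetical addition)
resource "aws_appautoscaling_policy" "memory" {
  name               = "ecs-memory-scaling-policy" # illustrative name
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 75.0 # assumed target; tune for your workload
    scale_in_cooldown  = 300  # same cooldowns as the CPU policy
    scale_out_cooldown = 60

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
  }
}
```

When several target tracking policies share one scalable target, Application Auto Scaling scales out if any policy calls for it and scales in only when all of them agree, so adding a memory policy should not make the service shrink more aggressively.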
With this setup, the application can absorb task failures and load spikes without manual intervention. Traffic enters through the internet-facing load balancer, is distributed across healthy tasks in the private subnets, and the service scales horizontally between 2 and 10 tasks as average CPU utilization moves around the 70% target.
Next Steps
We have completed the deployment of our core application and database stack. Now we transition to Module 5: Day-2 Ops & GitOps Automation. In the next lesson, we'll design a professional, automated CI/CD deployment pipeline using GitHub Actions, validating and applying our infrastructure code cleanly through Git.