High-Availability Load Balancing & Auto-Scaling

In our previous lesson, we provisioned stateless AWS ECS Fargate containers running our backend application inside private subnets. However, because these containers have no public IP addresses and run in private subnets, clients on the internet cannot reach them directly.

To route traffic to our application safely and handle traffic spikes dynamically, we must:

1. Deploy an Application Load Balancer (ALB) in our public subnets.
2. Link our ECS containers to an ALB Target Group with automated HTTP health checks.
3. Configure Target Tracking Auto-Scaling Policies to adjust container counts on demand.

Step 1: Provisioning the Application Load Balancer (ALB)

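An ALB operates at Layer 7 of the OSI model, so it can route requests based on HTTP attributes such as host names, headers, and URL paths. We place it in our public subnets, where it can accept internet traffic and forward it to the containers in the private subnets: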

# modules/alb/main.tf

# 1. The Application Load Balancer
resource "aws_lb" "this" {
  name               = "${var.environment}-app-alb"
  internal           = false # Internet-facing
  load_balancer_type = "application"
  security_groups    = [var.alb_security_group_id]
  subnets            = var.public_subnet_ids # Public subnets across multiple AZs

  enable_deletion_protection = var.environment == "prod"

  tags = {
    Environment = var.environment
  }
}

# 2. ALB Target Group representing container destinations
resource "aws_lb_target_group" "app" {
  name        = "${var.environment}-app-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Required for ECS Fargate

  # Health Check configuration
  health_check {
    enabled             = true
    path                = "/health" # Endpoint to hit on containers
    protocol            = "HTTP"
    port                = "traffic-port"
    interval            = 30 # Check every 30s
    timeout             = 5  # Give container 5s to respond
    healthy_threshold   = 3  # Mark healthy after 3 consecutive successful checks
    unhealthy_threshold = 3  # Mark unhealthy after 3 consecutive failed checks
    matcher             = "200" # Expect HTTP 200
  }

  tags = {
    Environment = var.environment
  }
}

# 3. HTTP Listener routing traffic to target group
resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.this.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}
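The ECS service in Step 2 reads the target group through var.target_group_arn, so the ALB module has to expose that ARN. A minimal sketch of how this could look follows; the outputs.tf file, the output names, the module names, and the source paths are all assumptions, not part of the lesson's code.

```hcl
# modules/alb/outputs.tf (hypothetical file; output names are assumptions)

output "target_group_arn" {
  description = "ARN of the target group that ECS tasks register with"
  value       = aws_lb_target_group.app.arn
}

output "alb_dns_name" {
  description = "Public DNS name of the load balancer"
  value       = aws_lb.this.dns_name
}

# Root module (sketch; module names and source paths are assumptions)
module "ecs" {
  source = "./modules/ecs"
  # ... (other inputs)

  # Hand the ALB module's target group to the ECS module
  target_group_arn = module.alb.target_group_arn
}
```

Passing the ARN as a module output also gives Terraform the dependency it needs to create the target group before the service that references it.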

Step 2: Associating ALB with ECS Service

Now, we update our ECS Service configuration in modules/ecs/main.tf to register containers with the ALB Target Group upon launch:

# modules/ecs/main.tf (updated block)

resource "aws_ecs_service" "app" {
  # ... (previous config)

  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = "app"
    container_port   = 8080
  }
}

When a task starts, ECS registers the task's private IP address and container port (8080) with the target group. The ALB routes traffic to the task only after it passes 3 consecutive /health checks, which takes roughly 60 to 90 seconds with the 30-second interval configured above.
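Both regular requests and health checks originate from the load balancer, so the tasks' security group has to allow inbound traffic from the ALB's security group on port 8080. The sketch below shows one way to express that rule; the file name and both variables are hypothetical and assume the two security group IDs are passed into the ECS module.

```hcl
# modules/ecs/security.tf (hypothetical file)
# Assumed inputs, not defined in the lesson:
#   var.ecs_tasks_security_group_id - security group attached to the Fargate tasks
#   var.alb_security_group_id       - security group attached to the ALB

resource "aws_vpc_security_group_ingress_rule" "alb_to_tasks" {
  security_group_id            = var.ecs_tasks_security_group_id
  referenced_security_group_id = var.alb_security_group_id
  description                  = "Allow the ALB to reach the app and its /health endpoint"
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```

Referencing the ALB's security group rather than a CIDR range keeps the tasks unreachable from anything except the load balancer. If this rule is missing, health checks time out and the targets never become healthy.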

Step 3: Dynamic Auto-Scaling (Target Tracking)

Instead of manually scaling container counts or using complex, rigid step-scaling policies, we configure Target Tracking Auto-Scaling.

This functions like a home thermostat: you specify a target for a metric (e.g. keep average CPU utilization across the service's tasks near 70%), and Application Auto Scaling adds or removes tasks to hold the metric close to that target.

# modules/ecs/autoscaling.tf

# 1. Define the ECS scaling target and its capacity bounds
resource "aws_appautoscaling_target" "ecs" {
  max_capacity       = 10 # Scale out to at most 10 tasks
  min_capacity       = 2  # Never fall below 2 tasks
  resource_id        = "service/${var.ecs_cluster_name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# 2. CPU Target Tracking Policy
resource "aws_appautoscaling_policy" "cpu" {
  name               = "ecs-cpu-scaling-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 70.0 # Keep CPU utilization at 70%
    disable_scale_in   = false
    scale_in_cooldown  = 300 # Wait 5 mins before scaling in
    scale_out_cooldown = 60  # Scale out rapidly (1 min)

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
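CPU is not always the resource that runs out first. As an illustration of the same target tracking pattern applied to another predefined metric, the sketch below adds a memory-based policy to the same scaling target. The policy name and the 75% target are assumptions chosen for the example, not values from the lesson.

```hcl
# modules/ecs/autoscaling.tf (sketch; the policy name and 75% target are assumptions)

resource "aws_appautoscaling_policy" "memory" {
  name               = "ecs-memory-scaling-policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 75.0 # Illustrative target for average memory utilization
    scale_in_cooldown  = 300
    scale_out_cooldown = 60

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
  }
}
```

When several target tracking policies are attached to one scalable target, Application Auto Scaling scales out if any policy calls for it and scales in only when all of them agree. If the aws_ecs_service block sets desired_count, consider adding lifecycle { ignore_changes = [desired_count] } to it so that a later terraform apply does not reset the task count chosen by the autoscaler.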

With this setup, the application can tolerate individual task failures and absorb traffic spikes. Requests enter through the load balancer in the public subnets, are distributed across healthy tasks in the private subnets, and the service scales horizontally between 2 and 10 tasks as CPU utilization changes, with no manual intervention.

Next Steps

We have completed the deployment of our core application and database stack. Now we transition to Module 5: Day-2 Ops & GitOps Automation. In the next lesson, we'll design a professional, automated CI/CD deployment pipeline using GitHub Actions, validating and applying our infrastructure code cleanly through Git.
