High-Availability Relational Databases: RDS Multi-AZ Setup
Databases hold the state and source of truth for your entire application platform. If a web server goes down, you can spin up a new container in seconds. If your primary database crashes or suffers data corruption, the entire platform halts, costing you revenue and customer trust.
To protect our data layer, we will provision a resilient RDS PostgreSQL database that uses Multi-AZ synchronous replication.
Multi-AZ vs. Read Replicas
Understanding the architectural difference is critical for both system design interviews and operational excellence:
| Metric | Multi-AZ (Active-Passive HA) | Read Replicas (Read Scaling) |
|---|---|---|
| Primary Use | High Availability & Disaster Recovery | Read Throughput Scalability |
| Replication Type | Synchronous (no loss of committed writes on failover) | Asynchronous (Eventual Consistency) |
| Failover | Automatic (DNS endpoint updates, typically in 60-120s) | Manual promotion is required |
| Deployment | Standby instance in a separate AZ (hidden) | Active read-only instances (exposed via endpoint) |
[ App container ]
│ (Writes to DNS endpoint: db.codesprintpro.com)
▼
[ Primary RDS (AZ A) ] ◄── Synchronous Replication ──► [ Standby RDS (AZ B) ]
(Active - read/write) (Passive hot-standby)
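For contrast with the Multi-AZ standby, the following is a minimal sketch of how a read replica could be declared in Terraform. It assumes the primary instance `aws_db_instance.this` and the variables defined in Step 2 of this lesson; the resource name `read_replica`, the `-replica-1` suffix, and the `db_replica_endpoint` output are hypothetical and not part of the module as written.

```hcl
# Sketch only: a same-region read replica of the primary instance.
# "read_replica" and the "-replica-1" suffix are hypothetical names.
resource "aws_db_instance" "read_replica" {
  identifier          = "${var.environment}-app-db-replica-1"
  replicate_source_db = aws_db_instance.this.identifier
  instance_class      = var.db_instance_class

  # Engine, storage settings, and credentials are inherited from the
  # source instance, so they are not repeated here.
  vpc_security_group_ids = [var.db_security_group_id]

  # A replica holds no unique data, so skip the final snapshot on destroy.
  skip_final_snapshot = true

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}

# Unlike the hidden Multi-AZ standby, a replica has its own endpoint
# that read-only traffic must target explicitly.
output "db_replica_endpoint" {
  value       = aws_db_instance.read_replica.endpoint
  description = "Read-only endpoint of the replica (hypothetical output)"
}
```

The source instance needs automated backups enabled (a `backup_retention_period` greater than 0) before a replica can be created, which the Step 2 configuration satisfies in every environment.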
Step 1: Provisioning the Subnet Group & Parameter Group
Before launching the database, we need two supporting resources: the isolated database subnet group we created inside our VPC module, which the instance will be associated with, and a custom parameter group that sets safe database parameters:
# modules/rds/main.tf

# 1. Custom PostgreSQL Parameter Group
resource "aws_db_parameter_group" "postgres" {
  name   = "${var.environment}-postgres15-pg"
  family = "postgres15"

  # Force TLS connections for data-in-transit security
  parameter {
    name  = "rds.force_ssl"
    value = "1"
  }

  # Enable slow query logging
  parameter {
    name  = "log_min_duration_statement"
    value = "1000" # Log query if execution exceeds 1000ms
  }
}
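The subnet group itself lives in the VPC module and is not shown in this lesson. As a rough sketch of what such a resource might look like, assuming a hypothetical `var.database_subnet_ids` list and hypothetical resource and output names:

```hcl
# Sketch only: a DB subnet group as it might be defined in the VPC module.
# var.database_subnet_ids is a hypothetical variable holding the IDs of
# the isolated database subnets.
resource "aws_db_subnet_group" "database" {
  name       = "${var.environment}-db-subnet-group"
  subnet_ids = var.database_subnet_ids

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}

# Exposed so the RDS module can receive it as var.db_subnet_group_name.
output "db_subnet_group_name" {
  value       = aws_db_subnet_group.database.name
  description = "Name of the isolated database subnet group"
}
```

A DB subnet group must cover subnets in at least two Availability Zones, which is what gives RDS somewhere to place the Multi-AZ standby.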
Step 2: Provisioning the Production-Grade RDS Postgres Instance
Now we write the resource definition. Conditionals on var.environment == "prod" enable expensive safety features such as Multi-AZ and deletion protection in production and switch them off in sandbox/dev environments to save costs.
# modules/rds/main.tf (continued)
resource "aws_db_instance" "this" {
  identifier     = "${var.environment}-app-db"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = var.db_instance_class # e.g. "db.r6g.large" for prod

  allocated_storage     = var.allocated_storage_gb
  max_allocated_storage = var.max_allocated_storage_gb # Autoscales storage up to this limit
  storage_type          = "gp3"
  storage_encrypted     = true

  db_name  = "appdb"
  username = "app_admin"           # "admin" is a reserved word in RDS PostgreSQL
  password = var.database_password # Supplied by the caller, e.g. read from Secrets Manager

  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = [var.db_security_group_id]
  parameter_group_name   = aws_db_parameter_group.postgres.name

  # High Availability Configuration
  multi_az = var.environment == "prod"

  # Backup Configuration
  backup_retention_period   = var.environment == "prod" ? 7 : 1 # Keep 7 days in prod
  backup_window             = "03:00-04:00" # Daily automated backup window (UTC)
  copy_tags_to_snapshot     = true
  skip_final_snapshot       = var.environment != "prod"
  final_snapshot_identifier = var.environment == "prod" ? "${var.environment}-app-db-final-snapshot" : null

  # Deletion Protection: Block accidental CLI/Console deletion actions
  deletion_protection = var.environment == "prod"

  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}
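The resource above references several input variables that this lesson does not declare. A minimal sketch of a matching `modules/rds/variables.tf` could look like the following; the variable names come from the code above, while the file name, types, and descriptions are assumptions.

```hcl
# modules/rds/variables.tf (sketch; types and descriptions are assumed)
variable "environment" {
  type        = string
  description = "Deployment environment, e.g. \"prod\" or \"dev\""
}

variable "db_instance_class" {
  type        = string
  description = "RDS instance class, e.g. \"db.r6g.large\""
}

variable "allocated_storage_gb" {
  type        = number
  description = "Initial storage allocation in GiB"
}

variable "max_allocated_storage_gb" {
  type        = number
  description = "Upper limit for storage autoscaling in GiB"
}

variable "database_password" {
  type        = string
  description = "Master password for the database"
  sensitive   = true # Redacts the value in plan and apply output
}

variable "db_subnet_group_name" {
  type        = string
  description = "Name of the DB subnet group from the VPC module"
}

variable "db_security_group_id" {
  type        = string
  description = "Security group ID that controls access to the database"
}
```

Marking `database_password` as `sensitive` only hides it from CLI output. The value is still written to the Terraform state, so the state backend needs to be encrypted and access-controlled.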
Step 3: Expose Connection Outputs
Our application containers need the endpoint and port to connect to the database instance. Note that the endpoint attribute is returned in address:port form, not as a bare hostname:
# modules/rds/outputs.tf
output "db_endpoint" {
  value       = aws_db_instance.this.endpoint
  description = "The connection endpoint for the RDS instance (address:port)"
}

output "db_port" {
  value       = aws_db_instance.this.port
  description = "The database connection port"
}
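To show how these pieces might fit together, here is a hedged sketch of a root configuration that calls the module and reads its outputs. The module path is inferred from the file comments above; the secret name, the sizing values, the subnet group name, and the security group ID are all hypothetical placeholders.

```hcl
# Root configuration (sketch). All literal values are placeholders.

# Read the master password from an existing Secrets Manager secret.
# The secret name is hypothetical.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/app-db/master-password"
}

module "rds" {
  source = "./modules/rds"

  environment              = "prod"
  db_instance_class        = "db.r6g.large"
  allocated_storage_gb     = 100
  max_allocated_storage_gb = 500
  database_password        = data.aws_secretsmanager_secret_version.db_password.secret_string

  # In a real setup these would likely come from the VPC module's outputs.
  db_subnet_group_name = "prod-db-subnet-group"
  db_security_group_id = "sg-0123456789abcdef0"
}

# Forward the module outputs, e.g. for use in container definitions.
output "database_endpoint" {
  value = module.rds.db_endpoint # address:port
}

output "database_port" {
  value = module.rds.db_port
}
```

If the application needs the hostname on its own, the `address` attribute of `aws_db_instance` could be exposed as an additional module output rather than parsing `endpoint`.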
By combining an isolated subnet group, synchronous Multi-AZ replication, a hardened parameter group, and strict lifecycle controls, you make the data layer durable, auditable, and resilient to the loss of a single Availability Zone.
Next Steps
Now that our database is securely deployed inside the isolated networking layer, we are ready to build our compute infrastructure. In the next lesson, we will provision AWS ECS Fargate clusters to run containerized backend microservices securely and scale them dynamically.