Designing a Production-Grade AWS VPC from Scratch
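A Virtual Private Cloud (VPC) is the logical network boundary for all your cloud resources. AWS provides a "Default VPC" in every Region of an account. However, every default subnet in that VPC is public: each one routes to an Internet Gateway, and instances launched into it receive a public IPv4 address automatically. This makes it easy to expose a resource to the internet by accident, which is a serious security risk for production workloads.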
A production-grade architecture must implement Defense in Depth by segmenting the network into logical, isolated layers across multiple Availability Zones (AZs).
+--------------------------------------------------------------------------------+
| AWS Cloud Region (us-east-1)                                                   |
| VPC: 10.0.0.0/16                                                               |
|                                                                                |
| +--------------------------------+  +--------------------------------+         |
| | Availability Zone A            |  | Availability Zone B            |         |
| |                                |  |                                |         |
| | Public Subnet (10.0.1.0/24)    |  | Public Subnet (10.0.2.0/24)    |  <- ALB |
| |                                |  |                                |         |
| | Private Subnet (10.0.11.0/24)  |  | Private Subnet (10.0.12.0/24)  |  <- App |
| |                                |  |                                |         |
| | DB Subnet (10.0.21.0/24)       |  | DB Subnet (10.0.22.0/24)       |  <- RDS |
| +--------------------------------+  +--------------------------------+         |
|                                                                                |
+--------------------------------------------------------------------------------+
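The address plan in the diagram can be derived from the VPC CIDR rather than typed by hand. The following is a minimal sketch using Terraform's built-in `cidrsubnet()` function; the `locals` names and the `address_plan` output are hypothetical and only illustrate the arithmetic, assuming a /16 VPC and two Availability Zones.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24
  # blocks, and netnum becomes the third octet of each block.
  public_cidrs   = [for i in range(2) : cidrsubnet(local.vpc_cidr, 8, i + 1)]
  private_cidrs  = [for i in range(2) : cidrsubnet(local.vpc_cidr, 8, i + 11)]
  database_cidrs = [for i in range(2) : cidrsubnet(local.vpc_cidr, 8, i + 21)]
}

output "address_plan" {
  value = {
    public   = local.public_cidrs   # ["10.0.1.0/24", "10.0.2.0/24"]
    private  = local.private_cidrs  # ["10.0.11.0/24", "10.0.12.0/24"]
    database = local.database_cidrs # ["10.0.21.0/24", "10.0.22.0/24"]
  }
}
```

Leaving gaps between the tiers (1-2, 11-12, 21-22) keeps room to add more Availability Zones later without renumbering existing subnets.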
Understanding Subnet Segregation
- Public Subnet Layer
  - Hosts public-facing Application Load Balancers (ALBs) and NAT Gateways.
  - Has a route to an Internet Gateway (IGW).
  - Assigns public IP addresses on launch.
- Private Subnet Layer
  - Hosts stateless application container workloads (ECS Fargate/EKS) and background workers.
  - Routes outbound internet traffic through a NAT Gateway in the public subnet.
  - Never assigns public IPs.
- Database / Isolated Subnet Layer
  - Hosts transactional databases (RDS) and caching clusters (ElastiCache).
  - Has no route to an Internet Gateway or NAT Gateway, so it has neither inbound nor outbound internet access.
  - Accepts connections only from the private application layer. Routing alone does not enforce this, because every subnet in a VPC can reach the others through the local route; Security Groups restrict the traffic.
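As a preview of how the "application layer only" rule for the database tier might be expressed, here is a hedged sketch using two Security Groups. The group names, the `name_prefix` values, and the PostgreSQL port 5432 are assumptions for illustration, and the snippet assumes the `aws_vpc.main` resource defined in the module later in this lesson.

```hcl
# Hypothetical security group attached to the application workloads.
resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = aws_vpc.main.id
}

# Hypothetical security group attached to the database instances.
resource "aws_security_group" "database" {
  name_prefix = "db-"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "PostgreSQL from the application layer only"
    from_port       = 5432 # assumed engine port
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id] # source is a group, not a CIDR
  }

  # No egress block is declared: Terraform removes the allow-all egress rule
  # that AWS adds to new security groups, so this group permits no outbound
  # connections of its own. Replies to allowed inbound traffic still flow
  # because security groups are stateful.
}
```

Referencing the application group as the source, instead of the private subnet CIDRs, means the rule keeps working if the subnets are renumbered.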
Hands-on: Provisioning the VPC Topology
Let's write a reusable, custom Terraform module to provision this network. Create a folder modules/vpc/ and add the following files:
# modules/vpc/variables.tf
variable "vpc_cidr" {
  description = "Base CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

variable "availability_zones" {
  description = "List of Availability Zones to provision subnets in"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b"]
}
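Optionally, the same variables can reject bad input at plan time. The sketch below shows hardened versions that would replace the declarations above rather than sit beside them; the specific rules (a parseable CIDR, one of three environment names, at least two AZs) are assumptions about what this module should accept.

```hcl
variable "vpc_cidr" {
  description = "Base CIDR block for the VPC"
  type        = string
  default     = "10.0.0.0/16"

  validation {
    # cidrhost() raises an error on a malformed CIDR; can() converts it to false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "The vpc_cidr value must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, staging, prod."
  }
}

variable "availability_zones" {
  description = "List of Availability Zones to provision subnets in"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b"]

  validation {
    condition     = length(var.availability_zones) >= 2
    error_message = "The availability_zones list must contain at least two zones."
  }
}
```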
Now, implement the main networking resources:
# modules/vpc/main.tf
# 1. The Core VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "${var.environment}-vpc"
    Environment = var.environment
  }
}

# 2. Internet Gateway for Public Routing
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "${var.environment}-igw"
    Environment = var.environment
  }
}
# 3. Public Subnets
# cidrsubnet() carves /24 blocks out of var.vpc_cidr (assumes a /16 VPC), so the
# subnets stay inside the VPC range even when the default CIDR is overridden.
resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 1) # 10.0.1.0/24, 10.0.2.0/24
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true # Instances receive a public IP automatically

  tags = {
    Name        = "${var.environment}-public-subnet-${var.availability_zones[count.index]}"
    Environment = var.environment
  }
}

# 4. Private Subnets
resource "aws_subnet" "private" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 11) # 10.0.11.0/24, 10.0.12.0/24
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name        = "${var.environment}-private-subnet-${var.availability_zones[count.index]}"
    Environment = var.environment
  }
}

# 5. Database Subnets (Isolated)
resource "aws_subnet" "database" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 21) # 10.0.21.0/24, 10.0.22.0/24
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = false

  tags = {
    Name        = "${var.environment}-db-subnet-${var.availability_zones[count.index]}"
    Environment = var.environment
  }
}
# RDS database subnet group
resource "aws_db_subnet_group" "rds" {
  name        = "${var.environment}-rds-subnet-group"
  description = "Subnet group for RDS cluster"
  subnet_ids  = aws_subnet.database[*].id

  tags = {
    Name        = "${var.environment}-rds-subnet-group"
    Environment = var.environment
  }
}
Now, expose the necessary identifiers using outputs:
# modules/vpc/outputs.tf
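output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets, one per Availability Zone"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "IDs of the private application subnets, one per Availability Zone"
  value       = aws_subnet.private[*].id
}

output "db_subnet_ids" {
  description = "IDs of the isolated database subnets, one per Availability Zone"
  value       = aws_subnet.database[*].id
}

output "db_subnet_group_name" {
  description = "Name of the RDS subnet group spanning the database subnets"
  value       = aws_db_subnet_group.rds.name
}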
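To show how these outputs are meant to be consumed, here is a minimal sketch of a root module that calls the VPC module. The file location, the `./modules/vpc` relative path, the `prod` environment value, and the `app_subnet_ids` output name are assumptions for illustration.

```hcl
# main.tf in the root module (hypothetical layout)
provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source = "./modules/vpc"

  environment        = "prod"
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["us-east-1a", "us-east-1b"]
}

# Module outputs are read as module.<name>.<output>.
output "app_subnet_ids" {
  description = "Subnets where application workloads should be placed"
  value       = module.vpc.private_subnet_ids
}
```

Other modules (for example, one that creates an RDS instance) would receive `module.vpc.db_subnet_group_name` as an input instead of looking up subnets themselves.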
Configuring Public Route Tables
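A subnet is only public if its route table sends internet-bound traffic to an Internet Gateway. We therefore create a route table whose default route (0.0.0.0/0, meaning any destination outside the VPC) targets the IGW, and associate it with each public subnet: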
# modules/vpc/main.tf (continued)
# Public Route Table
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }

  tags = {
    Name        = "${var.environment}-public-rt"
    Environment = var.environment
  }
}

# Associate Public Route Table to Public Subnets
resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
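Subnets without an explicit association fall back to the VPC's main route table, which starts with only the local route. That keeps the database subnets isolated today, but the isolation would silently disappear if someone later added a default route to the main table. One way to guard against that, sketched here as an optional addition and not as part of the original module, is to give the database tier its own route table with no routes at all:

```hcl
# modules/vpc/main.tf (optional addition)

# Dedicated route table for the database tier. No route blocks are declared,
# so only the implicit local (intra-VPC) route exists.
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name        = "${var.environment}-db-rt"
    Environment = var.environment
  }
}

# Pin every database subnet to the isolated table.
resource "aws_route_table_association" "database" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}
```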
Next Steps
Our VPC now has public, private, and database subnets spread across multiple Availability Zones. However, the private subnets have no route to the internet yet, so resources deployed there cannot fetch code packages, call third-party APIs, or download updates.
In the next lesson, we will provision NAT Gateways to give the private subnets secure outbound access, and restrict traffic between the layers with stateful Security Groups.