Every team containerizing their workloads on AWS faces the same choice: ECS or EKS? ECS is simpler and tightly integrated with AWS. EKS is Kubernetes — portable, powerful, and complex. Getting this choice wrong means years of operational overhead or a major migration. This article gives you the framework to choose right.
## What You're Actually Choosing
Before comparing deployment manifests or cost models, it is worth being precise about what you are choosing at a conceptual level. ECS and EKS are not just different config formats — they represent fundamentally different operational philosophies and lock-in tradeoffs.
**ECS (Elastic Container Service):**

- AWS-proprietary container orchestrator
- Tight AWS integration (IAM, VPC, CloudWatch native)
- Simpler mental model
- No Kubernetes knowledge required
- Smaller ecosystem (Helm charts and operators don't apply)
**EKS (Elastic Kubernetes Service):**

- Managed Kubernetes control plane
- Industry standard (runs anywhere: AWS, GCP, Azure, on-prem)
- Steeper learning curve
- Rich ecosystem (Helm, Prometheus, KEDA, Argo, etc.)
- More operational responsibility
The key question is not "which is better" — it is "which is right for your team's current size, Kubernetes expertise, and growth trajectory." A decision that optimizes for shipping speed today may cost you in portability two years from now, and vice versa.
## Architecture Deep Dive
Understanding how each platform's components map to each other will help you reason about operational decisions like scaling, deployment, and IAM. The two architectures solve the same problem differently.
### ECS Architecture
In ECS, the central abstraction is the Task Definition — it defines what containers to run, how much CPU and memory they get, and what secrets and environment variables they receive. A Service keeps a desired number of Tasks running and integrates with your load balancer. The diagram below shows how these concepts nest, and the CloudFormation snippet translates that structure directly into infrastructure-as-code.
```
ECS Cluster
│
├── ECS Service: order-service
│   ├── Task Definition (like a Dockerfile for the cluster)
│   │     - Container image: order-service:1.2.3
│   │     - CPU: 0.5 vCPU, Memory: 1 GB
│   │     - Port mappings: 8080
│   │     - Environment variables
│   │     - Secrets from SSM/Secrets Manager
│   │
│   ├── Desired count: 3 tasks
│   ├── Load balancer: ALB (auto-registered)
│   ├── Auto Scaling: scale on CPU > 70%
│   └── Service Discovery: order-service.local
│
└── Capacity: Fargate (serverless) or EC2
```
Task Definition (AWS CloudFormation):

```yaml
OrderServiceTaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: order-service
    Cpu: 512        # 0.5 vCPU
    Memory: 1024    # 1 GB
    NetworkMode: awsvpc
    RequiresCompatibilities: [FARGATE]
    ExecutionRoleArn: !GetAtt ECSExecutionRole.Arn  # pulls the image, writes logs
    TaskRoleArn: !GetAtt OrderServiceRole.Arn       # the application's AWS permissions
    ContainerDefinitions:
      - Name: order-service
        Image: 123456.dkr.ecr.us-east-1.amazonaws.com/order-service:1.2.3  # pin versions; avoid :latest
        PortMappings:
          - ContainerPort: 8080
        Environment:
          - Name: SPRING_PROFILES_ACTIVE
            Value: production
        Secrets:
          - Name: DATABASE_URL
            ValueFrom: arn:aws:ssm:us-east-1:123:parameter/order-service/db-url
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: /ecs/order-service
            awslogs-region: us-east-1
            awslogs-stream-prefix: ecs  # required when logging from Fargate
```
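The Task Definition alone runs nothing; the Service and its scaling policy from the diagram are separate resources. A minimal sketch of those pieces, assuming the cluster, subnets, security group, and target group are defined elsewhere in the template (the logical names here are illustrative):

```yaml
OrderService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref Cluster
    ServiceName: order-service
    TaskDefinition: !Ref OrderServiceTaskDefinition
    DesiredCount: 3
    LaunchType: FARGATE
    NetworkConfiguration:
      AwsvpcConfiguration:
        Subnets: [!Ref PrivateSubnetA, !Ref PrivateSubnetB]
        SecurityGroups: [!Ref ServiceSecurityGroup]
    LoadBalancers:
      - ContainerName: order-service
        ContainerPort: 8080
        TargetGroupArn: !Ref OrderServiceTargetGroup  # ALB target group, auto-registered

# Auto Scaling lives in Application Auto Scaling, not in the Service itself
OrderServiceScalableTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ResourceId: !Sub service/${Cluster}/order-service
    ScalableDimension: ecs:service:DesiredCount
    MinCapacity: 3
    MaxCapacity: 50

OrderServiceCpuScaling:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: order-service-cpu
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref OrderServiceScalableTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      TargetValue: 70  # matches "scale on CPU > 70%" in the diagram
```

Target tracking then holds average CPU near 70% by adjusting the desired count between 3 and 50 tasks.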
### EKS Architecture
EKS adds a visible and billable control plane that manages the Kubernetes API server, etcd, and scheduler. Your application workloads run on node groups (EC2 instances) or Fargate profiles. The critical difference from ECS is that Kubernetes uses a rich object model — Deployments, Services, Ingresses, HPAs — each of which is a separate resource that you compose together.
```
EKS Cluster
│
├── Control Plane (AWS managed, $0.10/hour)
│     - API Server
│     - etcd
│     - Controller Manager
│     - Scheduler
│
├── Node Groups (your EC2 instances)
│     - Managed Node Group: 3× m6a.xlarge
│     - OR Fargate Profile (serverless)
│
└── Kubernetes Resources:
      Deployment: order-service (3 replicas)
      Service: ClusterIP (internal)
      Ingress: AWS Load Balancer Controller → ALB
      HPA: scale on CPU > 70%
      ConfigMap/Secret: configuration
```
The Kubernetes Deployment manifest below is more verbose than its ECS equivalent, but that verbosity buys you precision. The resources.requests and resources.limits fields are not optional in production — without them, Kubernetes cannot make good scheduling decisions and your pods may be evicted during node pressure. The separate readinessProbe and livenessProbe are also distinct from the single ECS health check: readiness controls load balancer traffic, liveness triggers restarts.
```yaml
# Kubernetes Deployment (EKS)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      serviceAccountName: order-service-sa  # For IRSA (IAM Roles for Service Accounts)
      containers:
        - name: order-service
          image: 123456.dkr.ecr.us-east-1.amazonaws.com/order-service:1.2.3
          resources:
            requests:            # what the scheduler reserves
              cpu: "500m"
              memory: "512Mi"
            limits:              # hard ceiling before throttling/OOM-kill
              cpu: "2"
              memory: "2Gi"
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: production
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: order-service-secrets
                  key: database-url
          readinessProbe:        # gates load balancer traffic
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:         # triggers container restarts
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 30
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: External             # needs an external metrics adapter (e.g. KEDA)
      external:
        metric:
          name: sqs_queue_depth
          selector:
            matchLabels:
              queue-name: order-processing
        target:
          type: AverageValue
          averageValue: "100"
```
Notice the second HPA metric: scaling on SQS queue depth. This is one of the most powerful patterns for event-driven architectures, because it scales your consumers on the actual backlog rather than on CPU, which typically lags behind queue growth. Vanilla Kubernetes cannot read SQS on its own, though; an external metrics adapter such as KEDA has to supply that metric. ECS can approximate the same behavior with step scaling on a CloudWatch alarm, but the setup is considerably more manual.
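If you run KEDA, you would typically express this as a ScaledObject instead of hand-writing the external metric; KEDA then creates and manages the HPA for you. A minimal sketch, assuming KEDA is installed in the cluster and its operator's IAM identity can read the queue (the queue URL and thresholds are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-service-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: order-service          # the Deployment to scale
  minReplicaCount: 3
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/order-processing
        queueLength: "100"       # target messages per replica
        awsRegion: us-east-1
        identityOwner: operator  # use the KEDA operator's IAM identity
```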
## Cost Comparison
Cost comparisons are often oversimplified. The numbers below use the same workload (5 microservices, production) across all three configurations so the comparison is apples-to-apples. The EKS control plane cost is fixed regardless of workload size — that is why the crossover point matters.
**Scenario: 5 microservices, production workload**

```
ECS on Fargate:
  5 services × 3 tasks × 0.5 vCPU × $0.04048/vCPU-hour × 730h/month = $221
  5 services × 3 tasks × 1 GB    × $0.004445/GB-hour   × 730h/month = $49
  Total compute: ~$270/month
  ECS control plane: FREE
  Total: ~$270/month

EKS on Fargate:
  Same compute costs: ~$270/month
  EKS control plane: $0.10/hour × 730h = $73/month
  Total: ~$343/month (+$73 for the K8s control plane)

EKS on EC2 (Managed Node Group):
  3× m6a.xlarge Reserved (1yr): ~$250/month
  EKS control plane: $73/month
  Total: ~$323/month
  BUT: fits many more pods per node (bin-packing advantage)
  → Better for 20+ microservices

Break-even: ECS is cheaper for small workloads (<10 services);
EKS on EC2 becomes cheaper at scale (>15 services) due to bin-packing.

Hidden EKS costs:
  - NAT Gateway for private nodes: $0.045/hour = ~$33/month minimum
  - Load balancers per service: ~$16/month each
  - EBS volumes for persistent storage
  - Engineer time: Kubernetes expertise adds an estimated 20-40% DevOps overhead
```
The "hidden EKS costs" row is the most important one. Teams frequently compare raw infrastructure costs and undercount the engineering time required to operate Kubernetes correctly — managing node group upgrades, troubleshooting pod evictions, configuring network policies, and maintaining Helm chart versions is a continuous investment.
## When to Choose ECS
The decision framework below is calibrated for the most common team profiles. If your situation falls clearly into these criteria, ECS will let you ship faster, spend less on operations, and stay focused on product work rather than infrastructure management.
✓ Choose ECS when:
- Small to medium team (< 20 engineers)
- 1-15 microservices
- Team has strong AWS knowledge but not Kubernetes
- Need to ship fast (ECS has far less configuration to learn)
- All workloads stay on AWS (no multi-cloud requirements)
- Simple scaling requirements (CPU/memory based)
Example teams: Early-stage startups, AWS-native teams, teams migrating from Lambda
ECS strengths:
- 5-minute first deployment vs 1-2 hours for EKS
- Native CloudWatch integration (no Prometheus setup)
- IAM task roles are simpler than IRSA (see the sketch after this list)
- Service Connect replaces a service mesh for most cases
- No control plane to manage, and no $73/month control-plane fee on small workloads
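To see the IAM difference concretely: on ECS, the task role is a single `TaskRoleArn` property on the task definition, as shown earlier. On EKS, IRSA requires an OIDC provider for the cluster, an IAM role whose trust policy names the service account, and an annotated ServiceAccount. A minimal sketch of that last piece, with an illustrative role name:

```yaml
# IRSA: the ServiceAccount referenced by serviceAccountName in the Deployment.
# Assumes the cluster's OIDC provider exists and the role's trust policy
# allows system:serviceaccount:production:order-service-sa to assume it.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service-sa
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456:role/order-service-irsa  # hypothetical role
```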
## When to Choose EKS
EKS pays dividends when your operational requirements grow past what ECS can handle cleanly. The scenarios below are not hypothetical — each represents a real limitation of ECS that teams regularly hit at scale.
✓ Choose EKS when:
- Large team (20+ engineers) with Kubernetes experience
- 15+ microservices with complex inter-service dependencies
- Multi-cloud or hybrid cloud strategy
- Need advanced scheduling (GPU, spot, custom taints/tolerations)
- Rich ecosystem required (Argo Workflows, KEDA, Istio, Tekton)
- Compliance: need Pod Security Standards, network policies
- Existing Kubernetes expertise in the team
EKS strengths:
- kubectl + Helm: de facto industry standard
- KEDA: event-driven autoscaling (scale on SQS depth, Kafka lag, custom metrics)
- Argo CD: GitOps deployment
- Network policies: fine-grained pod-to-pod traffic control (see the sketch after this list)
- Service mesh (Istio/Linkerd): mTLS, traffic management, circuit breaking
- Persistent workloads: StatefulSets, PersistentVolumes
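To make the network-policy bullet concrete, here is a minimal sketch that restricts inbound traffic so only pods labeled `app: api-gateway` (an illustrative label) can reach order-service on port 8080. Enforcement requires a CNI that supports NetworkPolicy, such as the VPC CNI's network policy agent, Calico, or Cilium; the nearest ECS analog is security groups on each task's ENI.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-service-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: order-service     # the pods this policy protects
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway   # illustrative: only the gateway may connect
      ports:
        - protocol: TCP
          port: 8080
```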
## Fargate vs EC2 Node Groups
Whichever orchestrator you pick, you still have to choose the compute layer that runs your containers. This is a separate decision from the orchestrator choice, and it has significant cost and operational implications of its own.
```
Fargate (serverless):
  + No EC2 management (patching, scaling, rightsizing)
  + Pay per task (second-level billing)
  + Strong isolation (each task runs in its own microVM)
  - ~30% more expensive than EC2 Reserved
  - Cold start: 30-90 seconds
  - No GPU support
  - Max 16 vCPU / 120 GB RAM per task

EC2 Node Groups:
  + 40-60% cheaper with Reserved Instances
  + Full control over instance type and AMI
  + Better for predictable, stable workloads
  + Supports GPU and large-memory instances
  - You manage node patching and scaling
  - Potential for noisy neighbors (bin-packing)

Hybrid approach (common):
  - Fargate for variable/unpredictable workloads
  - EC2 Reserved for stable base load
  - EC2 Spot for batch/worker jobs (up to 70% discount)
```
The hybrid approach is the practical sweet spot for most production environments. Your web API may run on Fargate (variable traffic, no server management), your background workers on EC2 Reserved (predictable load, high density), and your report-generation jobs on EC2 Spot (fault-tolerant, up to 70% cheaper). Each workload type gets the pricing model that fits it; the capacity-provider sketch below shows how ECS expresses this mix.
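In ECS, the mix is declared through a capacity provider strategy on the service. A minimal sketch, assuming the cluster has the FARGATE and FARGATE_SPOT capacity providers enabled (EC2 capacity providers work the same way, backed by an Auto Scaling group; the worker task definition is hypothetical):

```yaml
OrderWorkerService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref Cluster
    TaskDefinition: !Ref WorkerTaskDefinition  # hypothetical worker task definition
    DesiredCount: 10
    CapacityProviderStrategy:
      - CapacityProvider: FARGATE        # stable base load, on-demand
        Base: 2                          # the first 2 tasks always run here
        Weight: 1
      - CapacityProvider: FARGATE_SPOT   # cheap, interruptible capacity
        Weight: 3                        # 3 of every 4 remaining tasks land here
```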
## Migration Path: ECS → EKS
If you start with ECS and need to migrate, the approach below minimizes risk by keeping both systems live during the transition. The key is incremental traffic shifting — you never flip a switch for 100% of traffic without a validation step at each increment.
1. Containerize first (same either way).
2. Use Copilot (ECS) or CDK/Terraform abstractions, which make it easier to swap the underlying orchestrator.
3. Migrate via parallel deployment:
   - Run ECS and EKS clusters simultaneously
   - Migrate one service at a time
   - Use Route 53 weighted routing to shift traffic gradually (see the sketch after this list):
     - ECS: 90% → 50% → 10% → 0%
     - EKS: 10% → 50% → 90% → 100%
4. Timeline: 1-2 months per service for a careful migration.
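A sketch of the weighted-routing step in CloudFormation: two weighted records for the same name, one pointing at the ECS service's ALB and one at the EKS ingress ALB. The hosted zone and DNS names are illustrative; shifting traffic is then just a matter of updating the two Weight values at each validation step.

```yaml
EcsWeightedRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: example.com.
    Name: orders.example.com.
    Type: CNAME
    TTL: "60"
    SetIdentifier: ecs
    Weight: 90                   # start with 90% of traffic on ECS
    ResourceRecords:
      - ecs-alb-123.us-east-1.elb.amazonaws.com

EksWeightedRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: example.com.
    Name: orders.example.com.
    Type: CNAME
    TTL: "60"
    SetIdentifier: eks
    Weight: 10                   # and 10% on EKS; step up as each stage validates
    ResourceRecords:
      - eks-alb-456.us-east-1.elb.amazonaws.com
```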
The practical answer for most teams: start with ECS. It's simpler, cheaper to operate, and solves 90% of container orchestration problems. Migrate to EKS when you hit the ceiling — when you need KEDA for event-driven scaling, when you need Argo for GitOps, when your multi-cloud strategy requires portability. Don't let the Kubernetes ecosystem be the reason you choose EKS — let your specific operational requirements make the decision.