Amazon Auto Scaling: Architecting Elastic Infrastructure on AWS
Auto Scaling with Launch Templates, ASGs, Elastic Load Balancers, scaling policies, and best practices
Modern production workloads rarely sustain a uniform traffic profile. Demand fluctuates throughout the day, surges unpredictably during promotional events, and contracts to near zero during off-peak hours. Amazon EC2 Auto Scaling addresses this by adjusting compute capacity (the number of running EC2 instances) in response to demand signals such as CPU utilization or request count. Rather than provisioning for peak capacity and absorbing the waste, or under-provisioning and risking a degraded user experience, Auto Scaling keeps capacity close to what the application needs, within the minimum and maximum bounds you configure.
When to deploy Auto Scaling: Applications exhibiting variable traffic patterns, systems requiring high availability guarantees, and architectures demanding automatic horizontal scaling all benefit from this service.
Alternatives worth evaluating: Manual vertical scaling via instance type changes, container-based auto-scaling through ECS or EKS, and serverless compute via AWS Lambda each address overlapping but distinct use cases.
Key Concepts
| Concept | Description |
|---|---|
| AMI | A complete snapshot of an EC2 instance — operating system, installed software, and application code — serving as an immutable template for launching identical instances |
| Launch Template | A reusable, versioned configuration that defines how to launch instances, including instance type, AMI, security groups, user data, and IAM profiles |
| Auto Scaling Group | A logical collection of EC2 instances managed as a single unit, automatically maintaining the desired instance count based on defined policies |
| Desired Capacity | The number of instances the ASG tries to keep running; scaling policies adjust it within the min/max bounds |
| Min/Max Size | The lower and upper bounds constraining the number of instances the ASG can provision |
| Scaling Policy | A rule defining when and by how much the ASG should scale — adding or removing instances |
| Target Tracking Policy | A policy that maintains a specific metric at a designated target value, scaling automatically as conditions shift |
| Step Scaling Policy | A policy with multiple thresholds that responds proportionally based on the severity of metric deviation |
| Health Check | A periodic verification determining whether an instance is functioning correctly |
| Grace Period | The interval the ASG waits before evaluating health checks on newly launched instances, allowing sufficient time for full initialization |
| Target Group | A logical collection of instances receiving traffic from a Load Balancer, with its own health check configuration |
| Elastic Load Balancer | A managed service distributing incoming traffic across multiple EC2 instances spanning different Availability Zones |
| Connection Draining | The interval during which a terminating instance stops receiving new requests while completing all in-flight connections |
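The Step Scaling Policy row can be made concrete with a short sketch. It wraps `aws autoscaling put-scaling-policy` in a function; the policy name and the step sizes are illustrative assumptions, not values from this article. Step bounds are offsets from the threshold of the CloudWatch alarm that triggers the policy, and that alarm has to be created separately.

```shell
# Sketch: a step scaling policy that adds more instances the further the
# metric sits above the alarm threshold. With a 70% CPU alarm threshold,
# 70-85% adds one instance and 85% or more adds three.
# The policy name and step sizes are assumptions chosen for illustration.
create_step_policy() {
  asg_name="$1"
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$asg_name" \
    --policy-name step-scale-out-cpu \
    --policy-type StepScaling \
    --adjustment-type ChangeInCapacity \
    --metric-aggregation-type Average \
    --step-adjustments \
      MetricIntervalLowerBound=0,MetricIntervalUpperBound=15,ScalingAdjustment=1 \
      MetricIntervalLowerBound=15,ScalingAdjustment=3
}
```

Calling `create_step_policy myapp-asg` returns the policy ARN, which can then be set as the action of a CloudWatch alarm.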
Essential AWS CLI Commands
aws ec2 create-image \
--instance-id i-1234567890abcdef0 \
--name "myapp-v1.0.0" \
--description "Production app with dependencies" \
--no-reboot

aws ec2 create-launch-template \
--launch-template-name myapp-template \
--version-description "Initial version" \
--launch-template-data '{
"ImageId": "ami-0abc123",
"InstanceType": "t3.micro",
"SecurityGroupIds": ["sg-12345"],
"UserData": "IyEvYmluL2Jhc2g...",
"IamInstanceProfile": {"Name": "ec2-role"},
"KeyName": "my-keypair"
}'

aws ec2 create-launch-template-version \
--launch-template-id lt-0abc123 \
--version-description "Updated to t3.small" \
--launch-template-data '{"InstanceType": "t3.small"}'

aws elbv2 create-target-group \
--name myapp-targets \
--protocol HTTP \
--port 80 \
--vpc-id vpc-12345 \
--health-check-path /health \
--health-check-interval-seconds 30 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3

aws elbv2 create-load-balancer \
--name myapp-lb \
--subnets subnet-abc123 subnet-def456 \
--security-groups sg-12345 \
--scheme internet-facing \
--type application

aws elbv2 create-listener \
--load-balancer-arn arn:aws:elasticloadbalancing:... \
--protocol HTTP \
--port 80 \
--default-actions Type=forward,TargetGroupArn=arn:aws:...

aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name myapp-asg \
--launch-template LaunchTemplateId=lt-0abc123,Version='$Latest' \
--min-size 2 \
--max-size 10 \
--desired-capacity 2 \
--target-group-arns arn:aws:elasticloadbalancing:... \
--vpc-zone-identifier "subnet-abc,subnet-def" \
--health-check-type ELB \
--health-check-grace-period 300

aws autoscaling put-scaling-policy \
--auto-scaling-group-name myapp-asg \
--policy-name target-tracking-cpu \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
},
"TargetValue": 70.0
}'

aws ec2 describe-images --owners self

aws ec2 describe-launch-templates

aws ec2 describe-launch-template-versions --launch-template-id lt-0abc123

aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names myapp-asg

aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names myapp-asg \
--query 'AutoScalingGroups[0].Instances[*].[InstanceId,LifecycleState,HealthStatus,AvailabilityZone]' \
--output table

aws autoscaling describe-scaling-activities \
--auto-scaling-group-name myapp-asg \
--max-records 10

aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:...

aws elbv2 describe-load-balancers

aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name myapp-asg \
--min-size 3 \
--max-size 15 \
--desired-capacity 5

aws autoscaling update-auto-scaling-group \
--auto-scaling-group-name myapp-asg \
--launch-template LaunchTemplateId=lt-0abc123,Version='$Latest'

aws autoscaling set-desired-capacity \
--auto-scaling-group-name myapp-asg \
--desired-capacity 8

Deleting an Auto Scaling Group with the --force-delete flag will terminate all managed instances immediately. Ensure you have verified that no critical traffic is being served before proceeding.
aws autoscaling delete-auto-scaling-group \
--auto-scaling-group-name myapp-asg \
--force-delete

aws elbv2 delete-load-balancer \
--load-balancer-arn arn:aws:elasticloadbalancing:...

aws elbv2 delete-target-group \
--target-group-arn arn:aws:elasticloadbalancing:...

aws ec2 delete-launch-template \
--launch-template-id lt-0abc123

aws ec2 deregister-image --image-id ami-0abc123

Architecture
Complete Infrastructure Overview
Automatic Scaling Lifecycle
Health Check Verification Flow
Best Practices
Security
Never expose EC2 instances directly to the internet. All inbound traffic must route through the Load Balancer, with instance security groups configured to accept connections exclusively from the Load Balancer's security group.
- Instances behind the Load Balancer: Configure security groups so instances only accept traffic from the Load Balancer, never directly from the internet
- IAM Instance Profiles: Assign IAM roles to instances rather than hardcoding credentials under any circumstance
- Secrets Management: Leverage AWS Secrets Manager or Parameter Store for sensitive variables — never embed them in user data
- Layered Security Groups: Maintain separate security groups for the Load Balancer, which accepts internet traffic, and for instances, which accept only Load Balancer traffic
- Secure Key Pairs: Rotate key pairs on a regular cadence and prefer AWS Systems Manager Session Manager over direct SSH access
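The layered security group rule described above can be sketched with `aws ec2 authorize-security-group-ingress`, using the Load Balancer's security group as the traffic source instead of a CIDR range. Both group IDs are placeholders, and port 80 is assumed to match the target group configured earlier.

```shell
# Sketch: allow HTTP to the instances only from the load balancer's
# security group. Both group IDs are placeholders supplied by the caller.
allow_lb_to_instances() {
  instance_sg="$1"   # security group attached to the ASG instances
  lb_sg="$2"         # security group attached to the load balancer
  aws ec2 authorize-security-group-ingress \
    --group-id "$instance_sg" \
    --protocol tcp \
    --port 80 \
    --source-group "$lb_sg"
}
```

Because the rule references a security group rather than an IP range, it keeps working when the Load Balancer's nodes change addresses.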
Cost Optimization
- Target Tracking over Step Scaling: Target Tracking policies adjust capacity continuously toward a metric target, which tends to avoid the over-provisioning caused by coarse step thresholds
- Appropriate minimum size: Avoid setting min-size higher than operational necessity demands
- Scheduled Scaling: For workloads with predictable traffic patterns, define schedules to scale capacity up and down at known intervals
- Appropriate instance types: Select free-tier-eligible instance types when applicable — t3.micro in sa-east-1, for example
- Spot Instances: Consider blending On-Demand with Spot Instances for workloads tolerant of interruption
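One way to blend On-Demand and Spot capacity is a mixed instances policy on the ASG. The sketch below only builds the JSON document. The instance type overrides, the On-Demand base of two, and the 50% split are assumptions chosen for illustration and should be tuned to the workload.

```shell
# Sketch: print a mixed instances policy that keeps a fixed On-Demand base
# and splits any capacity above it between On-Demand and Spot.
# Instance types and percentages are illustrative assumptions.
mixed_instances_policy() {
  template_id="$1"
  cat <<EOF
{
  "LaunchTemplate": {
    "LaunchTemplateSpecification": {
      "LaunchTemplateId": "$template_id",
      "Version": "\$Latest"
    },
    "Overrides": [
      {"InstanceType": "t3.micro"},
      {"InstanceType": "t3a.micro"}
    ]
  },
  "InstancesDistribution": {
    "OnDemandBaseCapacity": 2,
    "OnDemandPercentageAboveBaseCapacity": 50,
    "SpotAllocationStrategy": "price-capacity-optimized"
  }
}
EOF
}
```

The output can be passed to an existing group, for example `aws autoscaling update-auto-scaling-group --auto-scaling-group-name myapp-asg --mixed-instances-policy "$(mixed_instances_policy lt-0abc123)"`.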
Performance
- Multi-AZ deployment: Distribute instances across a minimum of two Availability Zones to ensure resilience
- Health check interval: A 30-second interval strikes an effective balance between responsiveness and resource consumption
- Sufficient grace period: Allow at least 5 minutes for applications with non-trivial startup sequences
- Connection draining: Ensure in-progress requests complete gracefully before the ASG terminates an instance
- Optimized AMIs: Pre-install all dependencies and application code in the AMI to minimize instance startup duration
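On Application Load Balancers, connection draining is configured as the target group's deregistration delay. A minimal sketch, assuming the target group ARN is already known and that the timeout is chosen to cover the longest expected request:

```shell
# Sketch: set how long a deregistering target may finish in-flight requests.
# The ARN and the timeout value are supplied by the caller.
set_draining_timeout() {
  target_group_arn="$1"
  seconds="$2"
  aws elbv2 modify-target-group-attributes \
    --target-group-arn "$target_group_arn" \
    --attributes "Key=deregistration_delay.timeout_seconds,Value=$seconds"
}
```

A shorter timeout speeds up scale-in and deployments, while a longer one protects slow requests such as uploads.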
Reliability
The /health endpoint should verify all critical downstream dependencies — database connectivity, cache availability, and any essential external services. A superficial health check that merely returns 200 OK without validation provides a dangerously false sense of confidence.
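A dependency-aware health check can be sketched as a small shell function that a `/health` handler might call. Each argument is a probe command for one dependency. The function name and the probes are hypothetical; a real service would usually implement this inside the application itself.

```shell
# Sketch: deep health check. Each argument is a command that probes one
# dependency; the function prints the HTTP status a /health handler could
# return. Probe commands are assumptions left to the caller.
health_status() {
  for probe in "$@"; do
    # Any probe that exits non-zero marks the instance unhealthy.
    if ! sh -c "$probe" >/dev/null 2>&1; then
      echo 503
      return 1
    fi
  done
  echo 200
}
```

For example, `health_status "pg_isready -h db.internal" "redis-cli -h cache.internal ping"` would report 503 as soon as either dependency stops responding (both hostnames are placeholders). Probes should carry short timeouts so the check itself cannot hang past the Load Balancer's health check timeout.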
- Desired capacity above one: Always maintain at least two instances to guarantee fault tolerance
- ELB health check type: For web-facing applications, ELB health checks verify that the application actually responds to requests, whereas EC2 status checks only confirm that the instance itself is running
- Unhealthy threshold: Configure a threshold of at least three consecutive failures to tolerate transient issues without premature instance termination
- Metrics monitoring: Establish CloudWatch alarms for anomalous scaling activity and unhealthy instance counts
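The metrics monitoring point can be illustrated with a CloudWatch alarm on the target group's `UnHealthyHostCount` metric. The alarm name, the three one-minute evaluation periods, and the SNS topic are assumptions. The two dimension values are the trailing portions of the target group and load balancer ARNs.

```shell
# Sketch: alarm when any target stays unhealthy for three minutes.
# Alarm name, periods, and the notification topic are illustrative.
create_unhealthy_alarm() {
  tg_dim="$1"      # e.g. targetgroup/myapp-targets/<id>
  lb_dim="$2"      # e.g. app/myapp-lb/<id>
  topic_arn="$3"   # SNS topic that receives the alarm
  aws cloudwatch put-metric-alarm \
    --alarm-name myapp-unhealthy-hosts \
    --namespace AWS/ApplicationELB \
    --metric-name UnHealthyHostCount \
    --dimensions "Name=TargetGroup,Value=$tg_dim" "Name=LoadBalancer,Value=$lb_dim" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 0 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$topic_arn"
}
```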
Operational Excellence
- Versioned Launch Templates: Create new versions rather than modifying existing templates, preserving a complete audit trail
- Consistent tagging: Apply tags to identify ASG instances by environment, project, and ownership
- Centralized logging: Route application logs to CloudWatch Logs for streamlined troubleshooting across the fleet
- Documented user data: Comment user data scripts thoroughly to support future maintenance
- AMI validation: Test AMIs in isolation before deploying them to production Auto Scaling Groups
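Consistent tagging can be applied at the ASG level so that every launched instance inherits the tags. A minimal sketch, assuming `Environment` and `Project` as the tag keys (the keys are a convention chosen here, not something the service mandates):

```shell
# Sketch: tag the ASG and propagate the tags to instances at launch.
# Tag keys are an assumed naming convention.
tag_asg() {
  asg_name="$1"
  environment="$2"
  project="$3"
  aws autoscaling create-or-update-tags --tags \
    "ResourceId=$asg_name,ResourceType=auto-scaling-group,Key=Environment,Value=$environment,PropagateAtLaunch=true" \
    "ResourceId=$asg_name,ResourceType=auto-scaling-group,Key=Project,Value=$project,PropagateAtLaunch=true"
}
```

Tags set this way reach newly launched instances only; instances that are already running keep their existing tags until they are replaced.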
Common Mistakes
Cost Considerations
Resource Pricing Overview
| Resource | Approximate Cost | Notes |
|---|---|---|
| Running EC2 instances | ~$0.0104/hour for t3.micro in sa-east-1 | Multiplied by the number of active instances |
| Application Load Balancer | ~$0.008/LCU-hour | LCU is determined by the highest of: new connections, active connections, processed bytes, rule evaluations |
| Data Transfer OUT | ~$0.15/GB | First 100 GB per month free, aggregated across regions |
| Data Transfer between AZs | ~$0.01/GB | Within the same region only |
| Stored AMIs | ~$0.05/GB-month | Associated EBS snapshots |
| Auto Scaling service | FREE | You pay only for the resources it provisions |
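To see how the per-instance rate in the table scales with fleet size, a rough estimate can be computed directly in the shell. The sketch assumes a 720-hour (30-day) month and defaults to the table's approximate t3.micro rate, which should be checked against current pricing before relying on the result.

```shell
# Sketch: rough monthly EC2 cost for N instances.
# Assumes a 720-hour month; the default hourly rate is the table's
# approximate t3.micro figure and can be overridden as a second argument.
estimate_monthly_ec2_cost() {
  instances="$1"
  hourly_rate="${2:-0.0104}"
  awk -v n="$instances" -v r="$hourly_rate" \
    'BEGIN { printf "%.2f\n", n * r * 720 }'
}
```

Running `estimate_monthly_ec2_cost 2` and `estimate_monthly_ec2_cost 10` shows the gap between the configured minimum and maximum sizes. Load balancer and data transfer charges come on top.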
Cost Optimization Strategies
Scale with precision using Target Tracking:
--target-tracking-configuration TargetValue=70.0

Implement Scheduled Scaling for predictable workloads:
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name myapp-asg \
--scheduled-action-name scale-down-night \
--recurrence "0 22 * * *" \
--min-size 1 --desired-capacity 1

Eliminate unused resources aggressively: Deregister old AMIs and delete their associated snapshots. Tear down Load Balancers created for testing. Terminate development ASGs that are no longer actively used.
Leverage Reserved Instances or Savings Plans: If you maintain a predictable base load — for instance, a minimum of two instances running continuously — Reserved Instances can yield savings of up to 72%.
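The AMI cleanup advice above has one catch: deregistering an AMI does not delete its EBS snapshots, and those snapshots keep accruing storage charges. A hedged sketch of the two-step cleanup follows; the function name is hypothetical, and the AMI is assumed to be unused by any launch template version.

```shell
# Sketch: deregister an AMI and delete the EBS snapshots that backed it.
# Assumes no launch template still references the AMI.
cleanup_ami() {
  ami_id="$1"
  # Look up the backing snapshots before the AMI record disappears.
  snapshot_ids=$(aws ec2 describe-images \
    --image-ids "$ami_id" \
    --query 'Images[0].BlockDeviceMappings[*].Ebs.SnapshotId' \
    --output text)
  aws ec2 deregister-image --image-id "$ami_id"
  for snapshot_id in $snapshot_ids; do
    # Non-EBS mappings are rendered as "None" in text output; skip them.
    if [ "$snapshot_id" != "None" ]; then
      aws ec2 delete-snapshot --snapshot-id "$snapshot_id"
    fi
  done
}
```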
Free Tier Limits
| Tier | Allowance | Key Constraint |
|---|---|---|
| EC2 Free Tier | 750 hours/month of t2.micro in us-east-1 or t3.micro in other regions | Two t3.micro instances running 24/7 consume 1,440 hours/month, exceeding the free tier. One instance running 24/7 remains within the allowance. |
| ELB Free Tier | Classic Load Balancer: 750 hours/month + 15 GB processed | Application Load Balancers are not included in the free tier and always generate cost |
For learning and experimentation, limit yourself to one or two t3.micro instances within the free tier allowance. Delete all resources immediately after practice sessions. Consider testing without a Load Balancer — using direct instance IPs — to avoid ALB charges entirely.
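The free tier arithmetic from the table can be checked with a few lines of shell. The sketch assumes a 30-day month and the 750-hour allowance quoted above; the function name is illustrative.

```shell
# Sketch: compare always-on instance hours against the 750-hour allowance.
# Assumes a 30-day month (24 * 30 = 720 hours per instance).
free_tier_check() {
  instances="$1"
  hours=$((instances * 24 * 30))
  if [ "$hours" -le 750 ]; then
    echo "$hours within"
  else
    echo "$hours exceeds"
  fi
}
```

One instance yields 720 hours and stays inside the allowance; two yield 1,440 hours and exceed it, matching the constraint in the table. This is why an ASG with `--min-size 2` cannot run around the clock on the free tier alone.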
Integration with Other Services
| Service | Integration Mechanism | Typical Use Case |
|---|---|---|
| Route 53 | DNS records pointing to the Load Balancer | Map www.myapp.com to the ALB DNS name with automatic failover |
| CloudWatch | Collects instance and ASG metrics | Monitor CPU and memory utilization, create alarms, centralize logs |
| IAM | Instance Profiles assigned via Launch Template | Grant instances permissions to access S3, DynamoDB, Secrets Manager |
| VPC | ASG launches instances in designated subnets | Multi-AZ deployment for high availability |
| RDS | Instances connect to a shared database | Stateless applications reading from and writing to a centralized relational database |
| ElastiCache | Instances leverage a shared cache layer | Session storage, frequently queried data caching |
| S3 | Instances upload and download objects | User uploads, application logs, static asset storage |
| Secrets Manager | User data retrieves secrets at startup | Database passwords and API keys without hardcoding |
| CloudFormation | Defines entire infrastructure as code | Deploy a complete stack — VPC, ASG, Load Balancer, RDS — with a single template |
| CodeDeploy | Automated deployments to ASG instances | CI/CD pipelines: push code, CodeDeploy updates instances without downtime |
| SNS | ASG publishes event notifications | Alerts when instances are created, terminated, or when scaling events occur |
| Lambda | Triggers on ASG lifecycle events | Execute custom logic when an instance launches — register in an external service, for example |
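As an illustration of the Secrets Manager row, a user data script can fetch a secret at boot instead of embedding it. The sketch below assumes the instance profile grants `secretsmanager:GetSecretValue`; the secret name in the usage note is a placeholder.

```shell
# Sketch: read a secret's string value at instance startup.
# Requires an instance profile allowed to call GetSecretValue on the secret.
fetch_secret() {
  secret_id="$1"
  aws secretsmanager get-secret-value \
    --secret-id "$secret_id" \
    --query SecretString \
    --output text
}
```

In user data this might be used as `DB_PASSWORD=$(fetch_secret myapp/db-password)`, after which the value is handed to the application through its environment or a root-only configuration file rather than being written to logs.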