Amazon EC2: A Comprehensive Operations and Architecture Reference
EC2 instance management, CLI operations, security hardening, cost governance, and architectural best practices
Every application requires compute capacity, and the decision of how to provision it defines the operational trajectory of your infrastructure. Amazon EC2 — Elastic Compute Cloud — is the foundational compute service within AWS, enabling you to launch, configure, and manage virtual servers with granular control over the operating system, networking, and storage layers. When your workloads demand persistent resources, deterministic performance characteristics, or full administrative access to the underlying operating system, EC2 is the appropriate choice. For ephemeral, event-driven workloads, AWS Lambda offers a serverless alternative. For containerized applications without infrastructure management overhead, ECS with Fargate is the more suitable path. For simplified deployments with preconfigured environments, Lightsail provides a streamlined option.
Key Concepts
| Concept | Description |
|---|---|
| Instance Types | Hardware families defining CPU, RAM, storage, and networking capacity. The naming convention follows the pattern [family][generation].[size] — for example, t3.medium or m5.xlarge. |
| Key Pairs | SSH key pairs used for secure authentication. AWS retains the public key while you maintain custody of the private key. |
| Security Groups | Instance-level virtual firewalls governing inbound and outbound traffic through rules defined by protocol, port, and CIDR range. |
| User Data Scripts | Shell scripts or cloud-init directives that run automatically at an instance's first boot (by default they do not run again on later boots), typically used for software installation and initial configuration. |
| EBS | Elastic Block Store — persistent storage volumes that attach to EC2 instances as virtual block devices. These volumes exist independently of the instance lifecycle. |
| Elastic IP | A static public IPv4 address that persists across instance stop/start cycles. Since February 2024, AWS charges hourly for every public IPv4 address, including Elastic IPs, whether or not the address is associated with a running instance. |
| CPU Credits | The burst performance model for T-family instances. Credits accumulate during periods of low utilization and are consumed when CPU demand exceeds the baseline threshold. |
| AMI | Amazon Machine Image — a template containing the operating system, installed applications, and configuration state used to launch new instances. |
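To make the naming convention in the Instance Types row concrete, the sketch below splits a type name into its parts. The function name `parse_instance_type` is hypothetical, and the pattern is a simplification: it assumes lowercase names made of family letters, generation digits, an optional attribute suffix (as in c6gn), a dot, and a size, so some special-purpose names will not match.

```shell
# Split an instance type name such as "t3.medium" into its parts.
# Prints "family=... generation=... attributes=... size=...",
# or returns 1 when the name does not fit the simplified pattern.
parse_instance_type() {
  name=$1
  prefix=${name%%.*}   # text before the first dot, e.g. "t3"
  size=${name#*.}      # text after the first dot, e.g. "medium"

  # Require a dot with text on both sides.
  if [ "$prefix" = "$name" ] || [ -z "$prefix" ] || [ -z "$size" ]; then
    return 1
  fi

  # Leading letters are the family, the digits that follow are the
  # generation, and anything left over is an attribute suffix.
  family=$(printf '%s\n' "$prefix" |
    sed -n 's/^\([a-z][a-z]*\)[0-9][0-9]*[a-z0-9-]*$/\1/p')
  generation=$(printf '%s\n' "$prefix" |
    sed -n 's/^[a-z][a-z]*\([0-9][0-9]*\)[a-z0-9-]*$/\1/p')
  attributes=$(printf '%s\n' "$prefix" |
    sed -n 's/^[a-z][a-z]*[0-9][0-9]*\([a-z0-9-]*\)$/\1/p')

  if [ -z "$family" ] || [ -z "$generation" ]; then
    return 1
  fi
  printf 'family=%s generation=%s attributes=%s size=%s\n' \
    "$family" "$generation" "$attributes" "$size"
}

# Usage:
#   parse_instance_type t3.medium
#   parse_instance_type m5.xlarge
```

The attribute suffix is left as an opaque string because its letters carry different meanings across families.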
Essential CLI Commands
aws ec2 create-key-pair \
  --key-name my-keypair \
  --query 'KeyMaterial' \
  --output text > my-keypair.pem

chmod 400 my-keypair.pem

aws ec2 create-security-group \
  --group-name web-server-sg \
  --description "Allow HTTP and SSH" \
  --vpc-id vpc-xxxxxx

aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxx \
  --protocol tcp \
  --port 22 \
  --cidr 157.100.121.171/32

aws ec2 run-instances \
  --image-id ami-0da00c97ce64145f1 \
  --instance-type t3.micro \
  --key-name my-keypair \
  --security-group-ids sg-xxxxx \
  --user-data file://userdata.sh \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=WebServer-01}]'

aws ec2 allocate-address

aws sts get-caller-identity

aws ec2 describe-vpcs \
  --query 'Vpcs[?IsDefault==`true`].VpcId' \
  --output text

aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=al2023-ami-2023*" \
            "Name=architecture,Values=x86_64" \
  --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
  --output text

aws ec2 describe-instances \
  --instance-ids i-xxxxx \
  --query 'Reservations[0].Instances[0].[State.Name,PublicIpAddress]' \
  --output table

aws ec2 describe-instance-status \
  --instance-ids i-xxxxx

aws ec2 describe-security-groups \
  --group-ids sg-xxxxx

aws ec2 describe-volumes \
  --volume-ids vol-xxxxx

aws ec2 get-console-output \
  --instance-id i-xxxxx \
  --output text > console-output.txt

aws ec2 describe-addresses

aws ec2 modify-instance-attribute \
  --instance-id i-xxxxx \
  --instance-type t3.small

aws ec2 modify-instance-attribute \
  --instance-id i-xxxxx \
  --block-device-mappings "DeviceName=/dev/xvda,Ebs={DeleteOnTermination=false}"

aws ec2 terminate-instances \
  --instance-ids i-xxxxx

aws ec2 delete-security-group \
  --group-id sg-xxxxx

aws ec2 release-address \
  --allocation-id eipalloc-xxxxx

aws ec2 delete-key-pair \
  --key-name my-keypair

Architecture and Flows
EC2 Instance Components
Instance Launch Flow
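One possible launch flow, sketched with the commands listed under Essential CLI Commands: start the instance, wait for it to reach the running state, then read back its public IP. The function name `launch_and_wait` and its four positional arguments are hypothetical; this is an illustration under those assumptions, not a reconstruction of the original diagram.

```shell
# Launch one instance, wait until it is running, and print its public IP.
# Arguments (all supplied by the caller): AMI ID, instance type,
# key pair name, security group ID.
launch_and_wait() {
  ami_id=$1
  instance_type=$2
  key_name=$3
  sg_id=$4

  # Step 1: request the instance and capture its ID.
  instance_id=$(aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --key-name "$key_name" \
    --security-group-ids "$sg_id" \
    --query 'Instances[0].InstanceId' \
    --output text) || return 1

  # Step 2: block until EC2 reports the instance state as "running".
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1

  # Step 3: read back the public IP address, if one was assigned.
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text
}

# Usage (placeholder IDs):
#   launch_and_wait ami-xxxxx t3.micro my-keypair sg-xxxxx
```

A running state only means the instance has booted. User data may still be executing, so application-level readiness needs its own check.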
EBS Lifecycle
Best Practices
Security
Never expose port 22 to 0.0.0.0/0. Restrict SSH access to specific IP addresses or, preferably, eliminate the need for open inbound ports entirely by using AWS Systems Manager Session Manager.
- Apply the principle of least privilege when defining security group rules
- Rotate key pairs on a regular cadence and store private keys in a secure vault
- Enable encryption on all EBS volumes by setting `Encrypted: true`
- Assign IAM roles to instances rather than embedding credentials directly
- Ensure the operating system remains current by including `yum update -y` in user data scripts
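A rough way to audit for the open SSH rule warned about above is to ask EC2 for security groups that have a port 22 rule and a 0.0.0.0/0 source. The function names below are hypothetical. The filter is a first-pass check only: it assumes rules that name port 22 exactly, so it will miss wider port ranges that happen to include 22 as well as IPv6 rules for ::/0.

```shell
# Return success when a CIDR allows traffic from any address.
is_world_open_cidr() {
  case "$1" in
    0.0.0.0/0 | ::/0) return 0 ;;
    *) return 1 ;;
  esac
}

# List IDs of security groups that have an ingress rule on port 22
# and an ingress rule open to 0.0.0.0/0.
list_world_open_ssh_groups() {
  aws ec2 describe-security-groups \
    --filters Name=ip-permission.from-port,Values=22 \
              Name=ip-permission.to-port,Values=22 \
              Name=ip-permission.cidr,Values=0.0.0.0/0 \
    --query 'SecurityGroups[].GroupId' \
    --output text
}

# Usage:
#   list_world_open_ssh_groups
#   is_world_open_cidr 157.100.121.171/32 || echo "restricted source"
```

Any group the query returns is worth reviewing by hand, since the filters match at the group level rather than guaranteeing that a single rule combines both conditions.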
Cost Optimization
- Set `DeleteOnTermination: true` for volumes attached to ephemeral or disposable instances
- Release any Elastic IP addresses that are not actively associated with a running instance
- Monitor the `CPUCreditBalance` metric on T-family instances; if credits are consistently depleted, migrate to an M-family instance type
- Begin with the smallest viable instance size and scale vertically based on observed CloudWatch metrics
- Evaluate Reserved Instances or Savings Plans for workloads with predictable, sustained utilization patterns
- Terminate development and staging instances outside of business hours
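To reason about when a T-family instance will drain its credits, it helps to work the arithmetic: one CPU credit is one vCPU running at 100% for one minute. The sketch below uses hypothetical function names and integer math. The example figures (2 vCPUs with a 10% baseline, which is what AWS publishes for t3.micro as far as I know) are assumptions to verify against the current instance type tables.

```shell
# Credits earned (or spent) per hour by N vCPUs held at a given
# percentage of a full core: vcpus * pct/100 * 60 minutes.
credits_per_hour() {
  vcpus=$1
  pct=$2
  echo $(( vcpus * pct * 60 / 100 ))
}

# Net change in the credit balance per hour.
# A negative result means the balance is draining.
net_credits_per_hour() {
  n_vcpus=$1
  baseline_pct=$2
  utilization_pct=$3
  earned=$(credits_per_hour "$n_vcpus" "$baseline_pct")
  spent=$(credits_per_hour "$n_vcpus" "$utilization_pct")
  echo $(( earned - spent ))
}

# Usage (assumed figures: 2 vCPUs, 10% baseline, 25% average load):
#   net_credits_per_hour 2 10 25
```

A sustained negative result is the situation the `CPUCreditBalance` bullet describes: the workload runs above baseline on average, so a fixed-performance family may be the better fit.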
Performance
- Select instance types based on the actual bottleneck — whether CPU, memory, or I/O throughput
- Prefer gp3 volumes over gp2 for superior cost-to-performance ratios
- For I/O-intensive workloads, evaluate Instance Store volumes or io2 EBS volumes
- Launch instances in the AWS Region nearest to your end users
- Enable detailed monitoring when you require metrics at one-minute granularity
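The gp3 recommendation is easier to see with numbers. A gp2 volume's baseline IOPS scale with its size (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000), whereas gp3 starts at a 3,000 IOPS baseline regardless of size. The function name below is hypothetical, and the sketch ignores gp2 burst credits.

```shell
# Baseline IOPS of a gp2 volume for a given size in GiB:
# 3 IOPS per GiB, clamped to the range 100..16000.
gp2_baseline_iops() {
  size_gib=$1
  iops=$(( size_gib * 3 ))
  if [ "$iops" -lt 100 ]; then
    iops=100
  fi
  if [ "$iops" -gt 16000 ]; then
    iops=16000
  fi
  echo "$iops"
}

# Usage:
#   gp2_baseline_iops 500
```

Under these figures, any gp2 volume smaller than 1,000 GiB has a lower baseline than a gp3 volume of the same size.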
Reliability
Never deploy a single EC2 instance as the sole compute layer in production. Design for redundancy across multiple Availability Zones from the outset.
- Implement health checks at both the infrastructure and application layers
- Automate deployments through user data scripts or dedicated configuration management tooling
- Create AMIs from fully configured instances to enable rapid recovery
- Configure automated EBS snapshots for all volumes containing critical data
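As a starting point for the snapshot bullet, a small wrapper around `aws ec2 create-snapshot` can be run from a scheduler. The function name `snapshot_volume` and the description and tag formats are assumptions for illustration. For production use, a managed scheduler such as Amazon Data Lifecycle Manager is likely a better fit than a hand-rolled script.

```shell
# Create a dated, tagged snapshot of one EBS volume and print the snapshot ID.
snapshot_volume() {
  volume_id=$1
  today=$(date -u +%Y-%m-%d)

  aws ec2 create-snapshot \
    --volume-id "$volume_id" \
    --description "Backup of $volume_id on $today" \
    --tag-specifications "ResourceType=snapshot,Tags=[{Key=Name,Value=backup-$volume_id}]" \
    --query 'SnapshotId' \
    --output text
}

# Usage (placeholder volume ID):
#   snapshot_volume vol-xxxxx
```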
Operational Excellence
Adopt a consistent tagging strategy across all resources. At minimum, apply Name, Environment, Project, and Owner tags to every instance and its associated resources.
- Document within user data scripts exactly what is installed and the rationale behind each configuration decision
- Centralize logging through CloudWatch Logs
- Implement Infrastructure as Code using CloudFormation or Terraform
- Maintain an up-to-date inventory of all instances and their designated purposes
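The minimum tag set named above (Name, Environment, Project, Owner) can be checked mechanically. The sketch below is one way to do it, with a hypothetical function name: it takes the tag keys present on a resource and prints whichever required keys are missing.

```shell
# Required tag keys, taken from the tagging guidance in this section.
REQUIRED_TAGS="Name Environment Project Owner"

# Print each required tag key that is absent from the keys given as arguments.
missing_tags() {
  for required in $REQUIRED_TAGS; do
    found=no
    for key in "$@"; do
      if [ "$key" = "$required" ]; then
        found=yes
      fi
    done
    if [ "$found" = no ]; then
      echo "$required"
    fi
  done
}

# Usage: feed it the tag keys of an instance, for example from
#   aws ec2 describe-tags \
#     --filters Name=resource-id,Values=i-xxxxx \
#     --query 'Tags[].Key' --output text
#   missing_tags Name Owner
```

Empty output means the resource carries every required key. Tag keys are case-sensitive, so `owner` does not satisfy `Owner`.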
Common Mistakes
Cost Considerations
Cost Components
| Component | Approximate Cost | Billing Unit |
|---|---|---|
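| Running instance | Varies by instance type | Per second, with a 60-second minimum, for Linux instances; rates are quoted per hour |
| EBS volumes | ~$0.08 (gp3) to ~$0.10 (gp2) per GB | Per GB-month |
| Public IPv4 address, including Elastic IPs (associated or not) | $0.005 per hour | Per hour |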
| Data transfer out | $0.09 per GB beyond 100 GB | Per GB |
| EBS snapshots | ~$0.05 per GB | Per GB-month |
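The components in the table combine into a simple monthly estimate: instance hours times the hourly rate, plus EBS gigabytes times the per-GB-month rate. The function below is a hypothetical sketch that covers only those two components. The rates are inputs you supply, and the ones in the usage comment are illustrative assumptions, not quoted prices.

```shell
# Estimate a monthly bill in USD from compute and EBS storage only.
# Arguments: hourly instance rate, hours run, EBS size in GB,
#            EBS rate per GB-month.
estimate_monthly_cost() {
  hourly_rate=$1
  hours=$2
  ebs_gb=$3
  ebs_rate=$4
  # awk handles the floating-point arithmetic that sh cannot.
  awk -v r="$hourly_rate" -v h="$hours" -v g="$ebs_gb" -v e="$ebs_rate" \
    'BEGIN { printf "%.2f\n", r * h + g * e }'
}

# Usage (assumed rates: 0.0104 USD/hour for 730 hours, 30 GB at 0.08 USD):
#   estimate_monthly_cost 0.0104 730 30 0.08
```

Data transfer, snapshots, and public IPv4 charges are left out, so treat the result as a lower bound.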
Free Tier Allowances — First 12 Months
| Resource | Monthly Allocation |
|---|---|
| Compute | 750 hours of t2.micro on Linux, or t3.micro in Regions where t2.micro is unavailable |
| Storage | 30 GB of EBS General Purpose — gp2 or gp3 |
| Snapshots | 1 GB |
Optimization Strategies
| Strategy | Expected Savings | Best Suited For |
|---|---|---|
| Right-sizing | Variable — depends on over-provisioning | All workloads; start with t3.micro and scale based on CloudWatch metrics |
| Stop vs. Terminate | Stopping eliminates instance charges while retaining EBS volumes, which continue to accrue storage charges | Development and test environments not in active use |
| Spot Instances | Up to 90% discount | Fault-tolerant, interruptible workloads |
| Reserved Instances / Savings Plans | Up to 72% discount | Predictable, sustained production workloads |
aws ce get-cost-and-usage \
  --time-period Start=2025-10-01,End=2025-10-10 \
  --granularity DAILY \
  --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=INSTANCE_TYPE

Integration with Other Services
| Service | Integration Mechanism | Typical Use Case |
|---|---|---|
| VPC | EC2 instances reside within a VPC | Network isolation, subnet placement, and security group assignment |
| EBS | Block storage volumes attached to instances | Root volumes and persistent data storage |
| IAM | Instance profiles and IAM roles | Granting permissions to access S3, DynamoDB, and other AWS services |
| CloudWatch | Metrics collection and log aggregation | Monitoring CPU, network, and disk I/O by default; memory, disk-space, and custom application metrics through the CloudWatch agent |
| Auto Scaling | Dynamic instance fleet management | Horizontal scaling based on demand thresholds |
| ELB | Traffic distribution across instances | Load balancing for high availability and fault tolerance |
| S3 | Object storage integration | User data scripts, backups, and static asset hosting |
| RDS | Managed relational database backend | Application servers on EC2 connecting to RDS database instances |
| Route 53 | DNS resolution and traffic routing | Mapping domain names to instance IPs or load balancer endpoints |
| Systems Manager | Remote management through the SSM Agent, which is preinstalled on Amazon Linux and many other AWS-provided AMIs | Session Manager as an SSH alternative, automated patching |