Amazon S3 Essentials
S3 essentials, storage classes, lifecycle policies, security, and best practices
Amazon S3 is AWS's object storage service that lets you store and retrieve any amount of data from anywhere. Each file (object) is identified by a unique key within a container (bucket); keys can look like paths, but the namespace is flat, not a hierarchical file system.
The problem it solves: Eliminates the need to manage physical storage infrastructure, providing virtually unlimited storage with 99.999999999% (11 nines) durability, multiple cost/performance tiers, and automated data lifecycle management.
When to use it: Static file storage (images, videos, PDFs), data lakes, backups, logs, static website hosting, CDN origin (CloudFront), long-term archival for compliance.
Alternatives: EFS (shared file system between EC2 instances), EBS (block storage attached to EC2), standalone Glacier (archival only), on-premises storage (expensive, not elastic).
Key Concepts
| Concept | Description |
|---|---|
| Bucket | Top-level container with globally unique name. Belongs to a specific region. Buckets can't be nested |
| Object | Individual file identified by key (full path). Size 0 bytes - 5TB. Includes data + metadata + version ID |
| Key | Unique identifier of the object within the bucket (e.g., articles/2025/image.jpg). Simulates folder structure but is flat |
| Storage Class | Storage tier that defines cost, performance, and access characteristics (Standard, IA, Glacier, etc.) |
| Lifecycle Policy | Automatic rules that move or delete objects based on age or versioning state |
| Versioning | Maintains multiple variants of the same object. Protects against accidental deletions/overwrites |
| Bucket Policy | Resource-based access control attached to the bucket. Defines who can do what in the bucket |
| IAM Policy | Identity-based access control attached to users/roles. Defines what an identity can do |
| Pre-signed URL | Temporary URL with specific permissions to access/upload objects without AWS credentials |
| Multipart Upload | Uploads large files in parallel parts. Improves performance and resilience. Recommended for files >100MB |
| Event Notification | Automatic triggers when events occur in S3 (ObjectCreated, ObjectDeleted). Integrates with Lambda/SQS/SNS |
| Cross-Region Replication (CRR) | Automatically replicates objects to bucket in another region. Requires versioning |
| Same-Region Replication (SRR) | Replicates within the same region. Useful for copies in separate accounts |
| Replication Time Control (RTC) | 15-minute SLA for replication. Adds a per-GB charge on replicated data |
| Object Lock | WORM (write-once-read-many) protection against modification/deletion. Governance/Compliance modes |
| MFA Delete | Requires multi-factor authentication to delete objects/versions. Prevents accidental deletions |
| S3 Transfer Acceleration | Accelerates uploads/downloads using CloudFront's global network. Additional $0.04/GB cost |
Essential AWS CLI Commands
Creating Resources
# Create bucket (specific region)
aws s3api create-bucket \
--bucket my-bucket-name \
--region sa-east-1 \
--create-bucket-configuration LocationConstraint=sa-east-1
# Upload simple file
aws s3 cp file.txt s3://my-bucket/path/file.txt
# Upload with custom metadata
aws s3 cp image.jpg s3://my-bucket/images/image.jpg \
--metadata author=user123,uploaded=2025-11-01 \
--content-type image/jpeg
# Upload entire directory (recursive)
aws s3 cp ./my-directory s3://my-bucket/backup/ --recursive
# Configure lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
--bucket my-bucket \
--lifecycle-configuration file://lifecycle.json
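The lifecycle.json referenced above might look like this minimal sketch (rule name, prefix, and 30-day threshold are illustrative):
{
  "Rules": [{
    "ID": "archive-after-30-days",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
  }]
}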
# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# Configure bucket policy (public access)
aws s3api put-bucket-policy \
--bucket my-images-bucket \
--policy file://public-read-policy.json
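An illustrative public-read-policy.json for the command above (the bucket name is a placeholder; Block Public Access must be relaxed for a public policy to take effect, which is rarely what you want — see the Security checklist below):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-images-bucket/*"
  }]
}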
# Configure event notifications
aws s3api put-bucket-notification-configuration \
--bucket my-bucket \
--notification-configuration file://events.json
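A sketch of events.json wiring ObjectCreated events to a Lambda function (the function ARN and prefix filter are placeholders). Note that S3 needs permission to invoke the function (granted via aws lambda add-permission) before it will accept this configuration:
{
  "LambdaFunctionConfigurations": [{
    "Id": "process-uploads",
    "LambdaFunctionArn": "arn:aws:lambda:sa-east-1:123456789012:function:process-upload",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {
      "Key": {
        "FilterRules": [{"Name": "prefix", "Value": "uploads/"}]
      }
    }
  }]
}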
# Configure cross-region replication
aws s3api put-bucket-replication \
--bucket source-bucket \
--replication-configuration file://replication.json

Querying Resources
# List buckets
aws s3 ls
# List objects in bucket
aws s3 ls s3://my-bucket/ --recursive
# View details of specific object
aws s3api head-object \
--bucket my-bucket \
--key path/file.txt
# View all versions of an object
aws s3api list-object-versions \
--bucket my-bucket \
--prefix path/file.txt
# View lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
--bucket my-bucket
# View bucket policy
aws s3api get-bucket-policy \
--bucket my-bucket
# View versioning configuration
aws s3api get-bucket-versioning \
--bucket my-bucket
# View replication status
aws s3api get-bucket-replication \
--bucket my-bucket

Modifying Resources
# Change object storage class
aws s3api copy-object \
--bucket my-bucket \
--copy-source my-bucket/file.txt \
--key file.txt \
--storage-class GLACIER
# Generate pre-signed URL (valid 1 hour)
aws s3 presign s3://my-bucket/file.txt --expires-in 3600
# Update metadata without re-uploading file
aws s3api copy-object \
--bucket my-bucket \
--copy-source my-bucket/file.txt \
--key file.txt \
--metadata-directive REPLACE \
--metadata newkey=newvalue
# Suspend versioning (doesn't delete existing versions)
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Suspended

Deleting Resources
# Delete object
aws s3 rm s3://my-bucket/file.txt
# Delete specific version
aws s3api delete-object \
--bucket my-bucket \
--key file.txt \
--version-id abc123
# Delete all objects in bucket (recursive)
aws s3 rm s3://my-bucket --recursive
# Delete empty bucket
aws s3api delete-bucket \
--bucket my-bucket \
--region sa-east-1
# Delete lifecycle configuration
aws s3api delete-bucket-lifecycle \
--bucket my-bucket
# Delete bucket policy
aws s3api delete-bucket-policy \
--bucket my-bucket
# Abort incomplete multipart uploads
aws s3api abort-multipart-upload \
--bucket my-bucket \
--key file.txt \
--upload-id xyz789

Architecture and Flows
(Diagrams not reproduced here: typical architecture, lifecycle transitions flow, and event notifications flow.)
Best Practices Checklist
Security
- Block Public Access enabled by default: Never disable unless you explicitly need a public bucket
- Encryption at rest: Use SSE-S3 (free) as minimum, SSE-KMS for sensitive data with key rotation
- Encryption in transit: Always use HTTPS; enforce it with a bucket policy that denies requests where aws:SecureTransport is false (see the example after this list)
- Restrictive bucket policies: Principle of least privilege, use conditions (IP, VPC endpoint, time)
- Versioning + MFA Delete: For critical buckets (backups, compliance), prevents malicious deletions
- Object Lock in Compliance mode: For strict regulations (financial, healthcare), immutable even by root
- Pre-signed URLs with short TTL: Maximum 1 hour for critical operations, 5-15 min for one-time downloads
- IAM Roles instead of Access Keys: For applications on EC2/Lambda, never hardcode credentials
- CloudTrail data events enabled: Audit all access to sensitive buckets
- S3 Access Analyzer: Regularly review unintended cross-account permissions
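A few of these controls as CLI sketches (bucket names are placeholders; Block Public Access is shown explicitly even though new buckets get it by default):

# Keep all four Block Public Access settings on
aws s3api put-public-access-block \
  --bucket my-bucket \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Default encryption with SSE-KMS
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'

And a bucket policy (e.g., deny-insecure-transport.json) that rejects any request over plain HTTP:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::my-bucket",
      "arn:aws:s3:::my-bucket/*"
    ],
    "Condition": {"Bool": {"aws:SecureTransport": "false"}}
  }]
}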
Cost Optimization
- Lifecycle policies implemented: Move objects to cheaper tiers automatically based on access pattern
- Intelligent-Tiering only when needed: Evaluate if monitoring cost ($0.0025/1000 objects) is worth it
- Cleanup incomplete multipart uploads: Lifecycle rule to abort after 7 days, avoids hidden storage charges
- Controlled versioning: NoncurrentVersionExpiration to delete old versions, don't accumulate infinitely
- Delete unnecessary objects: Expiration policies for temporary data (logs >X days, staging files)
- CloudFront for static content: Reduces GET requests to S3, saves on data transfer OUT
- S3 Select/Glacier Select: For queries on archived data, only pay for scanned data not full retrieval
- Compression before upload: Gzip/Brotli significantly reduces storage and transfer costs (see the sketch after this list)
- Requester Pays for public datasets: Transfer download cost to data consumers
- Storage Lens for analytics: Identify buckets without lifecycle, duplicated objects, incomplete uploads
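A compression sketch for the item above (file names are illustrative; serving compressed objects requires the matching Content-Encoding header):

# Compress locally, then upload with the correct headers
gzip -9 app.log
aws s3 cp app.log.gz s3://my-bucket/logs/app.log.gz \
  --content-encoding gzip \
  --content-type text/plain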
Performance
- Multipart upload for large files (over 100MB): Parallelization improves throughput and resilience to network failures (see the sketch after this list)
- S3 Transfer Acceleration: For global uploads from locations far from the bucket's region
- CloudFront Origin Shield: Additional cache layer between CloudFront and S3, reduces origin load
- Byte-range fetches: For partial downloads (e.g., video streaming), don't download entire file
- Request rate distribution: Avoid hot keys; S3 supports 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so spread very hot workloads across multiple prefixes
- VPC Endpoint for S3: Traffic stays in AWS private network, lower latency, no NAT Gateway charges
- Cross-Region Replication with RTC: For critical cases requiring RPO under 15 minutes
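Sketches for the multipart and byte-range items above. The AWS CLI performs multipart uploads automatically past a configurable threshold; the values here are illustrative:

# Tune when the CLI switches to multipart and the part size
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 25MB
aws s3 cp backup.tar.gz s3://my-bucket/backups/backup.tar.gz

# Byte-range fetch: download only the first 1 MiB of an object
aws s3api get-object \
  --bucket my-bucket \
  --key videos/movie.mp4 \
  --range bytes=0-1048575 \
  first-chunk.bin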
Reliability
- Cross-Region Replication for DR: Critical data automatically replicated to secondary region
- Versioning enabled: Recovery from accidental deletions/corruptions
- Multi-AZ by default: S3 Standard automatically replicates across at least 3 AZs, no configuration needed
- Object Lock for compliance: Immutable data, ransomware protection
- AWS Backup for critical buckets: Centralized backup cross-account/cross-region with unified policies
- Monitoring with CloudWatch: Alarms on 4xx/5xx errors, replication lag, incomplete multipart uploads
- Lifecycle testing in non-prod: Validate transitions before applying in production
Operational Excellence
- Consistent tagging strategy: Environment, Project, Owner, CostCenter for all buckets
- Clear naming convention: {company}-{service}-{environment}-{region}, e.g., acme-logs-prod-useast1
- Infrastructure as Code: Terraform/CloudFormation for all buckets, no manual configuration
- Automated bucket policy testing: Validate permissions with IAM Access Analyzer in CI/CD
- Data retention documentation: Document why each lifecycle policy exists and when to review
- Alerting on anomalies: CloudWatch alarms on unexpected storage growth, request spikes
- Regular access reviews: Quarterly review who has access to which buckets
Common Mistakes to Avoid
Using incorrect Storage Classes for the access pattern
Why it happens: Choosing Standard-IA or Glacier because it's "cheaper" without considering retrieval costs.
The real problem: A file in Standard-IA downloaded 1000 times/month can cost 10x more than if it were in Standard due to the $0.01/GB retrieval fee.
How to avoid it:
- Standard: frequent access (>1 time/week)
- Standard-IA: infrequent access but you need instant retrieval (1-2 times/month)
- Glacier: rare access, you can wait minutes/hours
- Deep Archive: compliance, almost never accessed, you can wait days
Not configuring lifecycle for incomplete multipart uploads
Why it happens: Failed multipart uploads leave parts in S3 that are charged but not visible in normal listings.
Impact: Silent accumulation of unused GB, growing costs without obvious explanation.
How to avoid it:
{
"Rules": [{
"Id": "cleanup-incomplete-uploads",
"Status": "Enabled",
"Filter": {"Prefix": ""},
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
}]
}

Lifecycle transitions that violate minimum storage duration
Why it happens: Configuring Standard (day 0) → Glacier (day 30) → Deep Archive (day 60) without knowing AWS rules.
The error: AWS requires a minimum of 90 days in Glacier before an object can transition to Deep Archive. A lifecycle rule that violates this is rejected with an InvalidArgument error.
How to avoid it: Respect minimum storage durations:
- Standard-IA: 30 days
- Glacier Flexible: 90 days
- Deep Archive: 180 days
If you transition to Glacier on day 7, you can move to Deep Archive on day 97 (7 + 90) at the earliest.
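A lifecycle sketch that respects these minimums (prefix and thresholds are illustrative): Glacier at day 30, Deep Archive at day 120 (30 + 90):
{
  "Rules": [{
    "ID": "tiered-archival",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "GLACIER"},
      {"Days": 120, "StorageClass": "DEEP_ARCHIVE"}
    ]
  }]
}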
Bucket policies that allow unintended public access
Why it happens: Copying bucket policy from the internet without understanding "Principal": "*" with "Effect": "Allow".
Risk: Data leak, compliance violation, huge bill from data transfer if someone scrapes the bucket.
How to avoid it:
- Keep Block Public Access enabled by default
- If you need public content: serve it through CloudFront with Origin Access Control (or the legacy Origin Access Identity) instead of a directly public bucket
- Use the aws:SecureTransport condition to enforce HTTPS
- Regularly review with S3 Access Analyzer
Not enabling versioning on critical buckets
Why it happens: "I don't need it, we have backups" or "it consumes too much storage".
The problem: Accidental deletion (user, code bug, ransomware) and nightly backups don't include that recent file.
How to avoid it:
- Enable versioning on ALL production buckets
- Use lifecycle to expire old versions (NoncurrentVersionExpiration: 90 days)
- Combine with MFA Delete for additional protection
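A lifecycle sketch for the NoncurrentVersionExpiration item above (the 90-day window is illustrative):
{
  "Rules": [{
    "ID": "expire-noncurrent-versions",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "NoncurrentVersionExpiration": {"NoncurrentDays": 90}
  }]
}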
Hardcoding credentials in code to access S3
Why it happens: "It's easier" or unfamiliarity with IAM Roles.
Risk: Credentials in Git, compromised, rotation nightmare.
How to avoid it:
- EC2: Use IAM Instance Profile
- Lambda: Execution Role with specific permissions
- External applications: IAM User with rotated Access Keys, or better: AssumeRole cross-account
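A least-privilege identity policy sketch to attach to such a role (bucket name and prefix are hypothetical):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AppUploadsAccess",
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-app-bucket/uploads/*"
  }]
}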
Forgetting to configure Cross-Region Replication before disaster
Why it happens: "We'll do it when needed" but disaster doesn't give notice.
The problem: CRR only replicates NEW objects after enabling replication. Existing objects require S3 Batch Operations.
How to avoid it:
- Configure CRR from day 1 on critical buckets
- Test DR by executing failover to secondary region semi-annually
- Document RTO/RPO and validate that CRR meets them
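The replication.json referenced in the CLI section might look like this sketch (the role ARN and destination bucket are placeholders; both source and destination buckets need versioning enabled):
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [{
    "ID": "replicate-all",
    "Status": "Enabled",
    "Priority": 1,
    "Filter": {"Prefix": ""},
    "DeleteMarkerReplication": {"Status": "Disabled"},
    "Destination": {
      "Bucket": "arn:aws:s3:::dr-bucket-us-east-1",
      "StorageClass": "STANDARD_IA"
    }
  }]
}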
Using a single bucket for everything
Why it happens: "It's simpler to manage" or lack of architecture.
The problem: Huge blast radius (one bug affects all data), complex access policies, costs hard to track.
How to avoid it: Separate buckets by:
- Environment (prod/staging/dev)
- Data type (logs/backups/user-uploads)
- Compliance level (PII vs non-PII)
- Access patterns (public vs private)
Cost Considerations
What Generates Costs
| Category | Cost | Notes |
|---|---|---|
| Storage - Standard | $0.023/GB-month | First 50TB |
| Storage - Standard-IA | $0.0125/GB-month | + min 128KB, min 30 days |
| Storage - Glacier Instant | $0.004/GB-month | + min 128KB, min 90 days |
| Storage - Glacier Flexible | $0.0036/GB-month | + min 90 days |
| Storage - Deep Archive | $0.00099/GB-month | + min 180 days |
| PUT/COPY/POST/LIST requests | $0.005/1,000 requests | |
| GET/SELECT requests | $0.0004/1,000 requests | |
| DELETE/CANCEL requests | FREE | |
| Retrieval - Standard-IA | $0.01/GB | |
| Retrieval - Glacier Expedited | $0.03/GB | 1-5 min |
| Retrieval - Glacier Standard | $0.01/GB | 3-5 hours |
| Retrieval - Glacier Bulk | $0.0025/GB | 5-12 hours |
| Retrieval - Deep Archive | $0.02/GB | 12-48 hours |
| Data Transfer OUT | $0.09/GB | First 100GB/month FREE |
| Replication | $0.02/GB | |
| Transfer Acceleration | +$0.04/GB | Additional |
Real Calculation Example
Scenario: log archive accumulating ~1.2TB over 3 years (illustrative, simplified volumes per phase)
Months 1-3 (Standard, recent logs kept hot for debugging):
100GB × $0.023 × 3 months = $6.90
Months 4-12 (Glacier Flexible, for compliance):
900GB × $0.0036 × 9 months = $29.16
Years 2-3 (Deep Archive):
1,200GB × $0.00099 × 24 months = $28.51
TOTAL over 3 years: ~$64.57
vs keeping everything in Standard: ~$828
Savings: ~92%

Free Tier (first 12 months)
- Storage: 5GB Standard storage
- PUT requests: 2,000 requests
- GET requests: 20,000 requests
- Data transfer: 100GB OUT per month (aggregated across all AWS services)
IMPORTANT: Free tier does NOT include:
- Standard-IA, Glacier, or Deep Archive storage
- Transfer Acceleration
- Replication
Strategic Optimization
1. CloudFront as cache layer
Without CloudFront: 1M GET requests/month = 1,000 × $0.0004 = $0.40 in request charges, plus $0.09/GB data transfer OUT on every response
With CloudFront (~90% cache hit ratio): only ~100K requests reach S3 = $0.04, and cached responses skip S3 data transfer entirely
Savings: ~90% of S3 request and origin transfer costs

2. Compression
1TB uncompressed: $23/month in Standard
1TB → 300GB with Gzip: $6.90/month
Savings: 70%

3. Aggressive lifecycle policies
Logs without lifecycle: 12TB annually × $0.023 = $276/month
With optimized lifecycle: $25/month
Savings: 91%

4. S3 Select instead of full retrieval
Glacier: 100GB file, you need 1GB of data
Full retrieval: 100GB × $0.01 = $1
S3 Select: 1GB × $0.002 = $0.002
Savings: 99.8%

Integration with Other Services
| Service | How it integrates | Typical use case |
|---|---|---|
| EC2 | Instance Profile for access, user data downloads scripts from S3 | Instances read/write logs, backups, configuration files |
| Lambda | Execution Role with S3 permissions, S3 Events triggers | Processing uploaded files (resize images, transcode videos) |
| CloudFront | Origin for content distribution, OAC (or legacy OAI) for private access | CDN for static assets, video streaming, website hosting |
| Route53 | Alias record points to bucket configured as website | Static hosting with custom domain (www.example.com) |
| RDS/DynamoDB | Automatic backups to S3, snapshot exports | Disaster recovery, data lakes, historical analysis |
| CloudWatch | Application logs exported to S3, S3 metrics in CloudWatch | Log analysis with Athena, storage anomaly alerts |
| CloudTrail | API call logs saved in S3 | Security auditing, compliance, forensics |
| IAM | Policies to control access, roles for services | Least privilege, temporary credentials, cross-account access |
| KMS | Encryption keys for SSE-KMS | Sensitive data with automatic key rotation, usage audit trail |
| Athena | SQL queries on data in S3 | Log analysis, data lakes, BI on CSV/JSON/Parquet files |
| Glue | ETL jobs read/write in S3 | Data pipeline, transformations, data lake cataloging |
| SageMaker | Training data and model artifacts in S3 | Machine learning workflows, model versioning |
| Kinesis Firehose | Streaming data delivery to S3 | Real-time logs, streaming analytics, IoT data ingestion |
| SNS/SQS | S3 Events send notifications | Event-driven architectures, processing decoupling |
| Step Functions | Orchestration of workflows involving S3 | Complex pipelines (upload → validate → process → notify) |
| EventBridge | S3 Events as input for complex rules | Conditional routing, multiple targets, scheduled actions |
| AWS Backup | Centralized backup of S3 buckets | Compliance, cross-account/region backups, retention policies |
| DataSync | On-premises data migration to S3 | Mass transfer, continuous synchronization |
| Storage Gateway | Bridge between on-premises storage and S3 | Hybrid cloud, gradual migration, tape replacement |
| Macie | Data sensitivity analysis in S3 | Detect PII, compliance (GDPR, HIPAA), automatic classification |
Additional Resources
Whitepapers and Best Practices
- AWS Well-Architected Framework - S3
- S3 Best Practices Design Patterns
- S3 Security Best Practices
- Cost Optimization for S3