Amazon S3 Essentials
S3 essentials, storage classes, lifecycle policies, security, and best practices
Amazon S3 is AWS's object storage service that lets you store and retrieve any amount of data from anywhere. Each file (object) is identified by a unique key within a container (bucket); keys can look like paths, but the namespace is flat, not a hierarchical file system.
The problem it solves: Eliminates the need to manage physical storage infrastructure, providing virtually unlimited storage with 99.999999999% (11 nines) durability, multiple cost/performance tiers, and automated data lifecycle management.
When to use it: Static file storage (images, videos, PDFs), data lakes, backups, logs, static website hosting, CDN origin (CloudFront), long-term archival for compliance.
Alternatives: EFS (shared file system between EC2 instances), EBS (block storage attached to EC2), standalone Glacier (archival only), on-premises storage (expensive, not elastic).
Key Concepts
| Concept | Description |
|---|---|
| Bucket | Top-level container with globally unique name. Belongs to a specific region. Buckets can't be nested |
| Object | Individual file identified by key (full path). Size 0 bytes - 5TB. Includes data + metadata + version ID |
| Key | Unique identifier of the object within the bucket (e.g., articles/2025/image.jpg). Simulates folder structure but is flat |
| Storage Class | Storage tier that defines cost, performance, and access characteristics (Standard, IA, Glacier, etc.) |
| Lifecycle Policy | Automatic rules that move or delete objects based on age or versioning state |
| Versioning | Maintains multiple variants of the same object. Protects against accidental deletions/overwrites |
| Bucket Policy | Resource-based access control attached to the bucket. Defines who can do what in the bucket |
| IAM Policy | Identity-based access control attached to users/roles. Defines what an identity can do |
| Pre-signed URL | Temporary URL with specific permissions to access/upload objects without AWS credentials |
| Multipart Upload | Uploads large files in parallel parts. Improves performance and resilience. Recommended for files >100MB |
| Event Notification | Automatic triggers when events occur in S3 (ObjectCreated, ObjectDeleted). Integrates with Lambda/SQS/SNS |
| Cross-Region Replication (CRR) | Automatically replicates objects to bucket in another region. Requires versioning |
| Same-Region Replication (SRR) | Replicates within the same region. Useful for copies in separate accounts |
| Replication Time Control (RTC) | 15-minute SLA for replication. Adds a per-GB charge on replicated data |
| Object Lock | WORM (write-once-read-many) protection against modification/deletion. Governance/Compliance modes |
| MFA Delete | Requires multi-factor authentication to delete objects/versions. Prevents accidental deletions |
| S3 Transfer Acceleration | Accelerates uploads/downloads using CloudFront's global network. Additional $0.04/GB cost |
Essential AWS CLI Commands
Creating Resources
# Create bucket (specific region)
aws s3api create-bucket \
--bucket my-bucket-name \
--region sa-east-1 \
--create-bucket-configuration LocationConstraint=sa-east-1
# Upload simple file
aws s3 cp file.txt s3://my-bucket/path/file.txt
# Upload with custom metadata
aws s3 cp image.jpg s3://my-bucket/images/image.jpg \
--metadata author=user123,uploaded=2025-11-01 \
--content-type image/jpeg
# Upload entire directory (recursive)
aws s3 cp ./my-directory s3://my-bucket/backup/ --recursive
# Configure lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
--bucket my-bucket \
--lifecycle-configuration file://lifecycle.json
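The lifecycle.json referenced above might look like this minimal sketch (rule name, prefix, and 30-day threshold are illustrative):
{
  "Rules": [{
    "ID": "archive-after-30-days",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
  }]
}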
# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled
# Configure bucket policy (public access)
aws s3api put-bucket-policy \
--bucket my-images-bucket \
--policy file://public-read-policy.json
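An illustrative public-read-policy.json for the command above (the bucket name is a placeholder; Block Public Access must be relaxed for a public policy to take effect, which is rarely what you want — see the Security checklist below):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-images-bucket/*"
  }]
}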
# Configure event notifications
aws s3api put-bucket-notification-configuration \
--bucket my-bucket \
--notification-configuration file://events.json
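A sketch of events.json wiring ObjectCreated events to a Lambda function (the function ARN and prefix filter are placeholders). Note that S3 needs permission to invoke the function (granted via aws lambda add-permission) before it will accept this configuration:
{
  "LambdaFunctionConfigurations": [{
    "Id": "process-uploads",
    "LambdaFunctionArn": "arn:aws:lambda:sa-east-1:123456789012:function:process-upload",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {
      "Key": {
        "FilterRules": [{"Name": "prefix", "Value": "uploads/"}]
      }
    }
  }]
}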
# Configure cross-region replication
aws s3api put-bucket-replication \
--bucket source-bucket \
--replication-configuration file://replication.json

Querying Resources
# List buckets
aws s3 ls
# List objects in bucket
aws s3 ls s3://my-bucket/ --recursive
# View details of specific object
aws s3api head-object \
--bucket my-bucket \
--key path/file.txt
# View all versions of an object
aws s3api list-object-versions \
--bucket my-bucket \
--prefix path/file.txt
# View lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
--bucket my-bucket
# View bucket policy
aws s3api get-bucket-policy \
--bucket my-bucket
# View versioning configuration
aws s3api get-bucket-versioning \
--bucket my-bucket
# View replication status
aws s3api get-bucket-replication \
--bucket my-bucket

Modifying Resources
# Change object storage class
aws s3api copy-object \
--bucket my-bucket \
--copy-source my-bucket/file.txt \
--key file.txt \
--storage-class GLACIER
# Generate pre-signed URL (valid 1 hour)
aws s3 presign s3://my-bucket/file.txt --expires-in 3600
# Update metadata without re-uploading file
aws s3api copy-object \
--bucket my-bucket \
--copy-source my-bucket/file.txt \
--key file.txt \
--metadata-directive REPLACE \
--metadata newkey=newvalue
# Suspend versioning (doesn't delete existing versions)
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Suspended

Deleting Resources
# Delete object
aws s3 rm s3://my-bucket/file.txt
# Delete specific version
aws s3api delete-object \
--bucket my-bucket \
--key file.txt \
--version-id abc123
# Delete all objects in bucket (recursive)
aws s3 rm s3://my-bucket --recursive
# Delete empty bucket
aws s3api delete-bucket \
--bucket my-bucket \
--region sa-east-1
# Delete lifecycle configuration
aws s3api delete-bucket-lifecycle \
--bucket my-bucket
# Delete bucket policy
aws s3api delete-bucket-policy \
--bucket my-bucket
# Abort incomplete multipart uploads
aws s3api abort-multipart-upload \
--bucket my-bucket \
--key file.txt \
--upload-id xyz789

Architecture and Flows
(Diagrams not reproduced here: typical architecture, lifecycle transitions flow, and event notifications flow.)
Best Practices Checklist
Security
- Block Public Access enabled by default: Never disable unless you explicitly need a public bucket
- Encryption at rest: Use SSE-S3 (free) as minimum, SSE-KMS for sensitive data with key rotation
- Encryption in transit: Always use HTTPS; enforce it with a bucket policy that denies requests where aws:SecureTransport is false (see the example after this list)
- Restrictive bucket policies: Principle of least privilege, use conditions (IP, VPC endpoint, time)
- Versioning + MFA Delete: For critical buckets (backups, compliance), prevents malicious deletions
- Object Lock in Compliance mode: For strict regulations (financial, healthcare), immutable even by root
- Pre-signed URLs with short TTL: Maximum 1 hour for critical operations, 5-15 min for one-time downloads
- IAM Roles instead of Access Keys: For applications on EC2/Lambda, never hardcode credentials
- CloudTrail data events enabled: Audit all access to sensitive buckets
- S3 Access Analyzer: Regularly review unintended cross-account permissions
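A few of these controls as CLI sketches (bucket names are placeholders; Block Public Access is shown explicitly even though new buckets get it by default):

# Keep all four Block Public Access settings on
aws s3api put-public-access-block \
  --bucket my-bucket \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Default encryption with SSE-KMS
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'

And a bucket policy (e.g., deny-insecure-transport.json) that rejects any request over plain HTTP:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::my-bucket",
      "arn:aws:s3:::my-bucket/*"
    ],
    "Condition": {"Bool": {"aws:SecureTransport": "false"}}
  }]
}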
Cost Optimization
- Lifecycle policies implemented: Move objects to cheaper tiers automatically based on access pattern
- Intelligent-Tiering only when needed: Evaluate if monitoring cost ($0.0025/1000 objects) is worth it
- Cleanup incomplete multipart uploads: Lifecycle rule to abort after 7 days, avoids hidden storage charges
- Controlled versioning: NoncurrentVersionExpiration to delete old versions, don't accumulate infinitely
- Delete unnecessary objects: Expiration policies for temporary data (logs >X days, staging files)
- CloudFront for static content: Reduces GET requests to S3, saves on data transfer OUT
- S3 Select/Glacier Select: For queries on archived data, only pay for scanned data not full retrieval
- Compression before upload: Gzip/Brotli significantly reduces storage and transfer costs (see the sketch after this list)
- Requester Pays for public datasets: Transfer download cost to data consumers
- Storage Lens for analytics: Identify buckets without lifecycle, duplicated objects, incomplete uploads
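A compression sketch for the item above (file names are illustrative; serving compressed objects requires the matching Content-Encoding header):

# Compress locally, then upload with the correct headers
gzip -9 app.log
aws s3 cp app.log.gz s3://my-bucket/logs/app.log.gz \
  --content-encoding gzip \
  --content-type text/plain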
Performance
- Multipart upload for large files (over 100MB): Parallelization improves throughput and resilience to network failures (see the sketch after this list)
- S3 Transfer Acceleration: For global uploads from locations far from the bucket's region
- CloudFront Origin Shield: Additional cache layer between CloudFront and S3, reduces origin load
- Byte-range fetches: For partial downloads (e.g., video streaming), don't download entire file
- Request rate distribution: Avoid hot keys; S3 supports 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, so spread very hot workloads across multiple prefixes
- VPC Endpoint for S3: Traffic stays in AWS private network, lower latency, no NAT Gateway charges
- Cross-Region Replication with RTC: For critical cases requiring RPO under 15 minutes
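Sketches for the multipart and byte-range items above. The AWS CLI performs multipart uploads automatically past a configurable threshold; the values here are illustrative:

# Tune when the CLI switches to multipart and the part size
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 25MB
aws s3 cp backup.tar.gz s3://my-bucket/backups/backup.tar.gz

# Byte-range fetch: download only the first 1 MiB of an object
aws s3api get-object \
  --bucket my-bucket \
  --key videos/movie.mp4 \
  --range bytes=0-1048575 \
  first-chunk.bin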
Reliability
- Cross-Region Replication for DR: Critical data automatically replicated to secondary region
- Versioning enabled: Recovery from accidental deletions/corruptions
- Multi-AZ by default: S3 Standard automatically replicates across at least 3 AZs, no configuration needed
- Object Lock for compliance: Immutable data, ransomware protection
- AWS Backup for critical buckets: Centralized backup cross-account/cross-region with unified policies
- Monitoring with CloudWatch: Alarms on 4xx/5xx errors, replication lag, incomplete multipart uploads
- Lifecycle testing in non-prod: Validate transitions before applying in production
Operational Excellence
- Consistent tagging strategy: Environment, Project, Owner, CostCenter for all buckets
- Clear naming convention: {company}-{service}-{environment}-{region}, e.g., acme-logs-prod-useast1
- Infrastructure as Code: Terraform/CloudFormation for all buckets, no manual configuration
- Automated bucket policy testing: Validate permissions with IAM Access Analyzer in CI/CD
- Data retention documentation: Document why each lifecycle policy exists and when to review
- Alerting on anomalies: CloudWatch alarms on unexpected storage growth, request spikes
- Regular access reviews: Quarterly review who has access to which buckets
Common Mistakes to Avoid
Using incorrect Storage Classes for the access pattern
Why it happens: Choosing Standard-IA or Glacier because it's "cheaper" without considering retrieval costs.
The real problem: A file in Standard-IA downloaded 1000 times/month can cost 10x more than if it were in Standard due to the $0.01/GB retrieval fee.
How to avoid it:
- Standard: frequent access (>1 time/week)
- Standard-IA: infrequent access but you need instant retrieval (1-2 times/month)
- Glacier: rare access, you can wait minutes/hours
- Deep Archive: compliance, almost never accessed, you can wait days
Not configuring lifecycle for incomplete multipart uploads
Why it happens: Failed multipart uploads leave parts in S3 that are charged but not visible in normal listings.
Impact: Silent accumulation of unused GB, growing costs without obvious explanation.
How to avoid it:
{
"Rules": [{
"Id": "cleanup-incomplete-uploads",
"Status": "Enabled",
"Filter": {"Prefix": ""},
"AbortIncompleteMultipartUpload": {
"DaysAfterInitiation": 7
}
}]
}

Lifecycle transitions that violate minimum storage duration
Why it happens: Configuring Standard (day 0) → Glacier (day 30) → Deep Archive (day 60) without knowing AWS rules.
The error: AWS requires a minimum of 90 days in Glacier before an object can transition to Deep Archive. A lifecycle rule that violates this is rejected with an InvalidArgument error.
How to avoid it: Respect minimum storage durations:
- Standard-IA: 30 days
- Glacier Flexible: 90 days
- Deep Archive: 180 days
If you transition to Glacier on day 7, you can move to Deep Archive on day 97 (7 + 90) at the earliest.
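A lifecycle sketch that respects these minimums (prefix and thresholds are illustrative): Glacier at day 30, Deep Archive at day 120 (30 + 90):
{
  "Rules": [{
    "ID": "tiered-archival",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "GLACIER"},
      {"Days": 120, "StorageClass": "DEEP_ARCHIVE"}
    ]
  }]
}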
Bucket policies that allow unintended public access
Why it happens: Copying bucket policy from the internet without understanding "Principal": "*" with "Effect": "Allow".
Risk: Data leak, compliance violation, huge bill from data transfer if someone scrapes the bucket.
How to avoid it:
- Keep Block Public Access enabled by default
- If you need public content: serve it through CloudFront with Origin Access Control (or the legacy Origin Access Identity) instead of a directly public bucket
- Use the aws:SecureTransport condition to enforce HTTPS
- Regularly review with S3 Access Analyzer
Not enabling versioning on critical buckets
Why it happens: "I don't need it, we have backups" or "it consumes too much storage".
The problem: Accidental deletion (user, code bug, ransomware) and nightly backups don't include that recent file.
How to avoid it:
- Enable versioning on ALL production buckets
- Use lifecycle to expire old versions (NoncurrentVersionExpiration: 90 days)
- Combine with MFA Delete for additional protection
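A lifecycle sketch for the NoncurrentVersionExpiration item above (the 90-day window is illustrative):
{
  "Rules": [{
    "ID": "expire-noncurrent-versions",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "NoncurrentVersionExpiration": {"NoncurrentDays": 90}
  }]
}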
Hardcoding credentials in code to access S3
Why it happens: "It's easier" or unfamiliarity with IAM Roles.
Risk: Credentials in Git, compromised, rotation nightmare.
How to avoid it:
- EC2: Use IAM Instance Profile
- Lambda: Execution Role with specific permissions
- External applications: IAM User with rotated Access Keys, or better: AssumeRole cross-account
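A least-privilege identity policy sketch to attach to such a role (bucket name and prefix are hypothetical):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AppUploadsAccess",
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-app-bucket/uploads/*"
  }]
}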
Forgetting to configure Cross-Region Replication before disaster
Why it happens: "We'll do it when needed" but disaster doesn't give notice.
The problem: CRR only replicates NEW objects after enabling replication. Existing objects require S3 Batch Operations.
How to avoid it:
- Configure CRR from day 1 on critical buckets
- Test DR by executing failover to secondary region semi-annually
- Document RTO/RPO and validate that CRR meets them
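The replication.json referenced in the CLI section might look like this sketch (the role ARN and destination bucket are placeholders; both source and destination buckets need versioning enabled):
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [{
    "ID": "replicate-all",
    "Status": "Enabled",
    "Priority": 1,
    "Filter": {"Prefix": ""},
    "DeleteMarkerReplication": {"Status": "Disabled"},
    "Destination": {
      "Bucket": "arn:aws:s3:::dr-bucket-us-east-1",
      "StorageClass": "STANDARD_IA"
    }
  }]
}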
Using a single bucket for everything
Why it happens: "It's simpler to manage" or lack of architecture.
The problem: Huge blast radius (one bug affects all data), complex access policies, costs hard to track.
How to avoid it: Separate buckets by:
- Environment (prod/staging/dev)
- Data type (logs/backups/user-uploads)
- Compliance level (PII vs non-PII)
- Access patterns (public vs private)
Cost Considerations
What Generates Costs
| Category | Cost | Notes |
|---|---|---|
| Storage - Standard | $0.023/GB-month | First 50TB |
| Storage - Standard-IA | $0.0125/GB-month | + min 128KB, min 30 days |
| Storage - Glacier Instant | $0.004/GB-month | + min 128KB, min 90 days |
| Storage - Glacier Flexible | $0.0036/GB-month | + min 90 days |
| Storage - Deep Archive | $0.00099/GB-month | + min 180 days |
| PUT/COPY/POST/LIST requests | $0.005/1,000 requests | |
| GET/SELECT requests | $0.0004/1,000 requests | |
| DELETE/CANCEL requests | FREE | |
| Retrieval - Standard-IA | $0.01/GB | |
| Retrieval - Glacier Expedited | $0.03/GB | 1-5 min |
| Retrieval - Glacier Standard | $0.01/GB | 3-5 hours |
| Retrieval - Glacier Bulk | $0.0025/GB | 5-12 hours |
| Retrieval - Deep Archive | $0.02/GB | 12-48 hours |
| Data Transfer OUT | $0.09/GB | First 100GB/month FREE |
| Replication | $0.02/GB | |
| Transfer Acceleration | +$0.04/GB | Additional |
Real Calculation Example
Scenario: log archive accumulating ~1.2TB over 3 years (illustrative, simplified volumes per phase)
Months 1-3 (Standard, recent logs kept hot for debugging):
100GB × $0.023 × 3 months = $6.90
Months 4-12 (Glacier Flexible, for compliance):
900GB × $0.0036 × 9 months = $29.16
Years 2-3 (Deep Archive):
1,200GB × $0.00099 × 24 months = $28.51
TOTAL over 3 years: ~$64.57
vs keeping everything in Standard: ~$828
Savings: ~92%

Free Tier (first 12 months)
- Storage: 5GB Standard storage
- PUT requests: 2,000 requests
- GET requests: 20,000 requests
- Data transfer: 100GB OUT per month (aggregated across all AWS services)
IMPORTANT: Free tier does NOT include:
- Standard-IA, Glacier, or Deep Archive storage
- Transfer Acceleration
- Replication
Strategic Optimization
1. CloudFront as cache layer
Without CloudFront: 1M GET requests/month = 1,000 × $0.0004 = $0.40 in request charges, plus $0.09/GB data transfer OUT on every response
With CloudFront (~90% cache hit ratio): only ~100K requests reach S3 = $0.04, and cached responses skip S3 data transfer entirely
Savings: ~90% of S3 request and origin transfer costs

2. Compression
1TB uncompressed: $23/month in Standard
1TB → 300GB with Gzip: $6.90/month
Savings: 70%

3. Aggressive lifecycle policies
Logs without lifecycle: 12TB annually × $0.023 = $276/month
With optimized lifecycle: $25/month
Savings: 91%

4. S3 Select instead of full retrieval
Glacier: 100GB file, you need 1GB of data
Full retrieval: 100GB × $0.01 = $1
S3 Select: 1GB × $0.002 = $0.002
Savings: 99.8%

Integration with Other Services
| Service | How it integrates | Typical use case |
|---|---|---|
| EC2 | Instance Profile for access, user data downloads scripts from S3 | Instances read/write logs, backups, configuration files |
| Lambda | Execution Role with S3 permissions, S3 Events triggers | Processing uploaded files (resize images, transcode videos) |
| CloudFront | Origin for content distribution, OAC (or legacy OAI) for private access | CDN for static assets, video streaming, website hosting |
| Route53 | Alias record points to bucket configured as website | Static hosting with custom domain (www.example.com) |
| RDS/DynamoDB | Automatic backups to S3, snapshot exports | Disaster recovery, data lakes, historical analysis |
| CloudWatch | Application logs exported to S3, S3 metrics in CloudWatch | Log analysis with Athena, storage anomaly alerts |
| CloudTrail | API call logs saved in S3 | Security auditing, compliance, forensics |
| IAM | Policies to control access, roles for services | Least privilege, temporary credentials, cross-account access |
| KMS | Encryption keys for SSE-KMS | Sensitive data with automatic key rotation, usage audit trail |
| Athena | SQL queries on data in S3 | Log analysis, data lakes, BI on CSV/JSON/Parquet files |
| Glue | ETL jobs read/write in S3 | Data pipeline, transformations, data lake cataloging |
| SageMaker | Training data and model artifacts in S3 | Machine learning workflows, model versioning |
| Kinesis Firehose | Streaming data delivery to S3 | Real-time logs, streaming analytics, IoT data ingestion |
| SNS/SQS | S3 Events send notifications | Event-driven architectures, processing decoupling |
| Step Functions | Orchestration of workflows involving S3 | Complex pipelines (upload → validate → process → notify) |
| EventBridge | S3 Events as input for complex rules | Conditional routing, multiple targets, scheduled actions |
| AWS Backup | Centralized backup of S3 buckets | Compliance, cross-account/region backups, retention policies |
| DataSync | On-premises data migration to S3 | Mass transfer, continuous synchronization |
| Storage Gateway | Bridge between on-premises storage and S3 | Hybrid cloud, gradual migration, tape replacement |
| Macie | Data sensitivity analysis in S3 | Detect PII, compliance (GDPR, HIPAA), automatic classification |
Additional Resources
Whitepapers and Best Practices
- AWS Well-Architected Framework - S3
- S3 Best Practices Design Patterns
- S3 Security Best Practices
- Cost Optimization for S3