Amazon S3 Essentials

S3 essentials, storage classes, lifecycle policies, security, and best practices


Amazon S3 is AWS's object storage service that lets you store and retrieve any amount of data from anywhere. It works as a highly scalable distributed object store in which each file (object) is identified by a unique key within a container (bucket).

The problem it solves: Eliminates the need to manage physical storage infrastructure, providing virtually unlimited storage with 99.999999999% (11 nines) durability, multiple cost-performance tiers, and automated data lifecycle management.

When to use it: Static file storage (images, videos, PDFs), data lakes, backups, logs, static website hosting, CDN origin (CloudFront), long-term archival for compliance.

Alternatives: EFS (shared file system across EC2 instances), EBS (block storage for EC2), standalone Glacier (archival only), on-premises storage (expensive, hard to scale).

Key Concepts

Bucket: Top-level container with a globally unique name. Belongs to a specific region. Buckets can't be nested.
Object: Individual file identified by a key (its full path). Size from 0 bytes to 5TB. Includes data + metadata + version ID.
Key: Unique identifier of an object within its bucket (e.g., articles/2025/image.jpg). Simulates a folder structure, but the namespace is flat.
Storage Class: Storage tier that defines cost, performance, and access characteristics (Standard, IA, Glacier, etc.).
Lifecycle Policy: Automatic rules that move or delete objects based on age or versioning state.
Versioning: Maintains multiple variants of the same object. Protects against accidental deletions/overwrites.
Bucket Policy: Resource-based access control attached to the bucket. Defines who can do what in the bucket.
IAM Policy: Identity-based access control attached to users/roles. Defines what an identity can do.
Pre-signed URL: Temporary URL with specific permissions to access/upload objects without AWS credentials.
Multipart Upload: Uploads large files in parallel parts. Improves performance and resilience. Recommended for files over 100MB.
Event Notification: Automatic triggers when events occur in S3 (ObjectCreated, ObjectDeleted). Integrates with Lambda/SQS/SNS.
Cross-Region Replication (CRR): Automatically replicates objects to a bucket in another region. Requires versioning.
Same-Region Replication (SRR): Replicates within the same region. Useful for copies in separate accounts.
Replication Time Control (RTC): 15-minute SLA for replication, at additional cost.
Object Lock: WORM (write-once-read-many) protection against modification/deletion. Governance and Compliance modes.
MFA Delete: Requires multi-factor authentication to delete objects/versions. Prevents accidental deletions.
S3 Transfer Acceleration: Speeds up uploads/downloads using CloudFront's edge network, at an extra ~$0.04/GB.

Essential AWS CLI Commands

Creating Resources

# Create bucket (specific region)
aws s3api create-bucket \
    --bucket my-bucket-name \
    --region sa-east-1 \
    --create-bucket-configuration LocationConstraint=sa-east-1

# Upload simple file
aws s3 cp file.txt s3://my-bucket/path/file.txt

# Upload with custom metadata
aws s3 cp image.jpg s3://my-bucket/images/image.jpg \
    --metadata author=user123,uploaded=2025-11-01 \
    --content-type image/jpeg

# Upload entire directory (recursive)
aws s3 cp ./my-directory s3://my-bucket/backup/ --recursive

# Configure lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-bucket \
    --lifecycle-configuration file://lifecycle.json
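
# Example lifecycle.json for the command above (a minimal sketch; the
# logs/ prefix and day counts are illustrative assumptions):
{
  "Rules": [{
    "ID": "archive-old-logs",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"}
    ],
    "Expiration": {"Days": 365}
  }]
}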

# Enable versioning
aws s3api put-bucket-versioning \
    --bucket my-bucket \
    --versioning-configuration Status=Enabled

# Configure bucket policy (public access)
aws s3api put-bucket-policy \
    --bucket my-images-bucket \
    --policy file://public-read-policy.json
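
# Example public-read-policy.json (illustrative sketch; the bucket name is
# a placeholder, and Block Public Access must be disabled for the policy to
# take effect, so use it only on intentionally public buckets):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-images-bucket/*"
  }]
}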

# Configure event notifications
aws s3api put-bucket-notification-configuration \
    --bucket my-bucket \
    --notification-configuration file://events.json
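
# Example events.json triggering a Lambda on uploads (sketch; the function
# ARN and prefix are placeholders, and the function's resource policy must
# already allow s3.amazonaws.com to invoke it):
{
  "LambdaFunctionConfigurations": [{
    "Id": "process-uploads",
    "LambdaFunctionArn": "arn:aws:lambda:sa-east-1:123456789012:function:process-upload",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {
      "Key": {"FilterRules": [{"Name": "prefix", "Value": "uploads/"}]}
    }
  }]
}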

# Configure cross-region replication
aws s3api put-bucket-replication \
    --bucket source-bucket \
    --replication-configuration file://replication.json
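
# Example replication.json (minimal sketch; the role and bucket ARNs are
# placeholders, versioning must be enabled on both buckets, and the role
# needs the standard S3 replication permissions):
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [{
    "ID": "replicate-all",
    "Status": "Enabled",
    "Priority": 1,
    "Filter": {},
    "DeleteMarkerReplication": {"Status": "Disabled"},
    "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"}
  }]
}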

Querying Resources

# List buckets
aws s3 ls

# List objects in bucket
aws s3 ls s3://my-bucket/ --recursive

# View details of specific object
aws s3api head-object \
    --bucket my-bucket \
    --key path/file.txt

# View all versions of an object
aws s3api list-object-versions \
    --bucket my-bucket \
    --prefix path/file.txt

# View lifecycle configuration
aws s3api get-bucket-lifecycle-configuration \
    --bucket my-bucket

# View bucket policy
aws s3api get-bucket-policy \
    --bucket my-bucket

# View versioning configuration
aws s3api get-bucket-versioning \
    --bucket my-bucket

# View replication status
aws s3api get-bucket-replication \
    --bucket my-bucket

Modifying Resources

# Change object storage class
aws s3api copy-object \
    --bucket my-bucket \
    --copy-source my-bucket/file.txt \
    --key file.txt \
    --storage-class GLACIER
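
# Objects in Glacier must be restored before they can be read. A temporary
# restore might look like this (the 7-day window and Standard tier are
# illustrative choices):
aws s3api restore-object \
    --bucket my-bucket \
    --key file.txt \
    --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'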

# Generate pre-signed URL (valid 1 hour)
aws s3 presign s3://my-bucket/file.txt --expires-in 3600

# Update metadata without re-uploading file
aws s3api copy-object \
    --bucket my-bucket \
    --copy-source my-bucket/file.txt \
    --key file.txt \
    --metadata-directive REPLACE \
    --metadata newkey=newvalue

# Suspend versioning (doesn't delete existing versions)
aws s3api put-bucket-versioning \
    --bucket my-bucket \
    --versioning-configuration Status=Suspended

Deleting Resources

# Delete object
aws s3 rm s3://my-bucket/file.txt

# Delete specific version
aws s3api delete-object \
    --bucket my-bucket \
    --key file.txt \
    --version-id abc123

# Delete all objects in bucket (recursive; on versioned buckets this only
# adds delete markers, so old versions must be removed separately)
aws s3 rm s3://my-bucket --recursive

# Delete empty bucket
aws s3api delete-bucket \
    --bucket my-bucket \
    --region sa-east-1

# Delete lifecycle configuration
aws s3api delete-bucket-lifecycle \
    --bucket my-bucket

# Delete bucket policy
aws s3api delete-bucket-policy \
    --bucket my-bucket

# Abort incomplete multipart uploads
aws s3api abort-multipart-upload \
    --bucket my-bucket \
    --key file.txt \
    --upload-id xyz789

Architecture and Flows

(Diagrams omitted: typical architecture, lifecycle transition flow, and event notification flow.)

Best Practices Checklist

Security

  • Block Public Access enabled by default: Never disable unless you explicitly need a public bucket
  • Encryption at rest: Use SSE-S3 (free) as minimum, SSE-KMS for sensitive data with key rotation
  • Encryption in transit: Always use HTTPS; enforce it with a bucket policy that denies requests where aws:SecureTransport is false (see the policy sketch after this list)
  • Restrictive bucket policies: Principle of least privilege, use conditions (IP, VPC endpoint, time)
  • Versioning + MFA Delete: For critical buckets (backups, compliance), prevents malicious deletions
  • Object Lock in Compliance mode: For strict regulations (financial, healthcare), immutable even by root
  • Pre-signed URLs with short TTL: Maximum 1 hour for critical operations, 5-15 min for one-time downloads
  • IAM Roles instead of Access Keys: For applications on EC2/Lambda, never hardcode credentials
  • CloudTrail data events enabled: Audit all access to sensitive buckets
  • Access Analyzer for S3: Regularly review unintended public and cross-account permissions
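
A minimal sketch of the HTTPS-enforcement policy referenced above (the bucket name is a placeholder):

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::my-bucket",
      "arn:aws:s3:::my-bucket/*"
    ],
    "Condition": {"Bool": {"aws:SecureTransport": "false"}}
  }]
}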

Cost Optimization

  • Lifecycle policies implemented: Move objects to cheaper tiers automatically based on access pattern
  • Intelligent-Tiering only when needed: Evaluate if monitoring cost ($0.0025/1000 objects) is worth it
  • Cleanup incomplete multipart uploads: Lifecycle rule to abort after 7 days, avoids hidden storage charges
  • Controlled versioning: NoncurrentVersionExpiration to delete old versions, don't accumulate infinitely
  • Delete unnecessary objects: Expiration policies for temporary data (logs >X days, staging files)
  • CloudFront for static content: Reduces GET requests to S3, saves on data transfer OUT
  • S3 Select/Glacier Select: For queries on archived data, only pay for scanned data not full retrieval
  • Compression before upload: Gzip/Brotli significantly reduces storage and transfer costs
  • Requester Pays for public datasets: Transfer download cost to data consumers
  • Storage Lens for analytics: Identify buckets without lifecycle, duplicated objects, incomplete uploads

Performance

  • Multipart upload for large files (over 100MB): Parallelization improves throughput, resilience to network failures
  • S3 Transfer Acceleration: For global uploads from locations far from the bucket's region
  • CloudFront Origin Shield: Additional cache layer between CloudFront and S3, reduces origin load
  • Byte-range fetches: For partial downloads (e.g., video streaming), don't download entire file
  • Request rate distribution: S3 supports about 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix; avoid hot keys and spread heavy workloads across multiple prefixes
  • VPC Endpoint for S3: Traffic stays on the AWS private network, lower latency, no NAT Gateway charges (see the command sketch after this list)
  • Cross-Region Replication with RTC: For critical cases requiring RPO under 15 minutes
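
A gateway endpoint for S3 can be created as sketched below (the VPC ID, region, and route table ID are placeholders). Gateway endpoints for S3 carry no hourly or data processing charge, unlike interface endpoints.

# Create a gateway VPC endpoint for S3 (IDs are placeholders)
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --service-name com.amazonaws.sa-east-1.s3 \
    --vpc-endpoint-type Gateway \
    --route-table-ids rtb-0123456789abcdef0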

Reliability

  • Cross-Region Replication for DR: Critical data automatically replicated to secondary region
  • Versioning enabled: Recovery from accidental deletions/corruptions
  • Multi-AZ by default: S3 Standard automatically stores data across at least 3 AZs, no configuration needed
  • Object Lock for compliance: Immutable data, ransomware protection
  • AWS Backup for critical buckets: Centralized backup cross-account/cross-region with unified policies
  • Monitoring with CloudWatch: Alarms on 4xx/5xx errors, replication lag, incomplete multipart uploads
  • Lifecycle testing in non-prod: Validate transitions before applying in production

Operational Excellence

  • Consistent tagging strategy: Environment, Project, Owner, CostCenter for all buckets
  • Clear naming convention: {company}-{service}-{environment}-{region} e.g., acme-logs-prod-useast1
  • Infrastructure as Code: Terraform/CloudFormation for all buckets, no manual configuration
  • Automated bucket policy testing: Validate permissions with IAM Access Analyzer in CI/CD
  • Data retention documentation: Document why each lifecycle policy exists and when to review
  • Alerting on anomalies: CloudWatch alarms on unexpected storage growth, request spikes
  • Regular access reviews: Quarterly review who has access to which buckets

Common Mistakes to Avoid

Using incorrect Storage Classes for the access pattern

Why it happens: Choosing Standard-IA or Glacier because it's "cheaper" without considering retrieval costs.

The real problem: A file in Standard-IA downloaded 1000 times/month can cost 10x more than if it were in Standard due to the $0.01/GB retrieval fee.

How to avoid it:

  • Standard: frequent access (>1 time/week)
  • Standard-IA: infrequent access but you need instant retrieval (1-2 times/month)
  • Glacier: rare access, you can wait minutes/hours
  • Deep Archive: compliance, almost never accessed, you can wait days

Not configuring lifecycle for incomplete multipart uploads

Why it happens: Failed multipart uploads leave parts in S3 that are charged but not visible in normal listings.

Impact: Silent accumulation of unused GB, growing costs without obvious explanation.

How to avoid it:

{
  "Rules": [{
    "ID": "cleanup-incomplete-uploads",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "AbortIncompleteMultipartUpload": {
      "DaysAfterInitiation": 7
    }
  }]
}
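
Apply it with the put-bucket-lifecycle-configuration command shown earlier; the empty Prefix filter makes the rule cover the entire bucket.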

Lifecycle transitions that violate minimum storage duration

Why it happens: Configuring Standard (day 0) → Glacier (day 30) → Deep Archive (day 60) without knowing AWS rules.

The error: AWS requires a minimum of 90 days in Glacier before an object can move to Deep Archive. If a lifecycle rule tries to transition earlier, the configuration is rejected with an InvalidArgument error.

How to avoid it: Respect minimum storage durations:

  • Standard-IA: 30 days
  • Glacier Flexible: 90 days
  • Deep Archive: 180 days

If you transition to Glacier on day 7, you can move to Deep Archive on day 97 (7 + 90) at the earliest.
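
A lifecycle configuration that respects these minimums might look like the following sketch (day counts are illustrative; each transition waits out the preceding tier's minimum duration):

{
  "Rules": [{
    "ID": "tiered-archival",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"},
      {"Days": 180, "StorageClass": "DEEP_ARCHIVE"}
    ]
  }]
}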

Bucket policies that allow unintended public access

Why it happens: Copying bucket policy from the internet without understanding "Principal": "*" with "Effect": "Allow".

Risk: Data leak, compliance violation, huge bill from data transfer if someone scrapes the bucket.

How to avoid it:

  • Keep Block Public Access enabled by default
  • If you need public access: use CloudFront with Origin Access Control (OAC, the successor to Origin Access Identity); never expose the bucket directly
  • Use aws:SecureTransport condition to enforce HTTPS
  • Regularly review with S3 Access Analyzer

Not enabling versioning on critical buckets

Why it happens: "I don't need it, we have backups" or "it consumes too much storage".

The problem: Accidental deletions happen (a user, a code bug, ransomware), and nightly backups won't include files created since the last run.

How to avoid it:

  • Enable versioning on ALL production buckets
  • Use lifecycle to expire old versions (e.g., NoncurrentVersionExpiration after 90 days; see the sketch after this list)
  • Combine with MFA Delete for additional protection
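
A minimal sketch of such a rule (the 90-day window is an illustrative choice):

{
  "Rules": [{
    "ID": "expire-old-versions",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "NoncurrentVersionExpiration": {"NoncurrentDays": 90}
  }]
}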

Hardcoding credentials in code to access S3

Why it happens: "It's easier" or unfamiliarity with IAM Roles.

Risk: Credentials in Git, compromised, rotation nightmare.

How to avoid it:

  • EC2: Use IAM Instance Profile
  • Lambda: Execution Role with specific permissions
  • External applications: IAM User with rotated Access Keys, or better: AssumeRole cross-account

Forgetting to configure Cross-Region Replication before disaster

Why it happens: "We'll do it when needed" but disaster doesn't give notice.

The problem: CRR only replicates NEW objects after enabling replication. Existing objects require S3 Batch Operations.

How to avoid it:

  • Configure CRR from day 1 on critical buckets
  • Test DR by executing failover to secondary region semi-annually
  • Document RTO/RPO and validate that CRR meets them

Using a single bucket for everything

Why it happens: "It's simpler to manage" or lack of architecture.

The problem: Huge blast radius (one bug affects all data), complex access policies, costs hard to track.

How to avoid it: Separate buckets by:

  • Environment (prod/staging/dev)
  • Data type (logs/backups/user-uploads)
  • Compliance level (PII vs non-PII)
  • Access patterns (public vs private)

Cost Considerations

What Generates Costs

Storage, Standard: $0.023/GB-month (first 50TB)
Storage, Standard-IA: $0.0125/GB-month (128KB minimum billable size, 30-day minimum duration)
Storage, Glacier Instant Retrieval: $0.004/GB-month (128KB minimum, 90-day minimum)
Storage, Glacier Flexible Retrieval: $0.0036/GB-month (90-day minimum)
Storage, Deep Archive: $0.00099/GB-month (180-day minimum)
Requests, PUT/COPY/POST/LIST: $0.005 per 1,000
Requests, GET/SELECT: $0.0004 per 1,000
Requests, DELETE/CANCEL: free
Retrieval, Standard-IA: $0.01/GB
Retrieval, Glacier Expedited: $0.03/GB (1-5 minutes)
Retrieval, Glacier Standard: $0.01/GB (3-5 hours)
Retrieval, Glacier Bulk: $0.0025/GB (5-12 hours)
Retrieval, Deep Archive: $0.02/GB (12-48 hours)
Data transfer OUT: $0.09/GB (first 100GB/month free)
Replication: $0.02/GB
Transfer Acceleration: +$0.04/GB on top of transfer costs

Real Calculation Example

Scenario: 1TB of logs, 3-year retention

Months 1-3 (Standard, for debugging):
  1,000GB × $0.023 × 3 months = $69.00

Months 4-12 (Glacier Flexible, for compliance):
  1,000GB × $0.0036 × 9 months = $32.40

Years 2-3 (Deep Archive):
  1,000GB × $0.00099 × 24 months = $23.76

TOTAL 3 years: ~$125

vs ALL in Standard: 1,000GB × $0.023 × 36 months = $828
Savings: ~85%

Free Tier (first 12 months)

  • Storage: 5GB Standard storage
  • PUT requests: 2,000 requests
  • GET requests: 20,000 requests
  • Data transfer: 100GB OUT per month (aggregated across all AWS services)

IMPORTANT: Free tier does NOT include:

  • Standard-IA, Glacier, or Deep Archive storage
  • Transfer Acceleration
  • Replication

Strategic Optimization

1. CloudFront as cache layer

Without CloudFront: 1M GET requests/month × $0.0004 per 1,000 = $0.40, plus roughly $81 in data transfer OUT to serve 1TB (after the free 100GB)
With CloudFront (90% cache-hit ratio): only ~100K requests and ~100GB reach the S3 origin, and S3-to-CloudFront transfer is free
Savings: ~90% of S3 origin costs (CloudFront bills its own egress, at generally lower rates)

2. Compression

1TB uncompressed: $23/month in Standard
1TB → 300GB with Gzip: $6.90/month
Savings: 70%

3. Aggressive lifecycle policies

Logs without lifecycle: 12TB accumulated over a year × $0.023/GB = $276/month (and growing)
With optimized lifecycle: $25/month
Savings: 91%

4. S3 Select instead of full retrieval

Glacier: a 100GB archive, of which you need only 1GB of data
Full retrieval: 100GB × $0.01 = $1
S3 Select: 1GB × $0.002 = $0.002
Savings: 99.8%

Integration with Other Services

EC2: Instance Profile for access; user-data scripts download files from S3. Typical use: instances read/write logs, backups, and configuration files.
Lambda: Execution Role with S3 permissions; S3 Events as triggers. Typical use: processing uploaded files (resize images, transcode videos).
CloudFront: S3 as origin for content distribution; OAC/OAI for private access. Typical use: CDN for static assets, video streaming, website hosting.
Route53: Alias record pointing to a bucket configured as a website. Typical use: static hosting with a custom domain (www.example.com).
RDS/DynamoDB: Automatic backups and snapshot exports to S3. Typical use: disaster recovery, data lakes, historical analysis.
CloudWatch: Application logs exported to S3; S3 metrics in CloudWatch. Typical use: log analysis with Athena, storage anomaly alerts.
CloudTrail: API call logs saved in S3. Typical use: security auditing, compliance, forensics.
IAM: Policies to control access; roles for services. Typical use: least privilege, temporary credentials, cross-account access.
KMS: Encryption keys for SSE-KMS. Typical use: sensitive data with automatic key rotation and a usage audit trail.
Athena: SQL queries on data in S3. Typical use: log analysis, data lakes, BI on CSV/JSON/Parquet files.
Glue: ETL jobs reading/writing S3. Typical use: data pipelines, transformations, data lake cataloging.
SageMaker: Training data and model artifacts in S3. Typical use: machine learning workflows, model versioning.
Kinesis Firehose: Streaming data delivery to S3. Typical use: real-time logs, streaming analytics, IoT data ingestion.
SNS/SQS: S3 Events send notifications. Typical use: event-driven architectures, decoupled processing.
Step Functions: Orchestration of workflows involving S3. Typical use: complex pipelines (upload → validate → process → notify).
EventBridge: S3 Events as input for complex rules. Typical use: conditional routing, multiple targets, scheduled actions.
AWS Backup: Centralized backup of S3 buckets. Typical use: compliance, cross-account/region backups, retention policies.
DataSync: On-premises data migration to S3. Typical use: mass transfer, continuous synchronization.
Storage Gateway: Bridge between on-premises systems and S3. Typical use: hybrid cloud, gradual migration, tape replacement.
Macie: Data sensitivity analysis in S3. Typical use: detecting PII, compliance (GDPR, HIPAA), automatic classification.
