Amazon RDS Essentials

RDS essentials, storage types, backups, Multi-AZ, Read Replicas, and best practices

@geomena · Sat Aug 02 2025 · 1,008 views

Amazon RDS (Relational Database Service) is a managed relational database service that automates administrative tasks like provisioning, patching, backups, recovery, and scaling. It supports multiple engines: PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and Amazon Aurora.

The problem it solves: It eliminates the operational burden of managing databases - no more manual installation, backup configuration, infrastructure monitoring, OS/DB patching, or failover management. You focus on your schema and queries, AWS handles everything else.

When to use it: For any application requiring a relational database in production. Compared with a self-managed database on EC2, RDS typically saves 10-20 hours/month of administration, with automatic backups and point-in-time recovery included. Alternatives: Aurora (more AWS-native features, up to 5x MySQL throughput), self-managed on EC2 (only if you need very specific configurations RDS doesn't support), DynamoDB (if you prefer NoSQL).

Key Concepts

| Concept | Description |
|---|---|
| DB Instance | Managed database server with compute, storage, and networking. It's the main RDS component, similar to an EC2 instance specialized for databases |
| Instance Class | Compute type for your DB (CPU, RAM, network). Families: T (burstable, dev/test), M (general purpose, balanced production), R (memory optimized, analytics), X (extra memory, in-memory workloads) |
| Storage Type | Disk type for persistence. gp3 (General Purpose SSD) for 90% of cases, io2 (Provisioned IOPS) for I/O-intensive workloads with strict SLAs, Magnetic (legacy, avoid) |
| IOPS | Input/Output Operations Per Second, a disk performance measure. gp3 baseline = 3,000 IOPS, configurable up to 16,000. io2 up to 256,000 IOPS |
| Allocated Storage | Disk space assigned to the DB. Minimum 20 GB (gp3), maximum 64 TB. Can grow automatically with storage autoscaling |
| Automated Backup | Daily full snapshot + transaction logs uploaded every 5 minutes. Enables point-in-time recovery within the retention period (1-35 days) |
| Backup Retention Period | Days RDS retains automated backups. Default 7 days, maximum 35 days. Backup storage is free up to DB size, then $0.095/GB-month |
| Point-in-Time Recovery (PITR) | Ability to restore the DB to any specific second within the retention period. Useful for recovering from human error (DROP TABLE, DELETE without WHERE) |
| Manual Snapshot | Explicit backup you create. Persists indefinitely (until you delete it) and survives deletion of the DB instance. You pay $0.095/GB-month for storage |
| DB Endpoint | DNS hostname to connect to the DB (e.g., mydb.abc123.sa-east-1.rds.amazonaws.com). Doesn't change during the instance lifetime (except on rename) |
| Master Username/Password | DB administrator credentials. Set at creation, modifiable afterward via CLI/Console |
| DB Subnet Group | Set of subnets where RDS can launch DB instances. Minimum 2 subnets in different AZs (for Multi-AZ support) |
| Security Group | Firewall controlling which IPs/security groups can connect to your DB. Typically only allow connections from your app servers (EC2/ECS/Lambda) |
| Multi-AZ Deployment | High availability configuration with an automatic standby replica in another AZ. Automatic failover in ~1-2 minutes on hardware/AZ failure |
| Read Replica | Read-only copy of your DB to scale reads. Asynchronous replication, same region or cross-region. Useful for reports/analytics without impacting production |
| Maintenance Window | Weekly time window in which AWS can apply patches/updates. Configurable; schedule it during low-traffic hours |
| Engine Version | Specific DB engine version (e.g., PostgreSQL 15.4). AWS can apply minor version upgrades automatically; major versions require a manual upgrade |

Essential AWS CLI Commands

Creating and Querying DB Instances

# Create DB Instance (PostgreSQL)
# Note: "admin" is a reserved master username on several engines (including
# RDS for PostgreSQL), and an unquoted "!" triggers history expansion in
# interactive bash - hence "dbadmin" and the single quotes
aws rds create-db-instance \
    --db-instance-identifier myapp-db \
    --db-instance-class db.t3.micro \
    --engine postgres \
    --engine-version 15.4 \
    --master-username dbadmin \
    --master-user-password 'SecurePassword123!' \
    --allocated-storage 20 \
    --storage-type gp3 \
    --backup-retention-period 7 \
    --preferred-backup-window "03:00-04:00" \
    --vpc-security-group-ids sg-0abc123 \
    --db-subnet-group-name my-db-subnet-group \
    --no-publicly-accessible \
    --storage-encrypted \
    --tags Key=Environment,Value=Production

# List all DB instances
aws rds describe-db-instances

# View specific instance details
aws rds describe-db-instances \
    --db-instance-identifier myapp-db \
    --query 'DBInstances[0].{
        Status:DBInstanceStatus,
        Endpoint:Endpoint.Address,
        Engine:Engine,
        Class:DBInstanceClass,
        Storage:AllocatedStorage,
        IOPS:Iops
    }'

# Verify if available
aws rds describe-db-instances \
    --db-instance-identifier myapp-db \
    --query 'DBInstances[0].DBInstanceStatus' \
    --output text

# Get connection endpoint
aws rds describe-db-instances \
    --db-instance-identifier myapp-db \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
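For scripting, it helps to block until the new instance is actually ready before grabbing its endpoint. A minimal sketch using the built-in `db-instance-available` waiter (the function name and instance id are assumptions):

```shell
# Wait until the instance reports "available", then print its endpoint.
# The waiter polls every 30 seconds, up to 60 attempts.
get_db_endpoint() {
    local db_id="$1"
    aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1
    aws rds describe-db-instances \
        --db-instance-identifier "$db_id" \
        --query 'DBInstances[0].Endpoint.Address' \
        --output text
}
```

Usage: `DB_HOST=$(get_db_endpoint myapp-db)`.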

Modifying DB Instances

# Change instance class (upgrade/downgrade)
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --db-instance-class db.t3.small \
    --apply-immediately

# Increase storage
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --allocated-storage 50 \
    --apply-immediately

# Increase IOPS (gp3)
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --iops 6000 \
    --apply-immediately

# Change storage type (gp3 to io2)
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --storage-type io2 \
    --iops 10000 \
    --apply-immediately
# Warning: Causes downtime (~10-30 min)

# Change backup retention period
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --backup-retention-period 14 \
    --apply-immediately

# Rename DB instance
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --new-db-instance-identifier myapp-db-v2 \
    --apply-immediately

Deleting DB Instances

# Delete DB instance WITH final snapshot
aws rds delete-db-instance \
    --db-instance-identifier myapp-db \
    --final-db-snapshot-identifier myapp-db-final-snapshot

# Delete DB instance WITHOUT snapshot (not recommended in prod)
# (fails while deletion protection is enabled; disable it first with
# modify-db-instance --no-deletion-protection)
aws rds delete-db-instance \
    --db-instance-identifier myapp-db \
    --skip-final-snapshot

# Verify deletion
aws rds describe-db-instances \
    --db-instance-identifier myapp-db \
    --query 'DBInstances[0].DBInstanceStatus'

Manual Snapshots

# Create manual snapshot
aws rds create-db-snapshot \
    --db-instance-identifier myapp-db \
    --db-snapshot-identifier myapp-pre-migration-2025-11-30

# List snapshots
aws rds describe-db-snapshots \
    --db-instance-identifier myapp-db

# View manual snapshots only
aws rds describe-db-snapshots \
    --db-instance-identifier myapp-db \
    --snapshot-type manual

# View automated snapshots only
aws rds describe-db-snapshots \
    --db-instance-identifier myapp-db \
    --snapshot-type automated

# Delete manual snapshot
aws rds delete-db-snapshot \
    --db-snapshot-identifier myapp-pre-migration-2025-11-30

# Copy snapshot to another region (disaster recovery)
# For encrypted snapshots, also pass --kms-key-id with a KMS key
# that lives in the destination region
aws rds copy-db-snapshot \
    --source-db-snapshot-identifier arn:aws:rds:sa-east-1:123456789012:snapshot:myapp-snapshot \
    --target-db-snapshot-identifier myapp-snapshot-us-east-1 \
    --region us-east-1

Point-in-Time and Snapshot Restore

# Point-in-Time Restore (any second within the retention period)
aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier myapp-db \
    --target-db-instance-identifier myapp-db-restored \
    --restore-time "2025-11-30T15:06:00Z" \
    --vpc-security-group-ids sg-0abc123
# Tip: use --use-latest-restorable-time instead of --restore-time
# to restore to the most recent recoverable point

# Restore from manual snapshot
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier myapp-db-from-snapshot \
    --db-snapshot-identifier myapp-pre-migration-2025-11-30 \
    --db-instance-class db.t3.micro

# View available restore points
aws rds describe-db-instances \
    --db-instance-identifier myapp-db \
    --query 'DBInstances[0].{
        EarliestRestorableTime:EarliestRestorableTime,
        LatestRestorableTime:LatestRestorableTime
    }'

CloudWatch Monitoring

# View CPU utilization (last 24 hours)
# ("date -d" is GNU syntax; on macOS/BSD use: date -u -v-24H +%Y-%m-%dT%H:%M:%S)
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name CPUUtilization \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
    --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 3600 \
    --statistics Average,Maximum

# View database connections
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name DatabaseConnections \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
    --period 300 \
    --statistics Average

# View Read/Write IOPS
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name ReadIOPS \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
    --period 300 \
    --statistics Average

# View disk queue depth (if high, you need more IOPS)
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name DiskQueueDepth \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
    --period 300 \
    --statistics Average

Logs Access

# List available log files
aws rds describe-db-log-files \
    --db-instance-identifier myapp-db

# Download complete log file
# (without --starting-token 0 only the most recent portion is returned)
aws rds download-db-log-file-portion \
    --db-instance-identifier myapp-db \
    --log-file-name error/postgresql.log.2025-11-30-15 \
    --starting-token 0 \
    --output text

# View last lines of log (tail)
aws rds download-db-log-file-portion \
    --db-instance-identifier myapp-db \
    --log-file-name error/postgresql.log \
    --starting-token 0 \
    --output text | tail -n 100
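The two commands above can be combined to pull every available log file at once; a sketch (the local directory name and instance id are assumptions):

```shell
# Download all log files for an instance into ./rds-logs,
# replacing "/" in log file names with "_" for local filenames.
download_all_logs() {
    local db_id="$1"
    mkdir -p rds-logs
    local f
    for f in $(aws rds describe-db-log-files \
                   --db-instance-identifier "$db_id" \
                   --query 'DescribeDBLogFiles[].LogFileName' \
                   --output text); do
        aws rds download-db-log-file-portion \
            --db-instance-identifier "$db_id" \
            --log-file-name "$f" \
            --starting-token 0 \
            --output text > "rds-logs/${f//\//_}"
    done
}
```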

Architecture and Flows

(The original post illustrates this section with four diagrams: Typical Architecture with RDS, Point-in-Time Recovery Flow, Storage Type Decision Tree, and Multi-AZ Failover Flow.)

Best Practices Checklist

Security

  • Never use publicly-accessible in production: DB instances should be in private subnets, accessible only from app servers' security group
  • Encryption at rest enabled: --storage-encrypted at creation (can't be enabled afterward)
  • SSL/TLS for connections: Force SSL in DB parameters, use RDS certificates in app
  • Secrets Manager for credentials: Don't hardcode passwords, use automatic rotation
  • IAM Database Authentication: For applications that support it (eliminates need for passwords)
  • Restrictive security groups: Only necessary ports (5432 for PostgreSQL) from specific IPs/SGs
  • Audit logging enabled: Enable log_statement, log_connections for PostgreSQL
  • Encrypted snapshots: Manual snapshots inherit encryption from DB source
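In practice, the Secrets Manager bullet means the app fetches credentials at startup instead of shipping them in config; a minimal sketch (the secret name is an assumption):

```shell
# Fetch the secret's JSON payload; with rotation enabled the app always
# gets the current password instead of a hardcoded one.
get_db_secret() {
    aws secretsmanager get-secret-value \
        --secret-id myapp/db/credentials \
        --query 'SecretString' \
        --output text
}
```

Usage: `get_db_secret | jq -r .password` extracts just the password field.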

Cost Optimization

  • Right-sizing instance class: Monitor CPU/Memory in CloudWatch, downgrade if sustained utilization under 40%
  • gp3 over gp2: gp3 is cheaper and more flexible (3,000 IOPS baseline independent of storage size)
  • Storage autoscaling configured: Avoids running out of space, grows only when needed
  • Appropriate backup retention: 7 days for dev/test, 14-30 days for prod, not more than necessary
  • Delete old snapshots: Manual snapshots cost $0.095/GB-month indefinitely
  • Reserved Instances for stable production: 1-year RI = ~40% savings, 3-year = ~60% savings
  • Dev/test instances stopped off-hours: Use scheduled Lambda/EventBridge for stop/start
  • Read Replicas in same region when possible: Cross-region replicas have data transfer costs

Performance

  • CloudWatch alarms configured: CPU over 80%, FreeableMemory under 500MB, DiskQueueDepth over 5
  • Monitor IOPS utilization: If consistently over 80% baseline, increase provisioned IOPS
  • Connection pooling in application: PgBouncer/RDS Proxy to handle many connections efficiently
  • Appropriate indexes: EXPLAIN ANALYZE slow queries, add indexes where needed
  • Read Replicas for read-heavy workloads: Offload reports/analytics to replica
  • Optimized parameter group: Adjust shared_buffers, work_mem, effective_cache_size per workload
  • Upgrade to modern instances: M6i/R6i more performance per $ than previous generations

Reliability

  • Multi-AZ enabled in production: Automatic failover in ~1-2 min on hardware/AZ failure
  • Automated backups enabled: Minimum 7 days retention, NEVER backup-retention-period = 0
  • Manual snapshot before risky changes: Migrations, upgrades, major schema changes
  • Test restore process quarterly: Disaster recovery drill, measure recovery time
  • Maintenance window configured: Low traffic hours (early morning/weekend)
  • CloudWatch Events for alerts: Notify when failover occurs, backups fail, storage low
  • RDS Proxy for faster failover: Reduces connection storm post-failover
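The alerting bullet above can be wired up with an RDS event subscription; a sketch, where the subscription name, instance id, and SNS topic ARN are placeholders:

```shell
# Notify an SNS topic on failover, failure, and low-storage events
# for a single instance; wrapped in a function for reuse.
subscribe_db_events() {
    aws rds create-event-subscription \
        --subscription-name myapp-db-events \
        --sns-topic-arn arn:aws:sns:sa-east-1:123456789012:db-alerts \
        --source-type db-instance \
        --source-ids myapp-db \
        --event-categories failover failure "low storage"
}
```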

Operational Excellence

  • Consistent tagging: Environment, Application, Owner, CostCenter for tracking
  • Infrastructure as Code: Terraform/CloudFormation for DB instances, parameter groups, subnet groups
  • Snapshot before deleting: Final snapshot or verify backups before delete-db-instance
  • Naming conventions: {app}-{env}-{region} e.g., myapp-prod-sa-east-1
  • Enhanced Monitoring enabled: OS metrics (processes, threads, memory detail) every 60s
  • Performance Insights enabled: Query-level metrics, wait event analysis (first 7 days free)
  • Document custom configurations: Modified parameter groups, security group rules, etc.

Common Mistakes to Avoid

Backup Retention Period = 0

Why it happens: "I want to save costs, I'll disable automated backups."

The real problem: Automated backups up to your DB size are FREE. Disabling them saves $0 but leaves you unprotected against disasters.

Typical scenario:

# BAD: Disable backups
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --backup-retention-period 0

# Savings: $0 (backups are free up to DB size)
# Risk: Total data loss if DROP TABLE, DELETE without WHERE, etc.

How to avoid it:

# GOOD: Minimum 7 days in production
aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --backup-retention-period 7

# For compliance: 14-30 days
--backup-retention-period 30

Golden rule: NEVER backup-retention-period = 0 in production. Only acceptable in temporary testing environments.

Choosing io2 Without Analyzing CloudWatch Metrics First

Why it happens: "More IOPS = better performance, let's go with io2."

The real problem: io2 is 10-15x more expensive than gp3. Most workloads work perfectly fine with gp3.

Before considering io2:

# 1. Check current IOPS utilization
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name ReadIOPS \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
    --period 300 \
    --statistics Average,Maximum

# 2. If Average under 2,500 IOPS -> gp3 baseline (3K) is sufficient
# 3. If Average 5,000-10,000 -> increase IOPS in gp3 first
# 4. Only if over 16,000 sustained IOPS -> consider io2

Cost comparison (100 GB, 10,000 IOPS):

gp3: $11.50 (storage) + $35 (7K IOPS extra) = $46.50/month
io2: $13.80 (storage) + $650 (10K IOPS) = $663.80/month

Difference: 14x more expensive
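The comparison can be sanity-checked with shell arithmetic, using the per-unit prices from the cost table later in this post:

```shell
# gp3: storage + extra IOPS beyond the 3,000 baseline
gp3=$(awk 'BEGIN { printf "%.2f", 100 * 0.115 + (10000 - 3000) * 0.005 }')
# io2: storage + every provisioned IOPS is billed
io2=$(awk 'BEGIN { printf "%.2f", 100 * 0.138 + 10000 * 0.065 }')
ratio=$(awk -v a="$io2" -v b="$gp3" 'BEGIN { printf "%.0f", a / b }')
echo "gp3=\$${gp3}/month io2=\$${io2}/month (${ratio}x)"
# -> gp3=$46.50/month io2=$663.80/month (14x)
```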

When to USE io2: Financial applications (trading), latency under 1ms critical, 99.99%+ SLA where every millisecond counts.

Not Creating Manual Snapshot Before Risky Changes

Why it happens: "Automated backups are enough."

The real problem: PITR can technically restore to any second, but after a botched 10 AM migration you have to guess the exact pre-migration timestamp, and the restore spins up a whole new instance. A named manual snapshot gives you an unambiguous, fast rollback point.

Data loss scenario:

# Thursday 10 AM: Schema migration
ALTER TABLE users ADD COLUMN preferences JSONB;
# Migration corrupts data

# PITR still works, but you must pinpoint the exact pre-migration second:
# restore too late and the corruption comes back, too early and you
# drop legitimate transactions

How to avoid it:

# ALWAYS before major changes:
aws rds create-db-snapshot \
    --db-instance-identifier myapp-db \
    --db-snapshot-identifier pre-migration-$(date +%Y%m%d-%H%M)

# Do risky change
# If it fails -> restore from manual snapshot (minutes lost, not hours)

# After confirming success:
aws rds delete-db-snapshot \
    --db-snapshot-identifier pre-migration-20251130-1000

When to create manual snapshots:

  • Schema migrations
  • Major version upgrades
  • Bulk data modifications
  • Deployment of critical changes

Publicly Accessible = True in Production

Why it happens: "I need to connect from my laptop for debugging."

The real problem: You expose your DB to the internet. Bots constantly scan for open DBs, attempting brute force.

# BAD: DB accessible from internet
aws rds create-db-instance \
    --publicly-accessible
    # Security group allows 0.0.0.0/0 -> DB exposed

# Result: You appear on Shodan, constant brute force attempts

How to avoid it:

# GOOD: DB in private subnet, not publicly accessible
aws rds create-db-instance \
    --no-publicly-accessible \
    --vpc-security-group-ids sg-private-db

# Security group allows ONLY from app servers:
aws ec2 authorize-security-group-ingress \
    --group-id sg-private-db \
    --protocol tcp \
    --port 5432 \
    --source-group sg-app-servers

# For debugging from laptop:
# Option 1: Bastion host in public subnet
# Option 2: VPN/Direct Connect
# Option 3: SSM Session Manager port forwarding
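Option 3 can be sketched as a port-forwarding session through an EC2 instance running the SSM agent; the instance id and DB endpoint below are placeholders:

```shell
# Forward local port 5432 to the RDS endpoint via an SSM-managed EC2 instance;
# then connect locally with: psql -h localhost -p 5432 -U dbadmin myapp
ssm_db_tunnel() {
    aws ssm start-session \
        --target i-0123456789abcdef0 \
        --document-name AWS-StartPortForwardingSessionToRemoteHost \
        --parameters '{"host":["mydb.abc123.sa-east-1.rds.amazonaws.com"],"portNumber":["5432"],"localPortNumber":["5432"]}'
}
```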

Not Monitoring FreeableMemory and SwapUsage

Why it happens: "CloudWatch shows CPU under 50%, must be fine."

The real problem: Normal CPU but saturated memory = slow queries, swapping to disk (100x slower than RAM).

Symptoms you're ignoring:

# CPU: 45% (seems OK)
# FreeableMemory: 50 MB (of 1 GB total) - PROBLEM
# SwapUsage: 500 MB - SWAP ACTIVE = SLOW
# Latency: 2000ms (should be under 50ms)

Correct diagnosis:

# View memory metrics
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name FreeableMemory \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db \
    --period 300 \
    --statistics Average

# If FreeableMemory consistently under 500 MB:
# -> You need instance class upgrade (more RAM)

aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --db-instance-class db.t3.small \
    --apply-immediately

Preventive alarm:

# Threshold 524288000 bytes = 500 MB; add --alarm-actions <sns-topic-arn>
# so the alarm actually notifies someone
aws cloudwatch put-metric-alarm \
    --alarm-name myapp-db-low-memory \
    --metric-name FreeableMemory \
    --namespace AWS/RDS \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 524288000 \
    --comparison-operator LessThanThreshold \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db

Not Testing Restore Process Until Real Disaster

Why it happens: "The configuration looks good, it must work when I need it."

The real problem: Disaster arrives, you discover that:

  • You don't know how to use restore CLI correctly
  • Snapshot is corrupted
  • Process takes longer than expected
  • App doesn't connect to restored DB

Time lost in production: 2-3 hours troubleshooting in panic vs 20 min if you had practiced.

How to avoid it - Quarterly Disaster Recovery Drill:

# Every 3 months (Friday afternoon, low traffic):

# 1. Create snapshot
aws rds create-db-snapshot \
    --db-instance-identifier myapp-prod \
    --db-snapshot-identifier dr-drill-2025-q4

# 2. Measure time: Restore
time aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier myapp-prod-dr-test \
    --db-snapshot-identifier dr-drill-2025-q4

# 3. Verify connectivity
psql -h myapp-prod-dr-test.xxx.rds.amazonaws.com \
     -U admin -d myapp -c "SELECT COUNT(*) FROM users;"

# 4. Document:
# - Total restore time: X minutes
# - Issues found
# - Update runbook

# 5. Clean up
aws rds delete-db-instance \
    --db-instance-identifier myapp-prod-dr-test \
    --skip-final-snapshot

aws rds delete-db-snapshot \
    --db-snapshot-identifier dr-drill-2025-q4

Storage Full Without Autoscaling Configured

Why it happens: "I have 20 GB, takes months to fill up, I'll check when it gets close."

The real problem: Friday 6 PM, storage reaches 100%, DB enters read-only mode, app stops working.

# DB state when storage is full:
Status: "storage-full"
# Queries: Only SELECT works
# INSERT/UPDATE/DELETE: ERROR

How to avoid it - Storage Autoscaling:

aws rds modify-db-instance \
    --db-instance-identifier myapp-db \
    --max-allocated-storage 100 \
    --apply-immediately

# RDS automatically increases storage when:
# - Free space under 10% of allocated storage
# - Or under 6 GB free
# - Low-storage lasts at least 5 minutes
# - At least 6 hours since last storage modification

Preventive alarm:

# Threshold 2147483648 bytes = 2 GB
aws cloudwatch put-metric-alarm \
    --alarm-name myapp-db-low-storage \
    --metric-name FreeStorageSpace \
    --namespace AWS/RDS \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 2147483648 \
    --comparison-operator LessThanThreshold \
    --dimensions Name=DBInstanceIdentifier,Value=myapp-db

Cost Considerations

Cost Components in RDS

| Component | Pricing | Free Tier |
|---|---|---|
| DB Instance (compute) | Per hour, based on instance class | 750 hrs/month of db.t3.micro (12 months) |
| Storage - gp3 | $0.115/GB-month | 20 GB (12 months) |
| Storage - io2 | $0.138/GB-month + $0.065 per provisioned IOPS | Not included |
| Provisioned IOPS (gp3) | $0.005/IOPS above the 3,000 baseline | Not included |
| Automated Backups | Free up to DB size, then $0.095/GB-month | Included |
| Manual Snapshots | $0.095/GB-month | Not included |
| Data Transfer Out | $0.09/GB (to internet) | 1 GB/month (12 months) |
| Multi-AZ (standby replica) | 2x the cost of DB instance + storage | Not included |
| Read Replica | Billed as a separate DB instance + storage | Not included |

Real Application Cost Example

Configuration:

  • Instance: db.t3.small (2 vCPU, 2 GB RAM)
  • Storage: 100 GB gp3, 5,000 IOPS (2,000 extra)
  • Backups: 7 days retention (~150 GB backups total)
  • Manual snapshots: 2 snapshots of 100 GB each
  • Multi-AZ: Yes
  • Region: sa-east-1

Monthly calculation:

DB Instance (primary):
  db.t3.small x 730 hrs = $30/month

DB Instance (standby - Multi-AZ):
  db.t3.small x 730 hrs = $30/month

Storage (primary):
  100 GB gp3 x $0.115 = $11.50/month

Storage (standby):
  100 GB gp3 x $0.115 = $11.50/month

Provisioned IOPS extra:
  2,000 IOPS x $0.005 = $10/month (primary only)

Automated Backups:
  First 100 GB free (= DB size)
  Excess: 50 GB x $0.095 = $4.75/month

Manual Snapshots:
  200 GB x $0.095 = $19/month

TOTAL MONTHLY: $116.75/month
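The line items above can be cross-checked with a one-liner:

```shell
# Sum of the monthly line items: two instances, two storage volumes,
# extra IOPS, backup excess, manual snapshots
awk 'BEGIN { printf "$%.2f/month\n", 30 + 30 + 11.50 + 11.50 + 10 + 4.75 + 19 }'
# -> $116.75/month
```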

Optimization Strategies

1. Reserved Instances for stable production

db.t3.small on-demand: $30/month x 12 = $360/year

Reserved Instance (1 year, No Upfront):
  ~$216/year (40% savings)

Reserved Instance (3 years, All Upfront):
  ~$140/year (60% savings)

Recommendation: 1-year RI for prod DBs you know will run over 1 year

2. Rightsizing with CloudWatch Metrics

# If CPU stays under 30% and memory under 50% for 2+ weeks:
# downgrade db.t3.small -> db.t3.micro

Savings: ~$15/month x 2 (primary + standby) = ~$30/month

3. Optimized backup retention

Automated backups:
  7 days (dev/test)
  14 days (prod - cost/recovery balance)
  30 days (compliance only if required)

Manual snapshots:
  Only for compliance/long-term
  Delete after verifying no longer needed

4. Dev/Test instances stopped off-hours

# Lambda scheduled (EventBridge):
# Monday-Friday 8 AM: Start DB
# Monday-Friday 6 PM: Stop DB

Savings: ~60% of instance cost
db.t3.small: $30/month -> $12/month (12 hrs/day vs 24 hrs)
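The scheduled Lambda ultimately just issues two CLI calls, which you can also run by hand; the instance name is an assumption, and note that RDS automatically restarts a stopped instance after 7 days:

```shell
# Stop/start a dev instance (compute billing pauses, storage is still billed)
stop_dev_db()  { aws rds stop-db-instance  --db-instance-identifier myapp-dev-db; }
start_dev_db() { aws rds start-db-instance --db-instance-identifier myapp-dev-db; }

# Rough savings if the DB runs 10 hrs/day on weekdays (~50 of 168 hours):
awk 'BEGIN { printf "~%.0f%% saved\n", (1 - 50/168) * 100 }'
# -> ~70% saved
```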

Integration with Other Services

| AWS Service | How It Integrates | Typical Use Case |
|---|---|---|
| EC2 | App servers on EC2 connect to RDS via endpoint | Backend apps (Laravel, Django, Rails) using RDS as datastore |
| Lambda | Lambda functions connect to RDS (via VPC or RDS Proxy) | Serverless APIs, scheduled jobs reading/writing to DB |
| VPC | RDS instances live in VPC subnets, security groups control access | Network isolation, private subnets for DBs, granular traffic control |
| Secrets Manager | Stores DB credentials, automatic rotation | Don't hardcode passwords, automatic credential rotation every 30-90 days |
| CloudWatch | Automatic metrics (CPU, connections, IOPS), logs, alarms | Performance monitoring, alerts when thresholds exceeded |
| CloudTrail | Audit trail of RDS operations (create, modify, delete) | Compliance, forensics, "who deleted the DB on Friday?" |
| S3 | Snapshot export, long-term backups, query results (Athena) | Cross-region disaster recovery, compliance archives, analytics on exported data |
| IAM | Control who can manage RDS, IAM database authentication | Principle of least privilege, developers only read access to prod DBs |
| KMS | Encryption keys for storage at rest, snapshot encryption | Compliance (HIPAA, PCI-DSS), encrypted databases |
| EventBridge | RDS events (backup complete, failover, low storage) | Automation (notify Slack on failover, trigger Lambda on backup failure) |
| SNS | Target of CloudWatch alarms for notifications | Email/SMS when CPU over 80%, storage under 10%, connections maxed |
| Route53 | Health checks of RDS endpoint, failover DNS | Multi-region DR, automatic DNS failover if primary region fails |
| DMS | Migrate data to RDS from on-premise or other DBs | Lift-and-shift migrations, ongoing replication, zero-downtime migrations |
| ECS/Fargate | Containerized apps connect to RDS | Microservices architecture, each service with its own DB connection pool |
| ElastiCache | Cache layer in front of RDS to reduce DB load | Session storage, query result caching, hot data acceleration |
| RDS Proxy | Managed connection pooler between app and RDS | Lambda functions (avoid connection exhaustion), faster failover |
