
EE 547 - Unit 9
Fall 2025
Instance store: Physical disk attached to host hardware
# Check available block devices on EC2 instance
$ lsblk
NAME      SIZE  TYPE  MOUNTPOINT
nvme0n1     8G  disk  /             # Root (EBS)
nvme1n1   475G  disk                # Instance store (NVMe)
Instance store provides the highest performance - directly attached NVMe SSDs with ~100k IOPS and sub-millisecond latency.
Data loss occurs on instance stop, instance termination, or underlying hardware failure.
The storage is physically part of the host machine. When the instance moves to different hardware (or ceases to exist), the disk does not follow.
Cannot detach or snapshot. Unlike network-attached storage, there is no mechanism to preserve instance store contents independently of the instance.
Appropriate for: Scratch space, temporary processing files, caches that can be rebuilt. Not appropriate for any data that must survive instance lifecycle.

Single instance architecture
All requests handled by same instance, all file operations see same filesystem state.
Multiple instance architecture
Load balancer distributes requests. Each instance has independent local filesystem.
Instance A                  Instance B
├── /data/models/           ├── /data/models/
│   └── classifier.pkl      │   └── (empty)
└── /data/uploads/          └── /data/uploads/
    └── file1.pdf               └── (empty)
Request uploads file to Instance A. Subsequent request routed to Instance B. File does not exist - FileNotFoundError.
Model deployed to Instance A. Instance B cannot serve predictions - model not present.
Fundamental issue: Local filesystem is instance-scoped, but application logic assumes shared state across all request handlers.

Three abstractions provide instance-independent storage with different access models:

Block storage presents a virtual disk device. Instance mounts it as filesystem, uses standard file operations. One instance attachment at a time (with limited exceptions).
File storage provides shared filesystem via network protocol (NFS). Multiple instances mount simultaneously, see same files, coordinate via filesystem semantics.
Object storage exposes HTTP API for storing and retrieving blobs by key. No filesystem abstraction - operations are PUT, GET, DELETE over HTTPS. Accessible from any network location.
Each model trades capabilities for constraints. Selection depends on how application accesses data.
EBS volume appears as block device to instance
# List block devices - EBS volume appears as nvme device
$ lsblk
NAME SIZE TYPE
nvme0n1 8G disk # Root volume
└─nvme0n1p1 8G part /
nvme1n1 100G disk # Attached EBS volume
The volume is a raw block device - bytes addressable by sector. No filesystem exists until you create one.
Preparing new volume for use:
# Create partition table (GPT for volumes > 2TB)
$ sudo parted /dev/nvme1n1 mklabel gpt
# Create single partition spanning volume
$ sudo parted /dev/nvme1n1 mkpart primary 0% 100%
# Create ext4 filesystem on partition
$ sudo mkfs.ext4 /dev/nvme1n1p1
# Create mount point and mount
$ sudo mkdir /mnt/data
$ sudo mount /dev/nvme1n1p1 /mnt/data
# Verify
$ df -h /mnt/data
Filesystem Size Used Avail Use% Mounted on
/dev/nvme1n1p1   98G   61M   93G   1% /mnt/data
After mounting, /mnt/data behaves as a normal directory. The application reads and writes files without awareness of the underlying EBS volume.

EBS characteristics: persists independently of the instance, can be detached and reattached, supports snapshots, bound to one availability zone.
Mount persists only until reboot
The mount command attaches the filesystem for the current session only. After an instance reboot, the volume is still attached but no longer mounted.
# After reboot
$ df -h /mnt/data
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  8.0G   ...   ...   ... /    # Falls back to root fs - /mnt/data is just an empty directory now
# Volume still attached, just not mounted
$ lsblk
NAME SIZE TYPE
nvme1n1 100G disk # Present but not mounted
Configure automatic mount via fstab:
# Get volume UUID (stable identifier)
$ sudo blkid /dev/nvme1n1p1
/dev/nvme1n1p1: UUID="a1b2c3d4-..." TYPE="ext4"
# Add entry to /etc/fstab
$ echo 'UUID=a1b2c3d4-... /mnt/data ext4 defaults,nofail 0 2' | \
sudo tee -a /etc/fstab
# Test fstab entry (mount all entries)
$ sudo mount -a
# Verify
$ df -h /mnt/data
The nofail option allows the instance to boot even if the volume attachment fails - it prevents a boot hang when the volume is detached.
Complete EBS setup sequence
# 1. Identify the new volume
lsblk
# 2. Create partition table
sudo parted /dev/nvme1n1 mklabel gpt
sudo parted /dev/nvme1n1 mkpart primary 0% 100%
# 3. Create filesystem
sudo mkfs.ext4 /dev/nvme1n1p1
# 4. Create mount point
sudo mkdir -p /mnt/data
# 5. Get UUID for fstab
VOLUME_UUID=$(sudo blkid -s UUID -o value /dev/nvme1n1p1)
# 6. Add to fstab for persistent mount
echo "UUID=$VOLUME_UUID /mnt/data ext4 defaults,nofail 0 2" | \
sudo tee -a /etc/fstab
# 7. Mount (using fstab entry)
sudo mount -a
# 8. Verify
df -h /mnt/data
UUID vs device path:
Device names (/dev/nvme1n1) can change between boots depending on attachment order. The UUID is a stable identifier assigned when the filesystem is created.
Standard EBS volumes attach to one instance at a time
Attempt: Attach vol-0abc123 to Instance B
(currently attached to Instance A)
Error: Volume vol-0abc123 is in 'in-use' state
attached to instance i-0111222333
Filesystem design assumes exclusive access
Filesystems like ext4 maintain metadata structures (superblock, inode tables, journal).
These structures are cached in memory and written back to disk. Two instances mounting the same volume would each cache and write metadata independently, corrupting the filesystem.
EBS Multi-Attach (io1/io2 volumes only)
Multi-Attach allows attaching io1 or io2 volume to up to 16 instances. These are provisioned IOPS SSD volumes designed for high-performance workloads.
Requires cluster-aware filesystem (not ext4):
Not general-purpose shared storage - specialized use case requiring explicit coordination.

Standard filesystems cannot coordinate across multiple hosts - they assume single writer.
Network filesystem architecture
EFS (Elastic File System): Managed NFS service. Multiple instances connect to shared network endpoint (not attached like block storage).
# Mount EFS filesystem (NFS protocol)
$ sudo mount -t nfs4 \
-o nfsvers=4.1,rsize=1048576,wsize=1048576 \
fs-0123456789.efs.us-east-1.amazonaws.com:/ \
  /mnt/shared
Mount target fs-0123456789.efs.us-east-1.amazonaws.com: a DNS name resolving to EFS infrastructure within the VPC.
File operations traverse network
# Write on Instance A
with open('/mnt/shared/config.json', 'w') as f:
json.dump(config, f)
# Read on Instance B (same filesystem)
with open('/mnt/shared/config.json', 'r') as f:
    config = json.load(f)  # Sees Instance A's write
Every read and write is a network operation to the EFS service. The filesystem appears local, but data travels over the network.
NFS protocol handles coordination:

All instances connect to same EFS endpoint. Filesystem state is centralized in EFS service.
Latency comparison with local storage
| Storage Type | Read Latency | Write Latency |
|---|---|---|
| Instance store (NVMe) | ~0.1 ms | ~0.1 ms |
| EBS gp3 | ~1-2 ms | ~1-2 ms |
| EFS General Purpose | ~5-10 ms | ~5-10 ms |
| EFS Max I/O | ~10-25 ms | ~10-25 ms |
Every EFS operation crosses network, adds latency overhead compared to locally-attached storage.
Throughput scales with data size
EFS throughput depends on data stored:
Data stored: 100 GB
Baseline: 5 MB/s (bursting mode)
Burst: Up to 100 MB/s (while credits available)
Data stored: 1 TB
Baseline: 50 MB/s
Small file operations
NFS protocol overhead per operation. Many small files = many round trips.

EFS appropriate for: Shared configuration, content management, home directories.
Not appropriate for: Databases, high-throughput processing, latency-sensitive applications.
S3 operations are HTTP requests
import boto3
s3 = boto3.client('s3')
# PUT object (HTTP PUT request)
s3.put_object(
Bucket='my-bucket',
Key='models/classifier.pkl',
Body=open('model.pkl', 'rb')
)
# GET object (HTTP GET request)
response = s3.get_object(
Bucket='my-bucket',
Key='models/classifier.pkl'
)
data = response['Body'].read()
# DELETE object (HTTP DELETE request)
s3.delete_object(
Bucket='my-bucket',
Key='models/classifier.pkl'
)
No filesystem - HTTP semantics: keys are opaque strings (a key may contain /, but it's just a character).
Accessible from anywhere with network:
EC2 instance, Lambda function, laptop, any HTTP client. No attachment, no VPC requirement (with public endpoint).

Console displays folder hierarchy - API stores flat keys
AWS Console shows:
my-bucket/
├── models/
│ ├── v1/
│ │ └── classifier.pkl
│ └── v2/
│ └── classifier.pkl
└── uploads/
└── user123/
    └── document.pdf
Actual S3 contents (three objects, flat):
models/v1/classifier.pkl
models/v2/classifier.pkl
uploads/user123/document.pdf
No “models” folder exists. The / characters are part of the key string, not directory separators.
Listing with prefix filter:
# "List directory contents" = list objects with prefix
response = s3.list_objects_v2(
Bucket='my-bucket',
Prefix='models/'
)
for obj in response['Contents']:
print(obj['Key'])
# models/v1/classifier.pkl
# models/v2/classifier.pkl
Delimiter simulates directory listing:
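A minimal sketch of a delimiter listing, reusing the s3 client and my-bucket example from above; keys grouped at the delimiter come back as CommonPrefixes, which play the role of subdirectories:
response = s3.list_objects_v2(
    Bucket='my-bucket',
    Prefix='models/',
    Delimiter='/'
)
# CommonPrefixes holds the "subdirectories" directly under models/
for prefix in response.get('CommonPrefixes', []):
    print(prefix['Prefix'])
# models/v1/
# models/v2/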

Key prefix determines what queries are efficient
# Key structure: {type}/{user_id}/{timestamp}_{filename}
# Example: uploads/user123/2025-01-15_document.pdf
# Efficient: All uploads for specific user
s3.list_objects_v2(
Bucket='app-data',
Prefix='uploads/user123/'
)
# S3 filters at storage layer, returns only matching objects
# Inefficient: All uploads from specific date
# No prefix helps - must scan all objects, filter client-side
response = s3.list_objects_v2(Bucket='app-data', Prefix='uploads/')
for obj in response['Contents']:
if '2025-01-15' in obj['Key']:
        # Found one
Alternative key structure for date queries:
# Key structure: {type}/{date}/{user_id}_{filename}
# Example: uploads/2025-01-15/user123_document.pdf
# Now efficient: All uploads from specific date
s3.list_objects_v2(
Bucket='app-data',
Prefix='uploads/2025-01-15/'
)
# Now inefficient: All uploads for specific user
# Must scan all dates
Choose key structure based on the primary access pattern. You cannot efficiently query by both user and date with a single key structure.
Secondary access patterns may require maintaining separate key hierarchies (duplication) or external index (database).

Write then read returns current data
# Process A: Write object
s3.put_object(
Bucket='data',
Key='results/job-123.json',
Body=json.dumps({'status': 'complete', 'score': 0.94})
)
# Process B: Read immediately after
response = s3.get_object(
Bucket='data',
Key='results/job-123.json'
)
result = json.loads(response['Body'].read())
# Guaranteed to see Process A's write
Consistency guarantees:
| Operation Sequence | Guarantee |
|---|---|
| PUT → GET (same key) | Strong: sees new data |
| DELETE → GET | Strong: returns 404 |
| PUT → LIST | Eventual: may not appear immediately |
| Overwrite → GET | Strong: sees new version |
LIST operations may lag briefly
# Upload new object
s3.put_object(Bucket='data', Key='new-file.json', Body='{}')
# Immediate LIST might not include it
response = s3.list_objects_v2(Bucket='data', Prefix='')
# new-file.json may not appear in Contents yet
# But direct GET works immediately
s3.get_object(Bucket='data', Key='new-file.json')  # Succeeds
For coordination between processes, use direct GET with a known key rather than LIST to discover objects.
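A minimal sketch of that coordination pattern, assuming the results/job-123.json key from above; wait_for_result and the poll interval are illustrative choices, and head_object raises a 404 ClientError until the object exists:
import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def wait_for_result(bucket, key, poll_seconds=2, max_attempts=30):
    """Poll a known key until the writer has produced it."""
    for _ in range(max_attempts):
        try:
            s3.head_object(Bucket=bucket, Key=key)  # Cheap existence check
            return s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        except ClientError as e:
            if e.response['Error']['Code'] != '404':
                raise  # Real failure, not "doesn't exist yet"
            time.sleep(poll_seconds)
    raise TimeoutError(f'{key} never appeared')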

Single PUT has limits and reliability issues
Multipart upload splits file into parts
# boto3 handles multipart automatically for large files
s3.upload_file(
Filename='large_model.tar.gz', # 8 GB file
Bucket='ml-artifacts',
Key='models/large_model.tar.gz'
)
# Automatically uses multipart for files > 8 MB (configurable)
Multipart mechanics:
# Explicit multipart control
from boto3.s3.transfer import TransferConfig
config = TransferConfig(
multipart_threshold=100 * 1024 * 1024, # Use multipart > 100 MB
multipart_chunksize=50 * 1024 * 1024, # 50 MB parts
max_concurrency=10 # 10 parallel uploads
)
s3.upload_file('huge_file.bin', 'bucket', 'key', Config=config)
A failed part retries only that part, not the entire file. Parallel part uploads improve throughput on high-bandwidth connections.

Without presigned URLs: All data flows through your server
# Client uploads to your API
@app.route('/upload', methods=['POST'])
def upload():
file = request.files['document']
# File bytes received by your server
# Then uploaded to S3
s3.upload_fileobj(file, 'bucket', f'uploads/{file.filename}')
    return {'status': 'uploaded'}
A 500 MB file consumes your server's bandwidth, memory, CPU, and time.
With presigned URLs: Client uploads directly to S3
# Your API generates presigned URL (< 1 ms)
@app.route('/get-upload-url', methods=['POST'])
def get_upload_url():
key = f"uploads/{uuid.uuid4()}/{request.json['filename']}"
url = s3.generate_presigned_url(
'put_object',
Params={'Bucket': 'uploads-bucket', 'Key': key},
ExpiresIn=3600 # URL valid for 1 hour
)
return {'upload_url': url, 'key': key}
# Client uploads directly to S3 using presigned URL
# Your server never handles the file bytes
Presigned URL contains embedded authorization:
https://uploads-bucket.s3.amazonaws.com/uploads/abc123/doc.pdf
?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIA.../us-east-1/s3/aws4_request
&X-Amz-Date=20250115T100000Z
&X-Amz-Expires=3600
&X-Amz-Signature=a1b2c3d4...
The signature covers the bucket, key, and expiration time. The URL cannot be modified - S3 verifies the signature on every request.
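The client side is a plain HTTP PUT; a sketch using the requests library, where the presigned variable stands in for the response from the /get-upload-url endpoint above:
import requests

upload_url = presigned['upload_url']  # From the /get-upload-url endpoint

with open('document.pdf', 'rb') as f:
    resp = requests.put(upload_url, data=f)  # Bytes go straight to S3
resp.raise_for_status()  # 200 = S3 stored the object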

Credentials embedded in code or environment: Security risk
# Dangerous: Credentials in code
s3 = boto3.client('s3',
aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
)
# Credentials visible in: git history, logs, error traces, process listings
IAM role attached to compute resource
# EC2 instance or Lambda with IAM role
import boto3
s3 = boto3.client('s3') # No credentials specified
s3.upload_file('model.pkl', 'bucket', 'key')  # Works
boto3 automatically retrieves credentials from the instance metadata service. Credentials rotate automatically (typically hourly). No secrets in code, environment variables, or configuration files.
IAM policy controls access:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::ml-artifacts/models/*"
}]
}
This policy allows read and write to the models/ prefix in the ml-artifacts bucket; everything else (other prefixes, other buckets, delete operations) is implicitly denied.
Attach policy to IAM role, attach role to EC2 instance or Lambda function.

No secrets to manage, rotate, or leak. AWS handles credential lifecycle.
S3 can emit events when objects are created, deleted, or restored

Event configuration on bucket:
Event types: s3:ObjectCreated:*, s3:ObjectRemoved:*, s3:ObjectRestore:*, with optional key filters (e.g., prefix uploads/, suffix .jpg).
Event payload includes: bucket name, object key, object size, and event type.
This enables event-driven architectures: Object uploaded → trigger processing automatically. Integration with Lambda, SQS, and SNS covered in subsequent sections.
Serve static files directly from S3 bucket
Configure bucket for website hosting:
Bucket contents:
Access via website endpoint:
http://my-site-bucket.s3-website-us-east-1.amazonaws.com/
Browser requests index.html → S3 returns file contents.
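A sketch of enabling website hosting from boto3, assuming index.html and error.html already exist in the bucket:
s3.put_bucket_website(
    Bucket='my-site-bucket',
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},  # Served for directory requests
        'ErrorDocument': {'Key': 'error.html'},     # Served on 4xx errors
    }
)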
Requires public access policy:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-site-bucket/*"
}]
}
Anyone can read objects. Appropriate for public static content.

Direct S3 hosting: Development, internal tools. Production typically adds CloudFront for HTTPS, caching, custom domains - covered in integration section.

Block when application requires filesystem: Database storage, applications using file paths and standard I/O.
File when multiple instances need shared filesystem: Shared configuration, content management, legacy applications.
Object when HTTP API is acceptable: ML models, user uploads, static assets, backups, any blob storage.
Selection follows from access requirements, not preference.
EC2 deployment requires operational decisions:
Your application runs on infrastructure you provision and maintain. The application is always resident - process starts at boot, listens for requests, handles them as they arrive.
Cost model: Pay for instance hours
Instance runs continuously whether handling 1000 requests/second or 0 requests.
# Check if your Flask app is running
$ ps aux | grep gunicorn
user 12345 0.1 1.2 gunicorn: master [app:app]
user 12346 0.0 0.8 gunicorn: worker [app:app]
user 12347 0.0 0.8 gunicorn: worker [app:app]
# It's running. Waiting for requests. Costing money.
The t3.medium running your API costs ~$30/month whether it handles traffic or sits idle.

What if AWS managed more of the stack?
Instead of provisioning an EC2 instance where your code runs continuously, you provide only the code. AWS handles:
The trade-off:
You give up control over the execution environment in exchange for not managing it. No SSH access, no persistent processes, no direct filesystem.
Lambda function: Code as the deployment unit
# handler.py - This IS your entire deployment
def handler(event, context):
"""AWS invokes this function when triggered"""
name = event.get('name', 'World')
return {
'statusCode': 200,
'body': f'Hello, {name}!'
    }
No Flask, no gunicorn, no server configuration. You deploy this function. AWS runs it when something triggers it.
“Serverless” doesn’t mean no servers - it means servers aren’t your concern.

AWS manages runtime, patching, scaling, and infrastructure. You manage code.
EC2: Long-running process
# Flask app - process starts once, handles many requests
app = Flask(__name__)
@app.route('/greet')
def greet():
return f'Hello, {request.args.get("name", "World")}!'
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)
# Process runs until terminated
# Maintains state between requests
# Keeps connections open
The process initializes once. Each request uses the already-running process.
Lambda: Function invoked per trigger
# Lambda handler - invoked fresh for each trigger
def handler(event, context):
# No persistent process
# No listening socket
# Function runs, returns, done
return {
'statusCode': 200,
'body': f'Hello, {event.get("name", "World")}!'
    }
Your function is invoked when something triggers it. There's no "main" loop, no server listening. Lambda is not running your code right now - it will run your code when triggered.
The event parameter contains trigger data:

Lambda functions don’t run directly on bare hardware. AWS creates an execution environment - an isolated container-like sandbox for your code.
Execution environment provides: allocated memory and CPU, the language runtime, your code package, and writable scratch space (/tmp, 512 MB - 10 GB).
Execution environment isolation:
# Your handler runs inside the execution environment
def handler(event, context):
# context provides environment information
print(f"Request ID: {context.aws_request_id}")
print(f"Memory limit: {context.memory_limit_in_mb} MB")
print(f"Time remaining: {context.get_remaining_time_in_millis()} ms")
# Do work
    return {'statusCode': 200, 'body': 'Done'}
The context object provides invocation and environment information.

The execution environment is the sandbox where your function runs. You configure its resources; AWS manages its lifecycle.
First invocation creates the environment
No suitable execution environment exists → Lambda must create one. This is a cold start.
Cold start phases: create environment → download code package → initialize runtime → run module-level code → invoke handler.
# handler.py
import json
import boto3
import heavy_ml_library # Imported during cold start
# This runs ONCE during cold start, not per invocation
s3_client = boto3.client('s3')
model = heavy_ml_library.load_model('model.pkl')
def handler(event, context):
# This runs on EVERY invocation
result = model.predict(event['data'])
    return {'statusCode': 200, 'body': json.dumps(result)}
Imports and module-level code execute during initialization. The handler function executes on each invocation.
Cold start duration depends on: deployment package size, runtime choice, memory allocation (more memory = more CPU), and how much module-level initialization runs.

AWS keeps execution environments alive
After function completes, environment isn’t immediately destroyed. Kept available for subsequent invocations → can be reused.
Warm invocation skips initialization:
Cold: [Create Env][Download][Init Runtime][Init Code][Handler]
Warm: [Handler]
Warm invocations go directly to handler - no environment creation, downloads, or initialization.
Environment reuse implications:
# Module-level state persists between invocations
request_count = 0
db_connection = None
def handler(event, context):
global request_count, db_connection
request_count += 1 # Accumulates across warm invocations!
print(f"This environment has handled {request_count} requests")
# Connection reuse - don't recreate on every call
if db_connection is None:
db_connection = create_connection() # Only on cold start
    return {'statusCode': 200}
Not guaranteed - AWS may terminate the environment at any time. In practice, warm environments handle many invocations before termination.
Environment reuse is why expensive initialization belongs at module level, and why connections and clients should be created once and cached.

Move expensive operations outside handler
# GOOD: Initialize once, reuse across invocations
import boto3
import pickle
# These run once per cold start
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')
# Load model once
response = s3.get_object(Bucket='models', Key='model.pkl')
model = pickle.loads(response['Body'].read())
def handler(event, context):
# Fast: uses pre-initialized resources
prediction = model.predict(event['features'])
table.put_item(Item={'id': event['id'], 'result': prediction})
    return {'statusCode': 200}
Lazy initialization for conditional use
# Resources only initialized if needed
_heavy_client = None
def get_heavy_client():
global _heavy_client
if _heavy_client is None:
_heavy_client = create_expensive_client()
return _heavy_client
def handler(event, context):
if event.get('needs_heavy_processing'):
client = get_heavy_client() # Only init if needed
        # ...
Understand what initializes when
import json
import boto3 # Import runs during init
# Module level - runs once per cold start
print("Cold start - initializing")
config = load_config()
client = boto3.client('s3')
def helper_function(x):
# Defined at module level
# Body runs when called
return x * 2
def handler(event, context):
# This runs on every invocation
print("Handler executing")
# Function call happens during invocation
result = helper_function(event['value'])
# But client was already created
    client.put_object(...)  # Uses existing client
Cold start output:
Cold start - initializing
Handler executing
Subsequent warm invocations:
Handler executing
The initialization message only appears on cold starts.
Writable filesystem in the execution environment
/tmp directory: 512 MB to 10 GB (configurable). Only writable location - code package is read-only.
import os
import tempfile
def handler(event, context):
# Can write to /tmp
with open('/tmp/cache.json', 'w') as f:
json.dump(event, f)
# Can read it back
with open('/tmp/cache.json', 'r') as f:
data = json.load(f)
# Check space
statvfs = os.statvfs('/tmp')
available_mb = (statvfs.f_frsize * statvfs.f_bavail) / (1024 * 1024)
print(f"Available /tmp space: {available_mb:.1f} MB")
    return {'statusCode': 200}
Persistence characteristics:

Use for: cached downloads, intermediate processing, temporary data. Don’t rely on for persistence - environment termination not in your control.
Memory and CPU are proportionally coupled
Configure memory (128 MB to 10 GB). Lambda allocates CPU proportionally - no direct CPU selection.
| Memory | vCPU Equivalent |
|---|---|
| 128 MB | ~0.08 vCPU |
| 512 MB | ~0.33 vCPU |
| 1769 MB | 1 vCPU |
| 3538 MB | 2 vCPU |
| 10240 MB | 6 vCPU |
Implication: CPU-bound work (ML inference, image processing) needs more memory to get more CPU - even if the work doesn’t need the memory.
# If this is slow at 512 MB
def handler(event, context):
# CPU-intensive work
result = expensive_computation(event['data'])
return result
# Increasing to 2048 MB makes it ~4x faster
# Not because we need memory, but because we get more CPU
Memory also affects cold start speed:
Higher memory = more CPU = faster initialization. Heavy imports may have faster cold starts at higher memory - potentially reducing total cost despite higher per-ms price.

Functions have a maximum execution time
Configurable: 1 second to 15 minutes. Function doesn’t complete in time → Lambda terminates it.
def handler(event, context):
# Check remaining time
remaining_ms = context.get_remaining_time_in_millis()
print(f"Time remaining: {remaining_ms} ms")
# Long-running work
for item in event['items']:
if context.get_remaining_time_in_millis() < 5000:
# Less than 5 seconds left - stop gracefully
return {
'statusCode': 200,
'body': 'Partial completion - timeout approaching'
}
process_item(item)
    return {'statusCode': 200, 'body': 'Complete'}
Timeout termination:
Setting appropriate timeouts:
Typical values: API handlers 10-30s, batch processing up to 15 min. Need longer? Use Step Functions or break into smaller pieces.

Configuration without code changes
Environment variables: different configurations per deployment stage, no code changes.
import os
# Read configuration from environment
TABLE_NAME = os.environ['DYNAMODB_TABLE']
LOG_LEVEL = os.environ.get('LOG_LEVEL', 'INFO')
API_ENDPOINT = os.environ['EXTERNAL_API_URL']
def handler(event, context):
# Use configuration
    dynamodb.Table(TABLE_NAME).put_item(...)
Set via console, CLI, or infrastructure code:
aws lambda update-function-configuration \
--function-name my-function \
  --environment "Variables={DYNAMODB_TABLE=prod-users,LOG_LEVEL=WARNING}"
Environment per stage:
| Variable | Development | Production |
|---|---|---|
| DYNAMODB_TABLE | dev-users | prod-users |
| LOG_LEVEL | DEBUG | WARNING |
| API_ENDPOINT | https://sandbox.api.com | https://api.com |
Same code, different configuration per deployment.
Secrets handling
Environment variables visible in console and logs. For sensitive values → AWS Secrets Manager or Parameter Store.
import boto3
import os
# NOT this - secret visible in Lambda console
API_KEY = os.environ['API_KEY'] # Visible!
# Better - retrieve at runtime
secrets = boto3.client('secretsmanager')
def get_api_key():
response = secrets.get_secret_value(
SecretId='my-api-key'
)
return response['SecretString']
# Can cache in module scope for reuse
_api_key = None
def handler(event, context):
global _api_key
if _api_key is None:
_api_key = get_api_key()
    # Use _api_key securely
Environment variables: Non-sensitive configuration (table names, endpoints, feature flags).
Secrets Manager: Credentials, API keys, connection strings.
Deployment package structure
Lambda code lives in a deployment package - handler plus dependencies:
Multiple functions with same dependencies → each includes them separately.
Layers separate shared dependencies
Layer: ZIP archive with libraries, custom runtimes, or other dependencies. Functions reference layers instead of bundling everything.
# Function just imports - layer provides the packages
import numpy as np
import pandas as pd
from sklearn import ensemble
def handler(event, context):
# Libraries from layer are available
df = pd.DataFrame(event['data'])
    ...
Layer benefits:

Functions can use up to 5 layers. Total unzipped size (function + layers) limited to 250 MB.
Cold start must download and load your code. More dependencies means:
Larger packages = longer cold starts
# Minimal dependencies - fast cold start (~200ms)
import json
def handler(event, context):
    return {'statusCode': 200, 'body': json.dumps(event)}

# Heavy dependencies - slow cold start (~2000ms)
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import tensorflow as tf
def handler(event, context):
# ...Import granularity matters:
# Imports entire sklearn package
from sklearn import *
# Only loads ensemble module
from sklearn.ensemble import RandomForestClassifier
Lazy imports defer cost to when needed:
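A sketch of the lazy-import pattern; the needs_dataframe flag is hypothetical, and pandas stands in for any slow-to-import library:
def handler(event, context):
    if event.get('needs_dataframe'):
        import pandas as pd  # Paid only on invocations that need it, then cached
        return {'statusCode': 200, 'body': pd.DataFrame(event['data']).to_json()}
    return {'statusCode': 200, 'body': '{}'}  # Fast path never imports pandas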

The relationship isn’t linear - some libraries are particularly slow to import (pandas, tensorflow) due to their own initialization logic.
When Lambda receives concurrent triggers, it creates additional execution environments. Each concurrent invocation runs in its own isolated environment.
Multiple triggers can arrive simultaneously
Time 0: Trigger 1 arrives → Environment A handles it
Time 10ms: Trigger 2 arrives → Environment B handles it (A still busy)
Time 20ms: Trigger 3 arrives → Environment C handles it
Time 50ms: Trigger 1 completes → Environment A becomes free
Time 60ms: Trigger 4 arrives → Environment A handles it (reused)
Concurrency scaling:
Account-level limit:
All Lambda functions in your AWS account share a concurrency pool. If you have 10 functions and 1000 limit, they collectively can’t exceed 1000 concurrent executions.

These features exist but add cost and complexity. Most Lambda workloads don’t need them - understand when they’re actually warranted.
Reserved concurrency: Guarantee + Limit
Reserve a portion of account concurrency for specific function. Other functions cannot consume this allocation.
Account limit: 1000
Function A reserved: 200
Function B reserved: 100
Unreserved pool: 700
Function A can use up to 200 (guaranteed)
Function A cannot exceed 200 (limited)
Other functions share remaining 700
Use reserved concurrency when:
Not needed when: Default scaling behavior is acceptable and downstream systems can handle the load.
Provisioned concurrency: Eliminate cold starts
Keep N execution environments initialized and ready. All invocations up to N are warm.
Provisioned: 10 environments
Invocations 1-10: Instant (pre-warmed)
Invocation 11+: May cold start (scales beyond provisioned)
The cost trade-off:
Use provisioned concurrency only when:
Not needed for: Background processing, async workloads, internal tools, or any function where occasional cold start latency is acceptable.
Two pricing components:
Example calculation:
Function configuration:
- Memory: 512 MB (0.5 GB)
- Average execution time: 200 ms (0.2 seconds)
- Invocations per month: 1,000,000
Request charges:
1,000,000 requests × $0.20/million = $0.20
Duration charges:
GB-seconds = 0.5 GB × 0.2 sec × 1,000,000 = 100,000 GB-s
100,000 × $0.0000166667 = $1.67
Total monthly cost: $0.20 + $1.67 = $1.87
Free tier (per month): 1M requests and 400,000 GB-seconds of compute.
Many low-traffic applications fit entirely in free tier.

Lambda functions execute in response to events. These events come from various sources, each delivering data in a specific format to your handler.

Each source has different characteristics:
| Source | Invocation Type | Use Case |
|---|---|---|
| API Gateway | Synchronous - waits for response | HTTP APIs, webhooks |
| S3 | Asynchronous - fire and forget | File processing, uploads |
| SQS | Poll-based - Lambda pulls messages | Work queues, decoupling |
| EventBridge | Asynchronous | Scheduled tasks, event routing |
The invocation type affects how errors are handled and how your function should respond.
API Gateway: HTTP endpoint in front of Lambda
AWS service that accepts HTTP requests, routes to backend services including Lambda. Without it, Lambda has no HTTP endpoint - function exists but no URL to call it.
Integration flow:
Client → API Gateway → Lambda → Response → Client
(HTTP) (invoke) (return) (HTTP)
API Gateway handles:
Lambda handles:

When API Gateway invokes your Lambda function, it sends an event object containing request details:
def handler(event, context):
# event contains the HTTP request details
# HTTP method and path
http_method = event['httpMethod'] # 'GET', 'POST', etc.
path = event['path'] # '/users/123'
# Query string parameters
params = event.get('queryStringParameters') or {}
page = params.get('page', '1')
# Request headers
headers = event.get('headers') or {}
auth_token = headers.get('Authorization')
content_type = headers.get('Content-Type')
# Request body (for POST/PUT)
body = event.get('body') # String - parse JSON if needed
if body and content_type == 'application/json':
import json
data = json.loads(body)
# Path parameters (from URL like /users/{id})
path_params = event.get('pathParameters') or {}
user_id = path_params.get('id')
# Return HTTP response
return {
'statusCode': 200,
'headers': {
'Content-Type': 'application/json'
},
'body': json.dumps({'user_id': user_id, 'page': page})
    }
The response must include statusCode and body. Headers are optional but commonly needed for Content-Type and CORS.
S3 can trigger Lambda on object changes
Configure bucket to emit events on object create/delete/modify. Lambda subscribes to these events.
Common trigger patterns:
s3:ObjectCreated:* - Any object creation (PUT, POST, Copy)
s3:ObjectRemoved:* - Any object deletion
s3:ObjectCreated:Put - Specifically PUT operations
Event configuration can filter by prefix/suffix:
Trigger: s3:ObjectCreated:*
Prefix: uploads/
Suffix: .jpg
Only triggers for: uploads/*.jpg
Does not trigger for: uploads/doc.pdf or images/photo.jpg
Invocation is asynchronous:
S3 emits event and continues - doesn’t wait for Lambda. If Lambda fails, S3 doesn’t know or retry. Lambda service handles retries (twice by default).
Use cases:

When S3 triggers your Lambda, the event contains information about what changed:
def handler(event, context):
# event['Records'] is a list - can batch multiple events
for record in event['Records']:
# Event type
event_name = record['eventName'] # 'ObjectCreated:Put'
# Bucket information
bucket = record['s3']['bucket']['name'] # 'my-uploads-bucket'
# Object information
key = record['s3']['object']['key'] # 'uploads/photo.jpg'
size = record['s3']['object']['size'] # 1234567 (bytes)
# Key is URL-encoded - decode it
from urllib.parse import unquote_plus
decoded_key = unquote_plus(key) # Handles spaces, special chars
# Now process the object
if event_name.startswith('ObjectCreated'):
process_new_object(bucket, decoded_key, size)
elif event_name.startswith('ObjectRemoved'):
cleanup_removed_object(bucket, decoded_key)
def process_new_object(bucket, key, size):
import boto3
s3 = boto3.client('s3')
# Download the object that triggered this event
response = s3.get_object(Bucket=bucket, Key=key)
content = response['Body'].read()
    # Process content...
The event tells you what changed. Your handler retrieves the actual content from S3 if needed.
SQS (Simple Queue Service) decouples producers and consumers
Messages placed in queue. Lambda polls and processes. Different from API Gateway (synchronous push) and S3 (asynchronous push).
Why queue-based processing?
Lambda polls SQS (you don’t):
Lambda service manages polling. Messages available → invokes your function with a batch.
def handler(event, context):
# event['Records'] contains batch of messages
for record in event['Records']:
body = record['body'] # Message content
# Process message
data = json.loads(body)
process_item(data)
# Successful return = messages deleted from queue
    # Exception = messages return to queue for retry
Batch processing:
Multiple messages per invocation (configurable 1-10,000). Efficient for high-volume queues - one cold start handles many messages.

Lambda is not universally better or worse than EC2 - it has characteristics that fit certain patterns well.
Lambda fits well:
Event-driven processing
Variable/unpredictable traffic
Short-duration tasks
Operations you don’t want to manage
Lambda fits poorly:
Long-running processes
Consistent high throughput
Specific runtime requirements
Stateful applications
Latency-critical without provisioning
The decision isn’t “Lambda vs EC2” but rather “which pattern fits this workload’s characteristics?”
Lambda functions rarely exist in isolation - they connect to other services to form complete systems.

Common integration patterns:
Each function small and focused. Complex workflows: multiple functions with services between them.
Direct call model
def handle_request(request):
order = parse_order(request)
result = process_order(order) # 30 seconds
    return result
The HTTP connection is held open for the entire 30 seconds.
Coupling consequences:
When synchronous makes sense:

Must wait for result
Synchronous appropriate here. Coupling inherent to use case.
Can proceed without waiting
Can decouple request from processing.
Decoupling in time = separating “request accepted” from “request completed”
Opens architectural options:
Instead of A → B directly: A → Storage → B
# Producer: accept and acknowledge
def handle_request(request):
job_id = str(uuid.uuid4())
store_job(job_id, request.data)
return {'job_id': job_id, 'status': 'accepted'}, 202
# Consumer: process from storage (separate process)
def process_pending_jobs():
while True:
job = get_next_job()
if job:
result = do_processing(job)
save_result(job.id, result)
            mark_complete(job.id)
HTTP 202 (Accepted) vs 200 (OK):
Client checks back later or receives callback.

Not just any database - specific requirements for reliable async processing:
Durability
Ordering (sometimes)
Delivery semantics
What if consumer crashes after receiving?
Visibility control
These requirements common enough → dedicated abstraction: message queue
Three fundamental operations
Producer                Queue                Consumer
   |                      |                      |
   |---- send(msg) ------>|                      |
   |                      |                      |
   |                      |<----- receive() -----|
   |                      |                      |
   |                      |    [processing...]   |
   |                      |                      |
   |                      |<----- delete() ------|
Send: Add message, returns when durably stored
Receive: Get next message, becomes temporarily invisible
Delete: Confirm complete, permanently removed
Key insight: Message not removed on receive - removed on explicit delete. Enables recovery if consumer fails mid-processing.
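A minimal sketch of the three operations with boto3; the queue URL and do_work are placeholders:
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/work'  # Placeholder

# Send: returns once the message is durably stored
sqs.send_message(QueueUrl=QUEUE_URL, MessageBody='{"job": 42}')

# Receive: message becomes temporarily invisible to other consumers
resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)
for msg in resp.get('Messages', []):
    do_work(msg['Body'])  # Placeholder processing
    # Delete: only now is the message permanently removed
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])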

What if consumer receives message then crashes before delete?
Mechanism
On receive → message invisible for configurable period
t=0 Consumer A receives
Message invisible (30s timeout starts)
t=15 Consumer A still processing
Message still invisible
t=35 Timeout expired, no delete
Message visible again
t=36 Consumer B receives
      Retry begins
Timeout too short: Message reappears while still being processed → duplicate work
Timeout too long: Failed processing waits unnecessarily before retry

Don’t know how long processing takes? Extend timeout during processing:
def process_message(message, queue_client):
receipt_handle = message['ReceiptHandle']
for chunk in large_dataset:
process_chunk(chunk)
# Heartbeat: extend visibility every 20 seconds
queue_client.change_message_visibility(
QueueUrl=QUEUE_URL,
ReceiptHandle=receipt_handle,
VisibilityTimeout=30 # Another 30 seconds
)
# Done - delete
    queue_client.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=receipt_handle)
Pattern: The consumer "heartbeats" the queue - "still working on this."
Most queues guarantee: message will be delivered, but might be delivered more than once.
How duplicates occur
1. Consumer receives message
2. Processes successfully (30 sec)
3. Sends delete request
4. Network hiccup - delete lost
5. Queue never receives delete
6. Visibility timeout expires
7. Message visible again
8. Another consumer receives
9. Processed twice
The consumer did everything right. Network unreliability caused the duplicate.
Not a bug - fundamental trade-off
Exactly-once requires distributed transactions. Complex, expensive. At-least-once is practical choice.
Implication: Your code must handle duplicates.

Idempotent = processing same message twice produces same result as once
Naturally idempotent
# Set absolute value
user.status = 'active'
# Write to specific key
s3.put_object(Bucket='b', Key='k', Body=data)
# Upsert
db.upsert(key=order_id, data=order_data)
"Set X to Y" - running twice just sets X to Y twice.
NOT idempotent: increment a counter, append to a list, send an email - running twice doubles the effect.
Making operations idempotent
Track what you’ve processed:
def process_payment(message):
message_id = message['MessageId']
# Already processed?
if db.get(f'processed:{message_id}'):
return # Skip duplicate
# Process
charge_customer(message['amount'])
# Record completion
db.put(f'processed:{message_id}', {
'processed_at': now()
    })
Use atomic check-and-set (DynamoDB conditional write) to handle race conditions.
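A sketch of that conditional write, assuming a DynamoDB table named processed keyed on message_id; the ConditionExpression makes check and record one atomic step:
import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('processed')  # Assumed table

def claim_message(message_id):
    """Return True only for the first caller."""
    try:
        table.put_item(
            Item={'message_id': message_id},
            ConditionExpression='attribute_not_exists(message_id)',
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False  # Duplicate - already processed or in progress
        raise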
When calling external services, pass idempotency key:
def process_order_payment(message):
order = json.loads(message['Body'])
# Deterministic key from message content
idempotency_key = f"order-{order['order_id']}-payment"
# Stripe deduplicates based on key
stripe.PaymentIntent.create(
amount=order['amount'],
currency='usd',
idempotency_key=idempotency_key # Same key = same response
    )
Many payment APIs support this precisely because at-least-once delivery is common.
Best practices:
Some messages can never succeed. Invalid data, deleted resources, bugs.
Poison message problem
Message with malformed JSON arrives
Consumer 1: parse fails, crash
Message returns to queue
Consumer 2: parse fails, crash
Message returns to queue
Consumer 3: parse fails, crash
...forever...
The queue keeps delivering, consumers keep failing.
Solution: Dead letter queue (DLQ)
After N failed attempts → move to separate queue
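A sketch of wiring a DLQ to an existing queue; the queue URL and ARN are placeholders, and maxReceiveCount is the N above:
import json
import boto3

sqs = boto3.client('sqs')
sqs.set_queue_attributes(
    QueueUrl=MAIN_QUEUE_URL,  # Placeholder
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': 'arn:aws:sqs:us-east-1:123456789012:work-dlq',
            'maxReceiveCount': '3',  # Move to DLQ after 3 receives without delete
        })
    }
)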

Max receive count
Each failed processing increments count. After N receives without delete → move to DLQ.
# Check in handler
def handler(event, context):
for record in event['Records']:
        count = int(record['attributes']['ApproximateReceiveCount'])
if count > 3:
log.error(f"Repeated failure: {record}")Typical settings:
DLQ depth signals:
Replay after fixing
def replay_dlq():
"""Move DLQ messages back to main"""
while True:
response = sqs.receive_message(
QueueUrl=DLQ_URL,
MaxNumberOfMessages=10,
WaitTimeSeconds=1
)
messages = response.get('Messages', [])
if not messages:
break
for msg in messages:
sqs.send_message(
QueueUrl=MAIN_QUEUE_URL,
MessageBody=msg['Body']
)
sqs.delete_message(
QueueUrl=DLQ_URL,
ReceiptHandle=msg['ReceiptHandle']
            )
Only replay after fixing the issue, else messages cycle back to the DLQ.
Queues absorb traffic spikes that exceed processing capacity.
Synchronous under spike
Requests arrive faster than processing capacity:
Asynchronous under spike
Time Arrivals Queue Processing
00:00 100/sec 0 100/sec
00:01 500/sec 400 100/sec ← spike
00:02 500/sec 800 100/sec
00:03 100/sec 700 100/sec ← spike ends
00:04 100/sec 600 100/sec
...
00:10 100/sec 0 100/sec ← drained
Synchronous under load
Asynchronous under load
Neither universally better

Hybrid: Accept and queue, but set max depth. Exceed threshold → reject. Buffering for normal spikes, bounded latency.
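A sketch of that hybrid producer, assuming an SQS-backed pipeline with the sqs client and QUEUE_URL from earlier; MAX_DEPTH is an illustrative threshold:
MAX_DEPTH = 10_000  # Illustrative bound on acceptable backlog

def accept_job(payload):
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL,
        AttributeNames=['ApproximateNumberOfMessages'],
    )
    if int(attrs['Attributes']['ApproximateNumberOfMessages']) > MAX_DEPTH:
        return {'error': 'overloaded, retry later'}, 503  # Bound latency by rejecting
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=payload)
    return {'status': 'accepted'}, 202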
Synchronous: Scaling coupled
Same process receives and processes. Add capacity for processing → also add receiving capacity (maybe don’t need).
Queue-based: Scale each tier independently
Producers: 2 instances (receiving is fast)
↓
┌───────────┐
│ Queue │
└───────────┘
↓
Consumers: 8 instances (processing is slow)
Lambda auto-scales on queue depth:

Order placed → need to:
Point-to-point queue
Each message → one consumer. Multiple services need event?
Couples producer to knowledge of all consumers.
Publish-subscribe
Producer publishes to topic. All subscribers get copy.

Topics alone: no durability. Subscriber unavailable → misses event.
Combined pattern
Each subscriber has own queue subscribed to topic:
Order Service
│
▼
┌────────────┐
│ SNS Topic │ (fan-out)
└────────────┘
│ │ │
▼ ▼ ▼
┌───┐┌───┐┌───┐
│SQS││SQS││SQS│ (durability)
└───┘└───┘└───┘
│ │ │
▼ ▼ ▼
 Inv    Email   Ship    (consumers)
Benefits:
# Producer: publish to topic
sns.publish(
TopicArn=ORDER_TOPIC_ARN,
Message=json.dumps({
'order_id': order_id,
'event': 'placed'
})
)
# Consumer: read from own queue
def inventory_handler(event, context):
for record in event['Records']:
# SNS wraps in extra JSON layer
sns_msg = json.loads(record['body'])
order = json.loads(sns_msg['Message'])
update_inventory(order)
# Email service: separate Lambda, separate queue
def email_handler(event, context):
for record in event['Records']:
sns_msg = json.loads(record['body'])
order = json.loads(sns_msg['Message'])
        send_confirmation(order)
Standard Queue: at-least-once delivery, best-effort ordering, nearly unlimited throughput.
FIFO Queue: queue name must end in .fifo; messages with the same MessageGroupId are delivered in order, and MessageDeduplicationId suppresses duplicates within a 5-minute window.
sqs.send_message(
QueueUrl=FIFO_QUEUE_URL,
MessageBody=json.dumps(data),
MessageGroupId='user-123',
MessageDeduplicationId=str(uuid.uuid4())
)
Use FIFO when:
Default choice: Standard + idempotent processing
Topic characteristics
Subscribing
# Subscribe SQS queue
sns.subscribe(
TopicArn=topic_arn,
Protocol='sqs',
Endpoint=queue_arn
)
# Subscribe Lambda
sns.subscribe(
TopicArn=topic_arn,
Protocol='lambda',
Endpoint=function_arn
)
# Subscribe with filter
sns.subscribe(
TopicArn=topic_arn,
Protocol='sqs',
Endpoint=high_value_queue_arn,
Attributes={
'FilterPolicy': json.dumps({
'order_value': [{'numeric': ['>=', 1000]}]
})
}
)
Filter: Only orders of $1000 or more reach this queue.
Lambda polls SQS automatically via event source mapping.
No polling code needed
def handler(event, context):
for record in event['Records']:
body = json.loads(record['body'])
process_order(body)
# Success = messages deleted
    # Exception = messages return to queue
Lambda service handles:
Batch size trade-off:

Default: Exception fails entire batch. All messages retry, including already-processed ones.
Report which messages failed
def handler(event, context):
failed = []
for record in event['Records']:
try:
process(json.loads(record['body']))
except Exception as e:
failed.append(record['messageId'])
log.error(f"Failed {record['messageId']}: {e}")
return {
'batchItemFailures': [
{'itemIdentifier': mid}
for mid in failed
]
    }
Lambda deletes the successful messages and returns the failed ones to the queue.
Requires: Enable ReportBatchItemFailures in event source mapping.
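A sketch of creating the event source mapping with partial-batch reporting enabled; the queue ARN and function name are placeholders:
import boto3

boto3.client('lambda').create_event_source_mapping(
    EventSourceArn='arn:aws:sqs:us-east-1:123456789012:work',  # Placeholder queue
    FunctionName='order-processor',                            # Placeholder function
    BatchSize=10,
    FunctionResponseTypes=['ReportBatchItemFailures'],  # Enable partial-batch responses
)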
Short polling (default)
Long polling
# Short poll
response = sqs.receive_message(
QueueUrl=QUEUE_URL,
MaxNumberOfMessages=10
)
# Long poll
response = sqs.receive_message(
QueueUrl=QUEUE_URL,
MaxNumberOfMessages=10,
WaitTimeSeconds=20
)
Lambda uses long polling automatically.

Delay processing for retry backoff, scheduled tasks, debouncing.
Queue-level delay
All messages delayed N seconds:
Per-message delay
sqs.send_message(
QueueUrl=QUEUE_URL,
MessageBody=json.dumps(data),
DelaySeconds=120 # This message only
)
Max delay: 15 minutes
Longer delays: EventBridge Scheduler, database + polling, Step Functions.
Use cases
Retry backoff:
def requeue_with_backoff(msg, attempt):
delay = min(30 * (2 ** attempt), 900)
sqs.send_message(
QueueUrl=QUEUE_URL,
MessageBody=msg['Body'],
DelaySeconds=delay
    )
Debouncing:
# On file change, delay processing
sqs.send_message(
QueueUrl=QUEUE_URL,
MessageBody=json.dumps({'path': path}),
DelaySeconds=30 # Wait for more changes
)
Scheduled task: send a message with a delay to trigger work later (within the 15-minute cap).
SQS charges per API call, not per message.
Send batch (up to 10)
entries = [
{'Id': str(i), 'MessageBody': json.dumps(item)}
for i, item in enumerate(items[:10])
]
response = sqs.send_message_batch(
QueueUrl=QUEUE_URL,
Entries=entries
)
if response.get('Failed'):
for f in response['Failed']:
log.error(f"Failed: {f['Id']}")Delete batch (up to 10)
Cost impact
Throughput impact
Lambda batching
Lambda auto-batches receive/delete. Configure batch size 1-10,000.
Larger batches:
SQS limit: 256 KB per message
Store in S3, send reference
def send_large_message(data, bucket):
payload = json.dumps(data)
if len(payload.encode()) > 200_000:
key = f"messages/{uuid.uuid4()}.json"
s3.put_object(Bucket=bucket, Key=key, Body=payload)
sqs.send_message(
QueueUrl=QUEUE_URL,
MessageBody=json.dumps({
'__s3_ref__': True,
'bucket': bucket,
'key': key
})
)
else:
sqs.send_message(
QueueUrl=QUEUE_URL,
MessageBody=payload
        )
Consumer retrieves from S3
def handler(event, context):
for record in event['Records']:
body = json.loads(record['body'])
if body.get('__s3_ref__'):
response = s3.get_object(
Bucket=body['bucket'],
Key=body['key']
)
payload = json.loads(response['Body'].read())
# Cleanup
s3.delete_object(
Bucket=body['bucket'],
Key=body['key']
)
else:
payload = body
        process(payload)
The AWS Extended Client Library automates this pattern.
Key metrics
| Metric | Indicates |
|---|---|
| ApproximateNumberOfMessagesVisible | Backlog |
| ApproximateNumberOfMessagesNotVisible | In-flight |
| ApproximateAgeOfOldestMessage | How backed up |
| NumberOfMessagesSent | Producer throughput |
| NumberOfMessagesDeleted | Consumer throughput |
Healthy queue:
Problems:
Alarms
# Queue depth
cloudwatch.put_metric_alarm(
AlarmName='high-queue-depth',
MetricName='ApproximateNumberOfMessagesVisible',
Namespace='AWS/SQS',
Dimensions=[{'Name': 'QueueName', 'Value': NAME}],
Threshold=10000,
ComparisonOperator='GreaterThanThreshold',
EvaluationPeriods=3,
Period=60
)
# Message age
cloudwatch.put_metric_alarm(
AlarmName='old-messages',
MetricName='ApproximateAgeOfOldestMessage',
Namespace='AWS/SQS',
Dimensions=[{'Name': 'QueueName', 'Value': NAME}],
Threshold=3600,
ComparisonOperator='GreaterThanThreshold',
EvaluationPeriods=2,
Period=60
)
Use synchronous
Examples:
Characteristics:
Use asynchronous
Examples:
Characteristics:
@app.route('/orders', methods=['POST'])
def create_order():
# Sync: validate immediately
order = validate_order(request.json)
if not order.valid:
return {'error': order.errors}, 400
# Sync: save to database
order_id = save_order(order)
# Async: queue fulfillment
sqs.send_message(
QueueUrl=FULFILLMENT_QUEUE,
MessageBody=json.dumps({
'order_id': order_id,
'action': 'fulfill'
})
)
# Return immediately
    return {'order_id': order_id, 'status': 'processing'}, 202
User gets immediate confirmation. Fulfillment happens reliably in the background.
Lambda functions and HTTP requests live in different worlds
Lambda functions execute within AWS infrastructure. They can be invoked by AWS services (S3 events, SQS messages), but don’t listen on HTTP ports and don’t have public URLs.
How does a user’s HTTP request reach this function?
Direct Lambda invocation requires AWS credentials:
import boto3
lambda_client = boto3.client('lambda')
response = lambda_client.invoke(
FunctionName='my-function',
Payload=json.dumps({'name': 'test'})
)
This works for service-to-service communication within AWS. But a browser making GET https://myapi.com/users cannot invoke Lambda directly - it doesn't have AWS credentials, and Lambda isn't listening on an HTTP port.
Connecting HTTP clients to Lambda requires something that accepts HTTP requests from the public internet and translates them into Lambda invocations.

API Gateway is a managed reverse proxy
It accepts HTTP requests at a public URL and routes them to backend services. For Lambda, it translates HTTP requests into Lambda invocations and Lambda responses back into HTTP responses.
https://abc123.execute-api.us-east-1.amazonaws.com/prod/users
└── API Gateway endpoint ──┘ └── path ──┘
What happens on each request:
The Lambda function never opens a port, never manages connections, never deals with TLS. API Gateway handles the HTTP protocol; Lambda handles the business logic.
You deploy the function, API Gateway provides the URL.

When API Gateway invokes your Lambda function, it passes the HTTP request as a structured event:
# Incoming HTTP request:
# POST /users?role=admin HTTP/1.1
# Host: abc123.execute-api...
# Content-Type: application/json
# Authorization: Bearer xxx
#
# {"name": "Alice"}API Gateway transforms this into:
Your handler processes and returns:
def handler(event, context):
method = event['httpMethod']
body = json.loads(event['body'] or '{}')
if method == 'POST':
user = create_user(body)
return {
'statusCode': 201,
'headers': {
'Content-Type': 'application/json'
},
'body': json.dumps(user)
}
return {
'statusCode': 405,
'body': 'Method not allowed'
    }
API Gateway takes your return dict and constructs the HTTP response. The status code becomes the HTTP status, headers become HTTP headers, and body becomes the response body.
A reverse proxy handles cross-cutting concerns
Things every API needs, but you don’t want to implement in every function:
Authentication - Verify identity before code runs
Rate limiting - Protect backend from overload
Request validation - Reject malformed requests early
Each of these protects your Lambda function. Invalid or excessive requests are rejected before they consume Lambda execution time (and cost).

Without rate limiting
Your Lambda function is invoked for every request. Malicious or misconfigured client sends 10,000 requests/second:
With rate limiting at the gateway
Excess requests receive HTTP 429 (Too Many Requests) immediately. They never reach Lambda, never hit your database, never cost you Lambda execution fees.
Client receives clear signal to back off. Gateway absorbed the attack; backend unaffected.

S3 bucket location affects user experience
Images, CSS, JavaScript bundles, user-uploaded documents. S3 bucket is in us-east-1 (Virginia).
User in Tokyo requests an image:
For a web page loading 50 assets, that’s 50 × 200ms of latency-bound requests. Even with parallel loading, the page feels slow.
Latency here is physics, not performance.
Light travels at ~200,000 km/s through fiber. Tokyo to Virginia is 11,000 km. That’s 55ms one way, minimum. No optimization can beat the speed of light.
Reducing this latency requires putting content closer to users.

Content Delivery Network (CDN) concept
Instead of one origin in one region, cache copies at edge locations around the world. When a user requests content:
CloudFront: AWS’s CDN
First request (cache miss): User in Tokyo → Tokyo edge → S3 origin → Tokyo edge → User
Total: ~200ms (fetch from origin)
Subsequent requests (cache hit): User in Tokyo → Tokyo edge → User
Total: ~20ms (served from edge)
The edge location is physically close. Most requests hit cache. Latency drops dramatically.

Cache hit: Edge has the content
User → Edge: GET /images/logo.png
Edge: "I have this cached"
Edge → User: 200 OK (from cache)
Latency: ~20ms
The edge returns the cached copy immediately; the origin is never contacted. This is the fast path - and for popular content, most requests are cache hits.
Cache miss: Edge must fetch from origin
User → Edge: GET /images/new-upload.png
Edge: "Not in my cache"
Edge → Origin: GET /images/new-upload.png
Origin → Edge: 200 OK + content
Edge: Cache it for next time
Edge → User: 200 OK
Latency: ~200ms (first request)
Latency: ~20ms (subsequent requests)
The first user pays the origin fetch latency. Everyone after benefits from the cached copy.
Cache effectiveness depends on:

TTL (Time To Live): How long to cache?
Short TTL (seconds to minutes):
Long TTL (hours to days):
Versioned filenames solve the TTL dilemma:
/static/app.js → TTL: 5 minutes (unversioned - may change at any time)
/static/app.v2.3.js → TTL: 1 year (version in name)
With versioned filenames, you deploy new code with a new filename. Old cached versions don't matter - new requests use the new filename. Set a very long TTL and get both cacheability and instant updates.
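One way to set that long TTL is at upload time, assuming CloudFront respects the origin's Cache-Control header (its default behavior); a sketch reusing the s3 client and site bucket from earlier:
# Versioned asset - safe to cache for a year
s3.put_object(
    Bucket='my-site-bucket',
    Key='static/app.v2.3.js',
    Body=open('dist/app.js', 'rb'),
    ContentType='application/javascript',
    CacheControl='public, max-age=31536000, immutable',  # 1 year
)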
Cache key: What makes a request “same”?
By default: URL path. Same path = same cached response.
Can include: Query strings, headers. But more in cache key = fewer cache hits.

Typical setup for web applications:
S3 bucket holds static files (build output, images, uploads). CloudFront distribution sits in front of the bucket. Users access files through CloudFront URL or custom domain.
Origin Access Control (OAC):
S3 bucket remains private - no public access. Only CloudFront can read from it. This prevents users from bypassing CDN and hitting S3 directly.
Benefits for your project:
When not worth the complexity:

Service A calls Service B over the network
Network calls introduce failure modes that don’t exist in local function calls:
In a local function call, you either get a result or an exception, quickly. Network calls add failure modes and latency variability.
In a service chain, failures compound:
Three nines at each step compounds: 0.999 × 0.999 × 0.999 ≈ 0.997 end-to-end. More services = more failure points.
Failures are not exceptional - they’re expected. Design for them.

Request failed? Try again.
def call_service():
for attempt in range(3):
try:
response = requests.get(url, timeout=5)
return response
except RequestException:
if attempt == 2:
raise
            continue
Some failures are transient - they go away if you retry:
Other failures are permanent - retry won’t help:
Retry transient failures. Don’t retry permanent ones.

Service B is overloaded, returning 503s
Client 1: Request fails → immediate retry
Client 2: Request fails → immediate retry
Client 3: Request fails → immediate retry
...
Client 1000: Request fails → immediate retry
All 1000 clients retry at the same moment. Service B, already struggling, now receives another 1000 requests instantly. It fails again. All 1000 retry again.
The retry storm keeps the service down.
# This makes things worse
for attempt in range(3):
try:
return requests.get(url)
except:
        continue  # Retry immediately
Even if the service could recover in 1 second, the continuous retry storm prevents recovery. Clients are "helping" by retrying, but collectively they're causing a denial of service.

Wait longer between each retry
import time
import random
def call_with_backoff(func, max_attempts=5):
for attempt in range(max_attempts):
try:
return func()
except TransientError:
if attempt == max_attempts - 1:
raise
# Wait: 1s, 2s, 4s, 8s...
delay = 2 ** attempt
            time.sleep(delay)
First retry after 1 second. Second retry after 2 more seconds. Third after 4 more. Exponential growth creates spacing.
But all clients still retry at the same intervals.
Client 1: Retry at t=1, t=3, t=7
Client 2: Retry at t=1, t=3, t=7
Client 3: Retry at t=1, t=3, t=7
Still clustered, just at different times. Need to break the synchronization.

Add randomness to break clustering
import random
def call_with_backoff_jitter(func, max_attempts=5):
for attempt in range(max_attempts):
try:
return func()
except TransientError:
if attempt == max_attempts - 1:
raise
# Full jitter: random between 0 and max
max_delay = 2 ** attempt
delay = random.uniform(0, max_delay)
            time.sleep(delay)
Full jitter: random delay between 0 and 2^attempt seconds.
Client 1: Retry at t=0.7
Client 2: Retry at t=0.2
Client 3: Retry at t=0.9
Retries spread across the window instead of clustering at one point. Service receives steady trickle instead of burst.
AWS SDK uses this by default. Most well-designed clients implement exponential backoff with jitter. If you’re building retry logic, include jitter.
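With boto3 you can lean on the SDK’s built-in behavior instead of hand-rolling it - a sketch:

import boto3
from botocore.config import Config

# 'standard' and 'adaptive' retry modes both apply exponential backoff with jitter
config = Config(retries={'max_attempts': 5, 'mode': 'standard'})
s3 = boto3.client('s3', config=config)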

Every network call needs a timeout
Without timeout, a slow or unresponsive service blocks your code indefinitely. Connection stays open, thread stays blocked, resources stay consumed.
# Dangerous: No timeout
response = requests.get(url) # May never return
# Safe: Explicit timeout
response = requests.get(url, timeout=5) # Fail after 5s
# Better: Separate connect and read timeouts
response = requests.get(url, timeout=(3, 10))
# connect=3s, read=10s

Connect timeout: How long to wait for connection establishment. Service down? Fail fast.
Read timeout: How long to wait for response data. Service slow? Don’t wait forever.
Timeout values depend on what you’re calling:

A calls B calls C
Each service has a timeout for its downstream call:
If A’s deadline expires while B is still working:
A gives up at 8s. B, which spent time on its own processing before calling C, is still inside its 7s timeout for C. When C finally responds to B, B responds to… nothing. A already gave up. Work wasted.
Rule: Caller timeout > callee’s total budget (its own processing plus its downstream timeouts)
This ensures:

Retries have a limit
Even with backoff and jitter, you’re still calling a service that’s failing. If the service is down for 5 minutes, you’ll spend 5 minutes making failing calls (with exponential waits).
Meanwhile:
Circuit breaker: Fail fast when service is unhealthy
Track success/failure rate. If failure rate exceeds threshold, stop calling - return error immediately without making the request.

Timeline of circuit breaker behavior:
t=0 Requests succeeding (CLOSED)
t=10 Service starts failing
t=10-15 Failures accumulate, threshold hit
t=15 Circuit OPENS
t=15-45 Requests fail immediately (no network call)
Service has time to recover
t=45 Timeout expires, circuit HALF-OPEN
t=45 One test request sent
t=45 Success! Circuit CLOSES
t=45+ Normal operation resumes

Key benefit: From t=15 to t=45, your code fails fast instead of waiting for timeouts. Service B isn’t receiving requests, giving it time to recover.
Configuration parameters:
import time

class CircuitOpenError(Exception):
    """Raised while the circuit is open - no network call is made."""

class CircuitBreaker:
    def __init__(self,
                 failure_threshold=5,
                 recovery_timeout=30):
        self.state = 'CLOSED'
        self.failures = 0
        self.threshold = failure_threshold
        self.timeout = recovery_timeout
        self.opened_at = None

    def call(self, func):
        if self.state == 'OPEN':
            if self._timeout_expired():
                self.state = 'HALF_OPEN'  # allow one test request through
            else:
                raise CircuitOpenError()
        try:
            result = func()
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _timeout_expired(self):
        return time.time() - self.opened_at >= self.timeout

    def _on_failure(self):
        self.failures += 1
        # A failed test request in HALF_OPEN reopens immediately
        if self.state == 'HALF_OPEN' or self.failures >= self.threshold:
            self.state = 'OPEN'
            self.opened_at = time.time()

    def _on_success(self):
        self.failures = 0
        self.state = 'CLOSED'

These patterns work together:
import random
import time

import requests
from requests.exceptions import HTTPError, Timeout

# Assumes: @circuit_breaker wraps calls in the CircuitBreaker above;
# ServiceUnavailable is application-defined; url is the downstream endpoint.
@circuit_breaker(threshold=5, timeout=30)
def call_service_b():
    for attempt in range(3):
        try:
            response = requests.get(
                url,
                timeout=(3, 10)  # Connect, read
            )
            response.raise_for_status()
            return response.json()
        except Timeout:
            # Transient - back off with full jitter, then retry
            delay = random.uniform(0, 2 ** attempt)
            time.sleep(delay)
            continue
        except HTTPError as e:
            if e.response.status_code >= 500:
                # Server-side failure - also retryable
                delay = random.uniform(0, 2 ** attempt)
                time.sleep(delay)
                continue
            raise  # 4xx = don't retry
    raise ServiceUnavailable()

Order of defense:

Each layer catches different failures:
What it is
AWS service for coordinating multi-step workflows. Define states and transitions visually or in JSON. AWS executes the workflow, handling retries and state persistence.
When it helps:
When it’s overkill:
Simple async doesn’t need it:
The SQS pattern is simpler, cheaper, and sufficient for most cases. Step Functions adds value when the coordination logic itself is complex - not just “process this later.”

Use for: Order processing with approvals, multi-stage data pipelines, anything with complex branching.
Skip for: Simple “do this later” tasks.
What it is
Serverless event bus with content-based routing. Events from many sources, rules determine where they go.
How it differs from SNS:
// EventBridge rule: Route high-value orders
{
"source": ["orders"],
"detail-type": ["order.placed"],
"detail": {
"amount": [{"numeric": [">=", 1000]}]
}
}

Only orders with amount >= 1000 trigger this rule. Other orders go elsewhere (or nowhere).
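The producer side, publishing an event this rule would match - a sketch with illustrative field values:

import json
import boto3

events = boto3.client('events')
events.put_events(
    Entries=[{
        'Source': 'orders',
        'DetailType': 'order.placed',
        'Detail': json.dumps({'order_id': 'o-123', 'amount': 1500})
    }]
)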
Built-in event sources:
AWS services emit events to EventBridge automatically. EC2 instance state changes, S3 events, CodePipeline status - all available without configuration.
When SNS is enough: All subscribers need all messages. No filtering. Simple fan-out. Use SNS + SQS.

Also useful for: Scheduled triggers (cron replacement). rate(5 minutes) or cron(0 12 * * ? *).
What it is
Ingest and process continuous data streams in real-time. Unlike SQS (message queue), Kinesis is a log - data is retained and can be replayed.
Kinesis behaves differently than SQS
SQS is a work queue: process a message, delete it, it’s gone. Consumers compete - each message goes to one consumer. Standard queues don’t guarantee order; FIFO queues do, at lower throughput.
Kinesis is a stream: data stays for 24 hours (configurable to 365 days). Consumers read independently at their own position - same data can be processed by analytics, archival, and alerting systems simultaneously. Order is guaranteed within a shard.
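Producing to a stream - a sketch (stream name is an assumption). Records sharing a partition key land on the same shard, which is what makes per-shard ordering possible:

import json
import boto3

kinesis = boto3.client('kinesis')
kinesis.put_record(
    StreamName='clickstream',
    Data=json.dumps({'user_id': 'u-42', 'action': 'view'}).encode(),
    PartitionKey='u-42'  # same key -> same shard -> ordered for this user
)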
When Kinesis characteristics matter:
When SQS characteristics are sufficient:
Process once, delete, move on. Single consumer per message. No replay needed. Most async processing fits this model - use SQS by default.

Mental model: Kinesis = distributed log (retain, replay, multiple readers). SQS = work queue (process, delete, compete).

Default patterns for most projects:
EventBridge, Step Functions, Kinesis solve specific problems. Reach for them when you have those problems - not by default.
User uploads an image. Application must:
Two analysis paths run in parallel:
Neither path knows about the other. Both produce results that must be combined before the user sees an outcome.
This is a coordination problem.
The upload is synchronous (user waits for acknowledgment). The analysis is asynchronous (user doesn’t wait). The notification is eventually synchronous (user sees result).
How do you decompose this into services? Where do boundaries fall? What coordinates the parallel work?

Start with actions, then ask: who performs each?
| Action | Characteristics | Service Candidate |
|---|---|---|
| Receive upload | HTTP, needs response | API Gateway + Lambda |
| Store original | Durable, any size | S3 |
| Queue for processing | Decouple, buffer | SQS |
| Call Rekognition | AWS SDK, fast | Lambda |
| Run custom model | CPU/memory intensive | Lambda (or container) |
| Store results | Structured query | DynamoDB |
| Combine results | Wait for both | ??? |
| Notify user | Async delivery | SNS or direct |
The interesting question: “Combine results”
Two async processes complete at different times. Something must:
This is the coordination problem that shapes the architecture.

User uploads image, receives job ID, can check status later
# Upload Lambda handler
import json
import uuid
from datetime import datetime

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.client('dynamodb')
sqs = boto3.client('sqs')
# UPLOAD_BUCKET, JOBS_TABLE, PROCESSING_QUEUE: configuration constants
# parse_multipart: application helper (not shown)
def handle_upload(event, context):
# Parse multipart upload from API Gateway
body = parse_multipart(event)
image_data = body['file']
user_id = event['requestContext']['authorizer']['user_id']
# Generate job ID for tracking
job_id = str(uuid.uuid4())
# Store original in S3
s3.put_object(
Bucket=UPLOAD_BUCKET,
Key=f'uploads/{job_id}/original.jpg',
Body=image_data,
Metadata={'user_id': user_id}
)
# Create job record (pending state)
dynamodb.put_item(
TableName=JOBS_TABLE,
Item={
'job_id': {'S': job_id},
'user_id': {'S': user_id},
'status': {'S': 'pending'},
'created_at': {'S': datetime.utcnow().isoformat()}
}
)
# Queue for processing (triggers async work)
sqs.send_message(
QueueUrl=PROCESSING_QUEUE,
MessageBody=json.dumps({'job_id': job_id})
)
# Return immediately - processing happens async
return {
'statusCode': 202,
'body': json.dumps({
'job_id': job_id,
'status': 'processing',
'status_url': f'/jobs/{job_id}'
})
}

HTTP 202 Accepted: Request received, processing started, result not yet available. Client has job ID to poll status.
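The status_url the client polls can be a thin read of the jobs table - a sketch, assuming a GET /jobs/{job_id} route wired to this handler:

# Status check Lambda (sketch)
def handle_status(event, context):
    job_id = event['pathParameters']['job_id']
    response = dynamodb.get_item(
        TableName=JOBS_TABLE,
        Key={'job_id': {'S': job_id}}
    )
    item = response.get('Item')
    if item is None:
        return {'statusCode': 404, 'body': json.dumps({'error': 'unknown job'})}
    body = {'job_id': job_id, 'status': item['status']['S']}
    if item['status']['S'] == 'complete':
        body['decision'] = json.loads(item['decision']['S'])
    return {'statusCode': 200, 'body': json.dumps(body)}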

User waits ~100-200ms for acknowledgment.
Actual analysis hasn’t started yet - only queued.
SQS message triggers dispatcher Lambda
The processing queue doesn’t directly invoke both analysis paths. A dispatcher Lambda reads the message and initiates both branches.
# Dispatcher Lambda - triggered by SQS
# (lambda_client = boto3.client('lambda'); json, dynamodb, JOBS_TABLE as in the upload handler)
def dispatch_handler(event, context):
for record in event['Records']:
message = json.loads(record['body'])
job_id = message['job_id']
# Get image location
image_key = f'uploads/{job_id}/original.jpg'
# Invoke Rekognition analysis (async)
lambda_client.invoke(
FunctionName='rekognition-analyzer',
InvocationType='Event', # Async - don't wait
Payload=json.dumps({
'job_id': job_id,
'image_key': image_key
})
)
# Invoke custom model analysis (async)
lambda_client.invoke(
FunctionName='custom-model-analyzer',
InvocationType='Event', # Async - don't wait
Payload=json.dumps({
'job_id': job_id,
'image_key': image_key
})
)
# Update job status
dynamodb.update_item(
TableName=JOBS_TABLE,
Key={'job_id': {'S': job_id}},
UpdateExpression='SET #s = :s',
ExpressionAttributeNames={'#s': 'status'},
ExpressionAttributeValues={':s': {'S': 'analyzing'}}
)

InvocationType='Event': Lambda invokes target asynchronously and returns immediately. Dispatcher doesn’t wait for either analysis to complete.
Two Lambdas now running in parallel, neither aware of the other.

Neither analysis Lambda blocks the other.
Dispatcher completes in milliseconds.
Rekognition is a managed service - you send image, receive labels
# Rekognition analyzer Lambda
# (rekognition = boto3.client('rekognition'); json, dynamodb, datetime as before)
def rekognition_handler(event, context):
job_id = event['job_id']
image_key = event['image_key']
# Call Rekognition (synchronous within this Lambda)
response = rekognition.detect_moderation_labels(
Image={
'S3Object': {
'Bucket': UPLOAD_BUCKET,
'Name': image_key
}
},
MinConfidence=70
)
# Extract results
labels = [
{
'name': label['Name'],
'confidence': label['Confidence'],
'parent': label.get('ParentName', '')
}
for label in response['ModerationLabels']
]
# Store results in DynamoDB
dynamodb.update_item(
TableName=JOBS_TABLE,
Key={'job_id': {'S': job_id}},
UpdateExpression='SET rekognition_result = :r, rekognition_at = :t',
ExpressionAttributeValues={
':r': {'S': json.dumps(labels)},
':t': {'S': datetime.utcnow().isoformat()}
}
)
# Check if other branch is complete
check_and_finalize(job_id)

Rekognition returns structured data:
This Lambda writes its results and then checks if the job can be finalized.

Rekognition reads directly from S3 - image bytes don’t flow through Lambda.
Your own classification logic - runs longer, under your control
# Custom model analyzer Lambda
# (s3, dynamodb, json, datetime as before; get_cached_model, preprocess_image,
#  CATEGORY_NAMES are application code)
def custom_model_handler(event, context):
job_id = event['job_id']
image_key = event['image_key']
# Download image from S3
response = s3.get_object(Bucket=UPLOAD_BUCKET, Key=image_key)
image_bytes = response['Body'].read()
# Load model (cached in execution environment)
model = get_cached_model()
# Preprocess image
tensor = preprocess_image(image_bytes)
# Run inference
predictions = model.predict(tensor)
# Post-process results
categories = [
{
'category': CATEGORY_NAMES[i],
'score': float(predictions[i])
}
for i in range(len(predictions))
if predictions[i] > 0.5
]
# Store results
dynamodb.update_item(
TableName=JOBS_TABLE,
Key={'job_id': {'S': job_id}},
UpdateExpression='SET custom_result = :r, custom_at = :t',
ExpressionAttributeValues={
':r': {'S': json.dumps(categories)},
':t': {'S': datetime.utcnow().isoformat()}
}
)
# Check if other branch is complete
check_and_finalize(job_id)

Time breakdown:
Cold start dominates. Warm invocations much faster.

All compute happens inside Lambda.
Memory setting affects inference speed (more memory = more CPU).
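The get_cached_model() call relies on Lambda execution-environment reuse: module-level state survives across warm invocations. A sketch of the pattern (the path and load_model loader are placeholders):

# Module scope - runs once per execution environment
_model = None

def get_cached_model():
    global _model
    if _model is None:
        # Cold start: load weights shipped with the deployment package
        _model = load_model('/opt/ml/model.pkl')  # placeholder loader/path
    return _model  # warm invocations skip the load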
Both branches write results to DynamoDB. How do we know when both are complete?
Option 1: Polling
Status check Lambda queries DynamoDB periodically. When both results present, trigger finalization.
Option 2: Each branch checks and finalizes
After writing its result, each branch checks if the other result exists. First to see both results triggers finalization.
Option 3: DynamoDB Streams
DynamoDB stream triggers Lambda on every update. Lambda checks if both results present.
Option 4: Step Functions
State machine waits for both branches, then proceeds.
We’ll implement Option 2 - simple, no additional services.

For this example: Check-and-finalize.
Production systems often use Step Functions for complex workflows.
Each branch calls this after writing its result:
def check_and_finalize(job_id):
    # Atomic read of current state
    response = dynamodb.get_item(
        TableName=JOBS_TABLE,
        Key={'job_id': {'S': job_id}},
        ConsistentRead=True  # Strong consistency required
    )
    item = response.get('Item', {})
    # Check if both results present
    has_rekognition = 'rekognition_result' in item
    has_custom = 'custom_result' in item
    if not (has_rekognition and has_custom):
        # Other branch not done yet - nothing to do
        return
    # Both done - try to claim finalization
    # Conditional update prevents double-finalization:
    # the claim succeeds only while status is still 'analyzing'
    try:
        dynamodb.update_item(
            TableName=JOBS_TABLE,
            Key={'job_id': {'S': job_id}},
            UpdateExpression='SET #s = :s, finalized_at = :t',
            ConditionExpression='#s = :expected',
            ExpressionAttributeNames={'#s': 'status'},
            ExpressionAttributeValues={
                ':s': {'S': 'finalizing'},
                ':expected': {'S': 'analyzing'},
                ':t': {'S': datetime.utcnow().isoformat()}
            }
        )
    except dynamodb.exceptions.ConditionalCheckFailedException:
        # Other branch already claimed finalization
        return
    # We claimed it - do the finalization
    finalize_job(job_id, item)

The ConditionExpression succeeds only while status is still 'analyzing', so exactly one branch claims finalization even if both check simultaneously.

DynamoDB conditional update is atomic.
Only one branch wins the race.
The branch that claims finalization combines results and determines outcome:
# (sns = boto3.client('sns'); NOTIFICATION_TOPIC is configuration;
#  make_moderation_decision is application logic - a sketch follows below)
def finalize_job(job_id, item):
# Parse both results
rekognition = json.loads(item['rekognition_result']['S'])
custom = json.loads(item['custom_result']['S'])
# Business logic: combine analysis
decision = make_moderation_decision(rekognition, custom)
# Update final status
dynamodb.update_item(
TableName=JOBS_TABLE,
Key={'job_id': {'S': job_id}},
UpdateExpression='''
SET #s = :status,
decision = :decision,
completed_at = :time
''',
ExpressionAttributeNames={'#s': 'status'},
ExpressionAttributeValues={
':status': {'S': 'complete'},
':decision': {'S': json.dumps(decision)},
':time': {'S': datetime.utcnow().isoformat()}
}
)
# Notify user (async)
sns.publish(
TopicArn=NOTIFICATION_TOPIC,
Message=json.dumps({
'job_id': job_id,
'user_id': item['user_id']['S'],
'decision': decision
})
)

Decision logic is application-specific:
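For instance, a plausible shape for the combining logic - thresholds and reason strings here are invented for illustration:

def make_moderation_decision(rekognition_labels, custom_categories):
    # Reject outright on a high-confidence Rekognition moderation label
    if any(l['confidence'] >= 90 for l in rekognition_labels):
        return {'outcome': 'rejected', 'reason': 'rekognition_high_confidence'}
    # Anything flagged by either model goes to human review
    if rekognition_labels or any(c['score'] >= 0.8 for c in custom_categories):
        return {'outcome': 'review', 'reason': 'flagged_by_one_model'}
    return {'outcome': 'approved', 'reason': 'both_models_clean'}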

Business logic concentrated in one place.
Easy to modify decision rules without changing pipeline structure.

Seven services, one pipeline:
Each service boundary is a potential failure point
| Failure | Impact | Recovery |
|---|---|---|
| Upload Lambda timeout | User sees error | Retry upload |
| S3 put fails | No image stored | Lambda retries, fails to user |
| SQS send fails | Job stuck pending | Lambda retries, DLQ if persistent |
| Dispatcher fails | Job stuck analyzing | SQS retry, DLQ |
| Rekognition fails | Partial results | DLQ, manual intervention |
| Custom model fails | Partial results | DLQ, manual intervention |
| DynamoDB fails | State lost | Retry, eventual consistency |
Dead Letter Queues capture persistent failures:
# SQS queue configuration
{
"RedrivePolicy": {
"deadLetterTargetArn": "arn:aws:sqs:...:processing-dlq",
"maxReceiveCount": 3
}
}

Message fails 3 times → moves to DLQ. Operations team investigates.
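Attaching the policy with boto3 - a sketch (queue URL and DLQ ARN elided as above):

import json
import boto3

sqs = boto3.client('sqs')
sqs.set_queue_attributes(
    QueueUrl=PROCESSING_QUEUE,
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': 'arn:aws:sqs:...:processing-dlq',
            'maxReceiveCount': '3'
        })
    }
)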
Partial completion is the hard case:
Rekognition succeeds, custom model fails. Job has one result but not both. Options:

Messages that consistently fail don’t block the queue.
DLQ preserves evidence for debugging.
Each Lambda writes structured logs:
import json
import logging
from datetime import datetime

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def log_event(event_type, job_id, **kwargs):
    logger.info(json.dumps({
        'event': event_type,
        'job_id': job_id,
        'timestamp': datetime.utcnow().isoformat(),
        **kwargs
    }))

# Usage in handlers
log_event('upload_received', job_id, size_bytes=len(image_data))
log_event('rekognition_complete', job_id, label_count=len(labels))
log_event('finalization_claimed', job_id, claimed_by='rekognition')

job_id is the correlation key:
All logs for one upload share the same job_id. Query CloudWatch Logs:
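One way to run that query - a CloudWatch Logs Insights sketch (log group name and job ID are placeholders; poll get_query_results until the status is Complete):

import time
import boto3

logs = boto3.client('logs')
query = logs.start_query(
    logGroupName='/aws/lambda/rekognition-analyzer',  # repeat per function
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=(
        "fields @timestamp, event, job_id "
        "| filter job_id = 'JOB-ID-HERE' "
        "| sort @timestamp asc"
    )
)
results = logs.get_query_results(queryId=query['queryId'])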
Key metrics to track:

One job_id links all events.
Async gap visible in timestamps.
Why these choices?
API Gateway + Lambda for upload:
S3 for image storage:
SQS for decoupling:
Lambda for analysis:
DynamoDB for coordination:
SNS for notification:
What’s NOT in this architecture:
Each excluded service could be appropriate for different requirements. This architecture optimizes for:
Same structure, different domains:
Document processing pipeline:
Video analysis pipeline:
The pattern generalizes:
Coordination complexity scales with branches:

Pattern scales horizontally.
Coordination strategy must match branch count.