
EE 547 - Unit 5
Fall 2025
Structural and Scale Constraints
Flight booking schema works because:
-- Natural queries for uniform data
SELECT * FROM bookings
WHERE passenger_id = 1234;
SELECT f.* FROM flights f
WHERE departure_airport = 'LAX'
AND arrival_airport = 'JFK';
When data and queries match structure, relational wins

Add special services to passengers:
Relational approach 1: Add columns
ALTER TABLE passengers ADD COLUMN wheelchair BOOLEAN;
ALTER TABLE passengers ADD COLUMN meal_pref VARCHAR(50);
ALTER TABLE passengers ADD COLUMN pet_carrier BOOLEAN;
-- ...12 more columns
Problem: Most passengers have mostly NULLs (90% of cells)
Relational approach 2: Services table
Problem: Loses type safety, querying awkward
-- Find passengers needing wheelchair
SELECT DISTINCT passenger_id
FROM passenger_services
WHERE service_key = 'wheelchair'
AND service_value = 'true';
-- Complex: wheelchair AND vegetarian
SELECT ps1.passenger_id
FROM passenger_services ps1
JOIN passenger_services ps2
ON ps1.passenger_id = ps2.passenger_id
WHERE ps1.service_key = 'wheelchair'
AND ps2.service_key = 'meal_pref'
AND ps2.service_value = 'vegetarian';
Neither approach feels natural — data has variable structure per entity
Approach 1 storage example:
| passenger_id | name | wheelchair | meal | pet | oxygen | … |
|---|---|---|---|---|---|---|
| 1 | Alice | NULL | NULL | NULL | NULL | … |
| 2 | Bob | NULL | vegan | NULL | NULL | … |
| 3 | Carol | true | NULL | NULL | NULL | … |
With 10,000 passengers: 10,000 rows × 15 service columns = 150,000 cells, ~135,000 NULL values
E-commerce products:
Books:
Electronics:
Clothing:
Relational solution?
IoT sensor data:
Temperature sensors:
Motion sensors:
Camera sensors:
Pattern: Entity type is uniform (product, sensor), but attributes vary within type
Relational model assumes: All entities of same type have same attributes
Reality: Many domains have heterogeneous attributes
Application code thinks in objects:
# Natural object access
booking.passenger.frequent_flyer_number
booking.flight.departure_airport
booking.flight.aircraft.model
SQL requires JOIN to reassemble:
SELECT b.*, p.*, f.*, a.*
FROM bookings b
JOIN passengers p ON b.passenger_id = p.passenger_id
JOIN flights f ON b.flight_id = f.flight_id
JOIN aircraft a ON f.aircraft_id = a.aircraft_id
WHERE b.booking_id = 12345;
Computational cost:
Worse: Comments on posts on forums

Impedance mismatch: Object hierarchy vs tabular relations
Every nested access = JOIN operation
Impedance mismatch: Application code operates on object graphs, databases store flat tables. Moving data between these representations requires expensive translation.
Loading a passenger with 3 bookings and flight details:
# ORM loading pattern (N+1 query problem)
passenger = session.query(Passenger).get(123) # 1 query
for booking in passenger.bookings: # 1 query
flight = booking.flight # 3 queries
aircraft = flight.aircraft # 3 queries
# Total: 8 queries
Eager loading with JOINs:
SELECT p.*, b.*, f.*, a.*
FROM passengers p
LEFT JOIN bookings b ON p.passenger_id = b.passenger_id
LEFT JOIN flights f ON b.flight_id = f.flight_id
LEFT JOIN aircraft a ON f.aircraft_id = a.aircraft_id
WHERE p.passenger_id = 123;
Result: Duplicated passenger data × 3 rows (one per booking)
Alternative: Serialize entire object as JSON
Problem: Lose query capability, indexing, constraints

Load time for 1 passenger + 3 bookings:
Each approach has serious drawbacks
Relational optimized for:
Problematic patterns:
1. Path queries:
-- Flights connecting through exactly 2 hubs
SELECT f1.flight_number, f2.flight_number, f3.flight_number
FROM flights f1
JOIN flights f2 ON f1.arrival_airport = f2.departure_airport
JOIN flights f3 ON f2.arrival_airport = f3.departure_airport
WHERE f1.departure_airport = 'LAX'
AND f3.arrival_airport = 'LHR'
AND f1.arrival_airport NOT IN ('LAX', 'LHR')
AND f2.arrival_airport NOT IN ('LAX', 'LHR');
Self-JOINs explode with path length
2. Sparse attribute filters:
-- Passengers with (gluten-free OR wheelchair) AND pet
SELECT DISTINCT ps1.passenger_id
FROM passenger_services ps1
WHERE ps1.passenger_id IN (
SELECT passenger_id FROM passenger_services
WHERE (service_key = 'meal' AND service_value = 'gluten-free')
OR (service_key = 'wheelchair' AND service_value = 'true')
)
AND ps1.passenger_id IN (
SELECT passenger_id FROM passenger_services
WHERE service_key = 'pet_carrier' AND service_value = 'true'
);
PostgreSQL JSONB, MySQL JSON type:
ALTER TABLE passengers ADD COLUMN services JSONB;
-- Store variable data as JSON
INSERT INTO passengers (id, name, email, services)
VALUES (1, 'Alice', 'alice@example.com',
'{"wheelchair": true, "meal": "vegetarian"}');
INSERT INTO passengers (id, name, email, services)
VALUES (2, 'Bob', 'bob@example.com',
'{"meal": "gluten-free", "pet_carrier": true, "extra_legroom": true}');
Querying JSON data:
-- PostgreSQL JSONB operators
SELECT * FROM passengers
WHERE services->>'wheelchair' = 'true';
SELECT * FROM passengers
WHERE services ? 'meal'
AND services->>'meal' = 'vegetarian';
Problems:
CREATE INDEX idx_wheelchair
ON passengers ((services->>'wheelchair'))
WHERE services->>'wheelchair' IS NOT NULL;
Performance comparison:

Result: Relational overhead without relational benefits for nested data
Better solutions exist when data is fundamentally non-tabular
Add event tracking to flights:
ALTER TABLE flights ADD COLUMN delay_reason VARCHAR(100);
ALTER TABLE flights ADD COLUMN delay_minutes INT;
ALTER TABLE flights ADD COLUMN diverted_airport CHAR(3);
ALTER TABLE flights ADD COLUMN emergency_landing BOOLEAN;
ALTER TABLE flights ADD COLUMN diversion_reason TEXT;
ALTER TABLE flights ADD COLUMN emergency_type VARCHAR(50);
ALTER TABLE flights ADD COLUMN emergency_services BOOLEAN;
ALTER TABLE flights ADD COLUMN passenger_medical BOOLEAN;
-- ...15 more event-specific columns
Most flights: NULL for all special event fields
| flight_id | delay_reason | diverted | emergency | … |
|---|---|---|---|---|
| 1001 | NULL | NULL | NULL | … |
| 1002 | NULL | NULL | NULL | … |
| 1003 | weather | NULL | NULL | … |
| 1004 | NULL | NULL | NULL | … |
With 1M flights, 20 event columns:
Semantic ambiguity:
Querying sparse data is awkward:
-- Find flights with delays NOT due to weather
SELECT * FROM flights
WHERE delay_reason IS NOT NULL
AND delay_reason != 'weather';
-- Flights with ANY special event
SELECT * FROM flights
WHERE delay_reason IS NOT NULL
OR diverted_airport IS NOT NULL
OR emergency_landing IS NOT NULL
OR passenger_medical IS NOT NULL
-- ...15 more conditions
Sparse attributes common in:
Need to add column to flights table with 10M rows:
PostgreSQL behavior:
Timing for different table sizes:
| Rows | ALTER TABLE time | Downtime |
|---|---|---|
| 10K | ~1 second | Acceptable |
| 100K | ~10 seconds | Problematic |
| 1M | ~30 seconds | Unacceptable |
| 10M | ~5 minutes | Critical |

Zero-downtime migration requires shadow table (complexity: high, time: hours)
Application-level workaround (complex):
Error-prone:
Flight operations data FITS relational well:
Uniform schema:
Clear relationships:
Transaction requirements:
Known queries:
-- These queries are natural and efficient
SELECT COUNT(*) FROM bookings WHERE flight_id = 1234;
SELECT f.*, COUNT(b.booking_id) as passenger_count
FROM flights f
LEFT JOIN bookings b ON f.flight_id = b.flight_id
GROUP BY f.flight_id;
BEGIN TRANSACTION;
INSERT INTO bookings (passenger_id, flight_id, seat)
VALUES (123, 456, '12A');
UPDATE flights SET available_seats = available_seats - 1
WHERE flight_id = 456;
COMMIT;
If your data looks like this, relational is the correct choice
When relational is correct:
When to consider alternatives:
Match data structure to model — not every problem needs NoSQL
Relational modeling:
NoSQL modeling:
Example: Flight booking confirmation email
Relational approach: JOIN 5 tables
SELECT b.booking_reference, p.name, p.email,
f.flight_number, f.scheduled_departure,
d.name AS departure_city,
a.name AS arrival_city,
ac.model AS aircraft_type
FROM bookings b
JOIN passengers p ON b.passenger_id = p.passenger_id
JOIN flights f ON b.flight_id = f.flight_id
JOIN airports d ON f.departure_airport = d.airport_code
JOIN airports a ON f.arrival_airport = a.airport_code
JOIN aircraft ac ON f.aircraft_id = ac.aircraft_id
WHERE b.booking_id = 12345;
NoSQL approach: Store complete confirmation in booking document

Update cost: User name changes → update in 1000 places vs 1 place
When acceptable: Reads >> Writes (confirmation viewed 100× more than user changes name)
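One way to realize this is to denormalize at write time: run the 5-table JOIN once when the booking is created, then store the flattened confirmation. A minimal Python sketch, assuming a pymongo client; the `confirmations` collection name and the shape of the joined `row` dict are illustrative, not part of the schema above.

```python
# Sketch: denormalize at booking time (illustrative; assumes pymongo and
# that `row` is the dict produced by the 5-table JOIN above)
from pymongo import MongoClient

db = MongoClient()["airline"]

def store_confirmation(row):
    doc = {
        "booking_reference": row["booking_reference"],
        "passenger": {"name": row["name"], "email": row["email"]},
        "flight": {
            "number": row["flight_number"],
            "scheduled_departure": row["scheduled_departure"],
            "departure_city": row["departure_city"],
            "arrival_city": row["arrival_city"],
            "aircraft_type": row["aircraft_type"],
        },
    }
    # One duplicated write; every later confirmation view is a single fetch
    db.confirmations.insert_one(doc)
```

The duplication happens once per booking, which is why the read-heavy ratio above is the deciding factor.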
Requirement: Airport display showing flight status
Query: “All departures from LAX in next 2 hours”
Data needed:
Relational approach:
SELECT f.flight_number, f.gate, f.status,
f.scheduled_departure,
a.model AS aircraft_type,
ap.name AS destination_city
FROM flights f
JOIN aircraft a ON f.aircraft_id = a.aircraft_id
JOIN airports ap ON f.arrival_airport = ap.airport_code
WHERE f.departure_airport = 'LAX'
AND f.scheduled_departure BETWEEN NOW()
AND NOW() + INTERVAL '2 hours';
Frequency analysis:
NoSQL approach: Denormalize for display query
{
"flight_id": 1234,
"flight_number": "UA123",
"departure_airport": "LAX",
"arrival_airport": "JFK",
"destination_city": "New York", // Duplicated
"aircraft_model": "Boeing 737", // Duplicated
"gate": "B12",
"status": "On Time",
"scheduled_departure": "2025-02-15T14:30:00Z"
}
Single query, no joins:
db.flights.find({
departure_airport: "LAX",
scheduled_departure: {
$gte: new Date(),
$lte: new Date(Date.now() + 2*60*60*1000)
}
})
Read:Write ratio 2400:1 justifies denormalization
Aircraft model duplicated in every flight using that aircraft
City name duplicated in every flight to that destination
Updates rare, reads constant
Earlier Problem: Variable passenger services
Document approach: Embed services as nested object
{
"passenger_id": 123,
"name": "Alice Chen",
"email": "alice@example.com",
"phone": "+1-555-0123",
"services": {
"wheelchair": true,
"meal": "vegetarian",
"pet_carrier": true,
"seat_preference": "aisle",
"medical_oxygen": false
},
"frequent_flyer": {
"number": "FF123456",
"tier": "Gold",
"miles": 85000
}
}
Why embed:
Single document fetch gets everything
No NULL columns for passengers without services

Store only what exists, no wasted NULL storage
Flight with 200 bookings - cannot embed all:
Size problem:
Access problem:
Update problem:
Solution: Reference pattern
// Flight document (small, frequently accessed)
{
"flight_id": 456,
"flight_number": "UA123",
"departure_airport": "LAX",
"arrival_airport": "JFK",
"scheduled_departure": "2025-02-15T14:30:00Z",
"seats_available": 12,
"total_seats": 180
}
// Booking document (separate collection)
{
"booking_id": 789,
"flight_id": 456, // Reference to flight
"passenger_id": 123,
"seat": "12A",
"booking_time": "2025-01-15T10:30:00Z"
}
Query patterns:
Get flight details:
Get bookings for flight:
Add new booking:
// Two operations (in transaction if supported)
db.bookings.insert({...})
db.flights.update(
{flight_id: 456},
{$inc: {seats_available: -1}}
)
Trade-off:
Like foreign keys, but application-enforced (not database engine!)
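Because the reference is application-enforced, the existence check and any compensation logic live in application code rather than in the engine. A minimal sketch with pymongo; the error handling and compensation responsibility shown here are illustrative choices, not MongoDB requirements.

```python
# Sketch: application-enforced reference between bookings and flights
# (illustrative; pymongo client, collection names match the documents above)
from pymongo import MongoClient

db = MongoClient()["airline"]

def add_booking(flight_id, passenger_id, seat):
    # The existence check a relational FOREIGN KEY would have done for us
    if db.flights.find_one({"flight_id": flight_id}) is None:
        raise ValueError(f"flight {flight_id} does not exist")

    # Decrement inventory only if a seat remains (atomic within one document)
    result = db.flights.update_one(
        {"flight_id": flight_id, "seats_available": {"$gt": 0}},
        {"$inc": {"seats_available": -1}},
    )
    if result.modified_count == 0:
        raise ValueError("no seats available")

    # Second operation: if it fails, the application must compensate
    db.bookings.insert_one({
        "flight_id": flight_id,       # reference, not enforced by the engine
        "passenger_id": passenger_id,
        "seat": seat,
    })
```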
Scenario: IoT temperature sensor network
Data volume:
Cannot fit on single server:
Read patterns:
Distribution required: 13TB cannot fit on single server
Partition strategy determines query efficiency:
Partition strategy: All data for sensor_id on same server
Physical layout across 5 servers:
Each server stores:
sensor_id | timestamp | temperature
----------|--------------------|-----------
40000 | 2025-01-15 00:00 | 21.0
40000 | 2025-01-15 00:01 | 21.1
40000 | 2025-01-15 00:02 | 21.2
...
40001 | 2025-01-15 00:00 | 19.5
Query 1: “Sensor 42000, last 24 hours”
Query 2: “Average temperature all sensors”

Performance:
Partition key determines query efficiency
Query requirement: Readings with metadata
Relational approach (bad for distributed):
-- Metadata on central server
CREATE TABLE sensors (
sensor_id INT PRIMARY KEY,
location VARCHAR(100),
building VARCHAR(50),
sensor_type VARCHAR(20)
);
-- Readings partitioned across 5 servers
CREATE TABLE readings (
sensor_id INT,
timestamp TIMESTAMP,
temperature FLOAT,
FOREIGN KEY (sensor_id) REFERENCES sensors
);
Query needs metadata:
Denormalized approach:
Storage trade-off:
Query performance:

Query performance:
Update frequency:
Denormalization justified by read:write ratio
Data on Server 3 (sensor_id 40,000-59,999):
Physical layout on disk:
Partition: sensor_id=40000
--------------------------
timestamp | temp
2025-01-15 00:00:00 | 21.0
2025-01-15 00:01:00 | 21.1
2025-01-15 00:02:00 | 21.2
...
2025-01-15 23:59:00 | 20.8
Partition: sensor_id=40001
--------------------------
timestamp | temp
2025-01-15 00:00:00 | 19.5
2025-01-15 00:01:00 | 19.6
...
Query: “sensor_id=40000 between 10:00 and 12:00”
Efficient: O(log n) + O(k) where k = result size
Cannot efficiently query: “All sensors at 10:00”

Sort order within partition provides O(log n) range queries
Two different access patterns:
Pattern 1: Recent readings by sensor
Pattern 2: All sensors at specific time
Same data, stored twice:
-- Table 1: Optimized for sensor queries
CREATE TABLE readings_by_sensor (
sensor_id INT, -- Partition key
timestamp TIMESTAMP, -- Sort key
temperature FLOAT,
location TEXT,
PRIMARY KEY (sensor_id, timestamp)
);
-- Table 2: Optimized for time queries
CREATE TABLE readings_by_time (
time_bucket INT, -- Partition key (hour)
sensor_id INT, -- Sort key
temperature FLOAT,
location TEXT,
PRIMARY KEY (time_bucket, sensor_id)
);
Write path: Application writes to both tables
Read path: Query router chooses table based on pattern

Cannot optimize single table for both patterns
Trade storage and write complexity for query performance
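A sketch of that dual-write path in Python, assuming the DataStax cassandra-driver and the two tables defined above; the keyspace name and the hour bucketing are assumptions for illustration.

```python
# Sketch: write fan-out to both query-specific tables, reads routed by pattern
# (illustrative; assumes cassandra-driver and a keyspace named "metrics")
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("metrics")

def record_reading(sensor_id, ts, temperature, location):
    # Same reading written twice, once per access pattern
    session.execute(
        "INSERT INTO readings_by_sensor (sensor_id, timestamp, temperature, location)"
        " VALUES (%s, %s, %s, %s)",
        (sensor_id, ts, temperature, location))
    session.execute(
        "INSERT INTO readings_by_time (time_bucket, sensor_id, temperature, location)"
        " VALUES (%s, %s, %s, %s)",
        (int(ts.timestamp()) // 3600, sensor_id, temperature, location))

def readings_for_sensor(sensor_id, since):
    # Sensor-centric queries route to readings_by_sensor
    return session.execute(
        "SELECT * FROM readings_by_sensor WHERE sensor_id = %s AND timestamp >= %s",
        (sensor_id, since))
```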
Anti-pattern: Normalize sensor data
// Sensors collection (metadata)
{
"sensor_id": 42000,
"location": "Building A, Floor 3",
"sensor_type": "indoor_temp",
"calibration_date": "2024-01-15"
}
// Readings collection (data)
{
"reading_id": 999999,
"sensor_id": 42000, // Reference
"timestamp": "2025-01-15T10:30:00Z",
"temperature": 22.5
}
Query to display dashboard (100 recent readings):
Problem:

Correct approach: Denormalize
Relational normalization creates N+1 query problem
Scenario: Sensor relocated (Building A → Building B)
With denormalization, location stored in:
Update options:
Option 1: Update all historical records
Option 2: Update going forward only
Option 3: Versioned metadata
{
"sensor_id": 42000,
"timestamp": "2025-01-15T10:30:00Z",
"temperature": 22.5,
"metadata_version": 3 // Reference version
}
More complex queries, additional lookup
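Resolving the version reference adds one lookup per read. A minimal sketch, assuming a pymongo client and a hypothetical sensor_metadata_versions collection with one document per (sensor_id, version).

```python
# Sketch: versioned-metadata lookup at read time (illustrative; the
# sensor_metadata_versions collection is an assumption, not defined above)
from pymongo import MongoClient

db = MongoClient()["iot"]

def reading_with_location(sensor_id, ts):
    reading = db.readings.find_one({"sensor_id": sensor_id, "timestamp": ts})
    # Extra lookup: translate metadata_version into the location valid at that time
    meta = db.sensor_metadata_versions.find_one({
        "sensor_id": sensor_id,
        "version": reading["metadata_version"],
    })
    reading["location"] = meta["location"]
    return reading
```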
When denormalization acceptable:
When to avoid:

Trade-off: Read performance vs update complexity
PostgreSQL practical limits on single node:
Real production at scale:
Netflix streaming service:
Reddit (2023):
Single-machine limits: Cannot fit internet-scale workloads on one node
Single node limits:
| Resource | Limit | Bottleneck |
|---|---|---|
| Connections | 10K | Memory overhead |
| Queries/sec | 100K | CPU cores |
| Storage | 2TB | Vacuum time |
| Write throughput | 50K/sec | Disk I/O |
Capacity mismatch:
Example workload (social media):
Single PostgreSQL node: 100K queries/sec
Gap: 250× capacity needed
Cannot be solved with bigger hardware — must distribute across multiple nodes
Vertical scaling = bigger machine:
AWS RDS pricing (2024):
Capacity scaling:
Problems with vertical:
At Reddit’s 14TB:
Horizontal scaling = more machines:
10 nodes × 128GB RAM:
Comparison for 1TB capacity:
| Approach | Configuration | Cost | Redundancy |
|---|---|---|---|
| Vertical | 1 × 1TB | $11,040 | None |
| Horizontal | 10 × 128GB | $13,800 | 9 nodes survive failure |
Horizontal advantages:
Trade-off: Horizontal requires distributing data and coordinating across nodes
Partition = subset of data assigned to one node
Example: 10M users, 10 nodes
Two partitioning strategies:
Hash-based partitioning:
node = hash(user_id) % num_nodes
hash("alice") = 0x7a3f → Node 7
hash("bob") = 0x2c91 → Node 2
hash("carol") = 0x9e5d → Node 9Characteristics:
Range-based partitioning:
A-B → Node 1
C-E → Node 2
F-J → Node 3
...
Z → Node 10
"alice" → Node 1
"bob" → Node 1
"carol" → Node 2
Characteristics:

Trade-off: Even load vs efficient range queries
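A small Python sketch of the two routing functions makes the trade-off concrete; the node count and range boundaries are illustrative.

```python
# Sketch: key-to-node routing under hash vs range partitioning (illustrative)
import bisect
import hashlib

RANGE_BOUNDS = ["c", "f", "k", "p", "t", "w", "z"]   # 8 nodes, split by first letter
NUM_NODES = len(RANGE_BOUNDS) + 1

def hash_partition(user_id: str) -> int:
    # Even spread, but "alice" and "alicia" land on unrelated nodes
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return h % NUM_NODES

def range_partition(user_id: str) -> int:
    # Neighboring keys share a node, so prefix and range scans stay local
    return bisect.bisect_right(RANGE_BOUNDS, user_id[0].lower())

for name in ["alice", "bob", "carol"]:
    print(name, hash_partition(name), range_partition(name))
```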
Partition key = attribute used to determine node assignment
Example: E-commerce orders table
Option 1: Partition by user_id
-- Efficient: Single node
SELECT * FROM orders WHERE user_id = 123;
→ hash(123) = Node 4 → Query Node 4 only
-- Expensive: All nodes
SELECT * FROM orders WHERE order_date = '2024-10-07';
→ Must query all 10 nodes, merge results
Option 2: Partition by order_date
-- Efficient: Single node
SELECT * FROM orders WHERE order_date = '2024-10-07';
→ '2024-10-07' in range for Node 3 → Query Node 3 only
-- Expensive: All nodes
SELECT * FROM orders WHERE user_id = 123;
→ Date unknown, must query all 10 nodes
Partition key choice optimizes one access pattern at expense of others
Cost of wrong choice:

Partition key choice determines which queries are efficient
Replication = store multiple copies of data on different nodes
Why replicate:
1. Fault tolerance:
2. Read scaling:
3. Geographic distribution:
Standard configuration: N=3 replicas
Example: Partition A (users 0-999,999)
Storage cost: 3× disk space

3 replicas = survive 2 simultaneous failures
Write must propagate from primary to replicas
Write flow:
Network latencies:
Critical question: When does write “succeed”?
Option A: Wait for all replicas (synchronous)
Client → Primary (5ms) → Wait for replicas (50ms)
Total latency: 55ms
If replica offline: Write fails
Result: All nodes have same data (consistent)
Option B: Return immediately (asynchronous)
Client → Primary (5ms) → Return success
Replicas updated in background (async)
Total latency: 5ms
If replica offline: Write still succeeds
Result: Replicas temporarily stale (inconsistent)
Trade-off: 5ms vs 55ms response time

Consistency vs latency vs availability trade-off
Network partition = nodes cannot communicate due to network failure
Common causes:
Example: 3-node cluster [A, B, C]
Normal operation:
Network partition occurs:
Critical problem: Neither side knows if other side crashed or network failed
Duration:
During partition:

Partition forces choice: reject writes or accept conflicts
Three properties distributed systems want:
Consistency (C):
Availability (A):
Partition tolerance (P):
CAP Theorem: Cannot have all three during network partition
During partition, must choose:
CP (Consistency + Partition tolerance):
AP (Availability + Partition tolerance):

Real systems:
CP behavior during partition: [Node A] | [Nodes B, C]
Minority partition (Node A):
Majority partition (Nodes B, C):
After partition heals:
Write quorum: W > N/2
PostgreSQL example:
Use CP when:

Trade-off: Latency and availability for guaranteed consistency
AP behavior during partition: [Node A] | [Nodes B, C]
Both partitions accept writes:
Example conflict:
Node A: user.balance = 100 (withdraw $50)
Node B: user.balance = 120 (deposit $20)
Initial: user.balance = 150
After partition heals: Which is correct?
Conflict resolution strategies:
1. Last-write-wins (timestamp):
2. Application merge:
3. Vector clocks:
Cassandra example:

Trade-off: Consistency for availability and low latency
ACID (relational databases):
BASE (distributed systems):
Basically Available:
Soft State:
Eventual Consistency:

Practical implications:
BASE trades immediate consistency for availability and partition tolerance (AP in CAP)
Modern systems allow per-query consistency level
Cassandra consistency levels:
ONE: Return after 1 replica responds
QUORUM: Return after majority responds (R+W > N)
ALL: Return after all replicas respond
DynamoDB read options:
Application chooses per-query:
Consistency vs latency trade-off:
| Level | Replicas | Latency | Consistency | Availability |
|---|---|---|---|---|
| ONE | 1 | 5ms | Eventual | High |
| QUORUM | 2/3 | 20ms | Strong* | Medium |
| ALL | 3/3 | 50ms | Strong | Low |
*If write also uses QUORUM: R+W > N guarantees overlap
Quorum mathematics:
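A quick arithmetic check of the overlap rule for N=3:

```python
# Overlap check: a read quorum and a write quorum must intersect when R + W > N
N = 3                      # replicas
for R, W in [(1, 2), (2, 2), (3, 3)]:
    overlap = R + W - N    # minimum replicas common to any read set and write set
    print(f"R={R}, W={W}: overlap >= {overlap}",
          "-> strong" if overlap >= 1 else "-> may read stale")
```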
Latency distribution (Cassandra, same datacenter):
Application decides trade-off:
Different consistency levels optimize different use cases within same system
Fundamental abstraction:
Value is opaque blob:
Examples: Redis, Memcached, Riak, DynamoDB (key-value mode)
Why minimal structure:
Contrast with relational:

Simple interface provides O(1) access with minimal overhead
Core operations:
# Basic
redis.set("key", "value")
redis.get("key")
redis.delete("key")
# Conditional update
redis.setnx("key", "value") # SET if Not eXists
redis.set("key", "new", xx=True) # SET if exists
# Atomic counter
redis.incr("counter") # Atomic increment
redis.decr("counter") # Atomic decrement
# Time to live
redis.setex("session:abc", 1800, data) # 30 min TTL
redis.expire("key", 3600) # Set TTL on existing keyAdvanced (Redis):
# Sorted sets for ranking
redis.zadd("leaderboard", {"player1": 1000})
rank = redis.zrank("leaderboard", "player1")
# Range queries on sorted keys
keys = redis.keys("session:*") # Blocks server during scan
cursor, keys = redis.scan(0, match="session:*") # Non-blocking iteration
Cannot do:
Example session showing operations:
# Store session with 30-minute expiration
import redis
import json
r = redis.Redis()
session_data = {
"user_id": 123,
"login_time": "2025-01-15T10:30:00Z",
"cart": ["product_456", "product_789"]
}
# Store
r.setex(
"session:a3f9b2c1",
1800, # 30 minutes
json.dumps(session_data)
)
# Retrieve
data = r.get("session:a3f9b2c1")
if data:
session = json.loads(data)
print(f"User {session['user_id']} logged in")
else:
print("Session expired or not found")
# Extend TTL
r.expire("session:a3f9b2c1", 1800)
# After 30 minutes: automatic deletion
# No polling, no cleanup jobs required
Operations execute in microseconds
Must know exact key for retrieval
Web application session requirements:
Relational approach problems:
CREATE TABLE sessions (
session_id VARCHAR(64) PRIMARY KEY,
user_id INT,
data JSONB,
created_at TIMESTAMP,
last_activity TIMESTAMP
);
-- Every request
SELECT * FROM sessions
WHERE session_id = ?
AND last_activity > NOW() - INTERVAL '30 minutes';
-- Cleanup job (runs every minute)
DELETE FROM sessions
WHERE last_activity < NOW() - INTERVAL '30 minutes';
Issues:
Key-value with TTL:

Performance:
Database stores bytes, application chooses format:
JSON - Human readable:
session = {
"user_id": 123,
"cart": ["product_456", "product_789"],
"preferences": {"theme": "dark"},
"login_time": "2025-01-15T10:30:00Z"
}
value = json.dumps(session)
# Size: 147 bytes
# Parse time: ~50 microseconds
MessagePack - Binary efficient:
import msgpack
value = msgpack.packb(session)
# Size: 89 bytes (39% smaller)
# Parse time: ~20 microseconds
Protocol Buffers - Typed schema:
Trade-offs:
JSON:
MessagePack:
Protocol Buffers:

Most applications: JSON for simplicity
High-throughput systems: Binary formats
Product catalog query:
SELECT p.*, c.name AS category_name
FROM products p
JOIN categories c ON p.category_id = c.category_id
WHERE p.product_id = 456;
Execution time: 45ms (JOIN, disk I/O)
Cache layer intercepts:
def get_product(product_id):
cache_key = f"product:{product_id}"
# Check cache
cached = redis.get(cache_key)
if cached:
return json.loads(cached) # 1ms
# Cache miss - query database
product = db.execute("""
SELECT p.*, c.name AS category_name
FROM products p
JOIN categories c ON p.category_id = c.category_id
WHERE p.product_id = ?
""", product_id) # 45ms
# Cache result for 5 minutes
redis.setex(cache_key, 300, json.dumps(product))
return product
Performance calculation:
Database load reduction:

Cache absorbs read load
Database handles writes and cache misses
Single cache server limit:
Naive sharding breaks:
Adding server changes N → all keys remap
Consistent hashing solution:
Example with 4 servers:
# Server positions on ring (hash of server ID)
servers = {
"server1": hash("server1") % (2**32), # 428312345
"server2": hash("server2") % (2**32), # 1827364829
"server3": hash("server3") % (2**32), # 2913746582
"server4": hash("server4") % (2**32), # 3728191028
}
# Key placement
key_hash = hash("product:456") % (2**32) # 945628371
# Finds next server clockwise: server2
Adding server5:

Hash slots distribute cache load without global invalidation
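A minimal sketch of the ring lookup, using MD5 positions as above; production clients usually add many virtual nodes per server for smoother balance.

```python
# Sketch: consistent-hash ring lookup (illustrative; no virtual nodes)
import bisect
import hashlib

def ring_pos(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

servers = ["server1", "server2", "server3", "server4"]
ring = sorted((ring_pos(s), s) for s in servers)
positions = [pos for pos, _ in ring]

def node_for(key: str) -> str:
    # First server clockwise from the key's position, wrapping past the end
    idx = bisect.bisect_right(positions, ring_pos(key)) % len(ring)
    return ring[idx][1]

print(node_for("product:456"))
# Adding "server5" inserts one ring position: only keys between its
# predecessor and server5 move; all other keys keep their current node.
```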
API rate limit: 100 requests per minute per user
Wrong approach - Race condition:
# Read current count
count = int(redis.get(f"rate:{user_id}") or 0)
# Check limit
if count >= 100:
return "Rate limit exceeded"
# Increment count
redis.set(f"rate:{user_id}", count + 1)
redis.expire(f"rate:{user_id}", 60)Problem with concurrent requests:
| Time | Request A | Request B |
|---|---|---|
| t0 | Read count=99 | Read count=99 |
| t1 | Check: 99 < 100 | Check: 99 < 100 |
| t2 | Set count=100 | Set count=100 |
| t3 | Allow request | Allow request |
Both requests see count=99, both allowed → 101 requests
Correct approach - Atomic operation:
key = f"rate:{user_id}:{current_minute}"
# Atomic increment returns new value
count = redis.incr(key)
# Set TTL on first request
if count == 1:
redis.expire(key, 60)
# Check after increment
if count > 100:
return "Rate limit exceeded"
return "Request allowed"Atomic increment guarantees correct count under concurrency

Single round-trip, no race conditions
Gaming leaderboard requirements:
Rank 1 million players by score
Update scores frequently (every game completion)
Query operations:
Redis sorted set:
import redis
r = redis.Redis()
# Update player score (O(log N))
r.zadd("leaderboard", {"player_12345": 1850})
r.zadd("leaderboard", {"player_67890": 2100})
# Get player rank - 0-indexed, from highest score (O(log N))
rank = r.zrevrank("leaderboard", "player_12345")
# Returns: 43127 (meaning 43,128th place)
# Get top 10 players with scores (O(log N + 10))
top_10 = r.zrevrange("leaderboard", 0, 9, withscores=True)
# Returns: [("player_67890", 2100.0), ...]
# Get players in score range (O(log N + M))
players = r.zrangebyscore("leaderboard", 1000, 2000)
# Get player score (O(1))
score = r.zscore("leaderboard", "player_12345")All operations logarithmic or better with 1M players
Internal structure: Skip list

Specialized data structure for ranking operations
Cannot query on value contents:
All sessions for user:
# Need to track session keys separately
redis.sadd(f"user_sessions:{user_id}", session_id)
# Or scan all keys (expensive)
cursor = 0
sessions = []
while True:
cursor, keys = redis.scan(cursor, match="session:*")
for key in keys:
data = json.loads(redis.get(key))
if data["user_id"] == target_user:
sessions.append(data)
if cursor == 0:
break
# Scans entire keyspace, O(N) operation
Products under $20:
# Impossible without secondary index
# Must maintain separate sorted set
redis.zadd("products_by_price", {product_id: price})
cheap = redis.zrangebyscore("products_by_price", 0, 20)
Application must manually maintain indexes
No relationships:
# Get user and their orders - two queries
user = json.loads(redis.get(f"user:{user_id}"))
order_ids = json.loads(redis.get(f"user_orders:{user_id}"))
orders = [json.loads(redis.get(f"order:{oid}")) for oid in order_ids]
No JOIN operation
No multi-key transactions (usually):
Redis transactions limited:
# WATCH/MULTI/EXEC for optimistic locking (redis-py: watch on a pipeline)
pipe = redis.pipeline()
pipe.watch("account:A", "account:B")
balance_a = int(pipe.get("account:A"))
balance_b = int(pipe.get("account:B"))
pipe.multi()
pipe.set("account:A", balance_a - 100)
pipe.set("account:B", balance_b + 100)
result = pipe.execute() # Raises WatchError if watched keys were modified
DynamoDB: 25 item transaction limit
When key-value is wrong:
When key-value is right:
Key-value complements relational, does not replace
Document database stores structured objects as atomic units
Recall embed vs reference patterns from earlier. Document databases implement these patterns while understanding document structure.
Key-value limitation:
# Key-value: Opaque blob
redis.set("booking:123", json.dumps(booking_data))
value = redis.get("booking:123")
booking = json.loads(value) # Application interprets
# Cannot query: "Find bookings for alice@example.com"
# Must scan all keys and deserialize
Document database capability:
// Database understands structure
db.bookings.find({
"passenger.email": "alice@example.com"
})
// Direct query on nested field
Database parses JSON structure to support:
Primary implementations: MongoDB (BSON), CouchDB (JSON), DocumentDB

Document databases bridge gap between key-value flexibility and relational query power
Collection: Group of related documents (analogous to relational table)
Document: Single JSON object with _id primary key
No enforced schema: Documents in same collection can have different fields
Booking collection example:
{
"_id": ObjectId("507f1f77bcf86cd799439011"),
"booking_reference": "ABC123",
"passenger_id": 123,
"flight_id": 456,
"seat": "12A",
"booking_time": ISODate("2025-01-15T10:30:00Z"),
"price": 450.00
}
MongoDB-specific types:
ObjectId: 12-byte unique identifier (timestamp + random)
ISODate: Native datetime type (UTC)
NumberLong, NumberDecimal: Typed numeric values
Storage format: BSON (Binary JSON)

Collection contains documents with varying fields
Relational limitation: Accessing nested data required JOINs
Document structure with nesting:
{
"booking_id": 12345,
"passenger": {
"name": "Alice Chen",
"email": "alice@example.com",
"frequent_flyer": {
"number": "FF123456",
"tier": "Gold",
"miles": 85000
}
},
"flight": {
"number": "UA123",
"departure_airport": "LAX",
"scheduled_departure": ISODate("2025-02-15T14:30:00Z")
}
}
Dot notation queries:
// Find bookings for specific email
db.bookings.find({
"passenger.email": "alice@example.com"
})
// Query nested 3 levels deep
db.bookings.find({
"passenger.frequent_flyer.tier": "Gold"
})
// Combine nested and top-level filters
db.bookings.find({
"flight.departure_airport": "LAX",
"flight.scheduled_departure": {
$gte: ISODate("2025-02-01"),
$lt: ISODate("2025-03-01")
}
})
Index on nested field:
Query execution with index:
Comparison to relational approach:
-- Requires 2 JOINs for same data
SELECT b.*
FROM bookings b
JOIN passengers p ON b.passenger_id = p.passenger_id
JOIN frequent_flyer ff ON p.passenger_id = ff.passenger_id
WHERE p.email = 'alice@example.com'
AND ff.tier = 'Gold';
Performance measurement (1M bookings):
Why faster:
Nested data stored contiguously eliminates JOIN cost
Scenario: Multi-stop flights (LAX → DEN → JFK)
{
"flight_id": 789,
"flight_number": "UA456",
"stops": [
{
"airport": "LAX",
"arrival": null,
"departure": ISODate("2025-02-15T08:00:00Z"),
"gate": "B12"
},
{
"airport": "DEN",
"arrival": ISODate("2025-02-15T10:30:00Z"),
"departure": ISODate("2025-02-15T11:00:00Z"),
"gate": "A5"
},
{
"airport": "JFK",
"arrival": ISODate("2025-02-15T17:00:00Z"),
"departure": null,
"gate": "C8"
}
]
}
Query any array element:
// Find all flights stopping at DEN
db.flights.find({
"stops.airport": "DEN"
})
// Matches if ANY element has airport: "DEN"
Query specific array position:
Match array element with multiple conditions:
// Find flights with DEN stop, departure after 11:00
db.flights.find({
"stops": {
$elemMatch: {
"airport": "DEN",
"departure": {$gte: ISODate("2025-02-15T11:00:00Z")}
}
}
})
Array operators:
$elemMatch: Element meets multiple conditions
$all: Array contains all specified elements
$size: Array has specific length
$: Positional operator for updates
Array queries operate across all elements
Creating indexes on document structure:
// Index on nested field (2 levels deep)
db.bookings.createIndex({"passenger.email": 1})
// Compound index across nesting levels
db.bookings.createIndex({
"flight.departure_airport": 1,
"flight.scheduled_departure": 1
})
Multikey indexes on arrays:
// Index on array field
db.flights.createIndex({"stops.airport": 1})
// Flight with 3 stops creates 3 index entries:
// "LAX" → flight_id: 789
// "DEN" → flight_id: 789
// "JFK" → flight_id: 789Query execution:
// With index (fast)
db.bookings.find({"passenger.email": "alice@example.com"})
// Index seek: O(log n), ~2ms for 1M documents
// Without index (slow)
db.bookings.find({"passenger.phone": "+1-555-0123"})
// Collection scan: O(n), ~500ms for 1M documents
Index size impact:

MongoDB limit: 64 indexes per collection
Indexes work on any document field
Performance (1M documents): Indexed query: 2ms, Collection scan: 500ms
Query: “Average booking price by departure airport for last 30 days”
Relational approach requires JOIN:
SELECT f.departure_airport, AVG(b.price)
FROM bookings b
JOIN flights f ON b.flight_id = f.flight_id
WHERE b.booking_time >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY f.departure_airport
ORDER BY AVG(b.price) DESC
LIMIT 10;
MongoDB aggregation pipeline:
db.bookings.aggregate([
// Stage 1: Filter to last 30 days
{
$match: {
booking_time: {
$gte: ISODate("2025-01-15"),
$lt: ISODate("2025-02-15")
}
}
},
// Stage 2: Group by departure airport
{
$group: {
_id: "$flight.departure_airport",
avg_price: {$avg: "$price"},
total_bookings: {$sum: 1},
min_price: {$min: "$price"},
max_price: {$max: "$price"}
}
},
// Stage 3: Sort by average price descending
{
$sort: {avg_price: -1}
},
// Stage 4: Limit to top 10
{
$limit: 10
}
])
Problem: Analyze which airports have most stops (connection hubs)
Document with stops array:
Pipeline with $unwind:
db.flights.aggregate([
// Stage 1: Expand stops array
{
$unwind: "$stops"
},
// After $unwind, creates 3 documents, one per stop element:
// {flight_number: "UA456", stops: {airport: "LAX", ...}}
// {flight_number: "UA456", stops: {airport: "DEN", ...}}
// {flight_number: "UA456", stops: {airport: "JFK", ...}}
// Stage 2: Group by airport
{
$group: {
_id: "$stops.airport",
flight_count: {$sum: 1}
}
},
// Stage 3: Sort by frequency
{
$sort: {flight_count: -1}
},
// Stage 4: Top 10 airports
{
$limit: 10
}
])
Result:
[
{"_id": "DEN", "flight_count": 2500},
{"_id": "ORD", "flight_count": 2100},
{"_id": "ATL", "flight_count": 1900},
{"_id": "DFW", "flight_count": 1750}
]
$unwind document explosion:
Array expansion multiplies document count:
Optimization: filter before unwind
db.flights.aggregate([
{$match: {departure_date: {$gte: ISODate("2025-02-01")}}}, // Filter first
{$unwind: "$stops"},
{$group: {_id: "$stops", flight_count: {$sum: 1}}}
])Pipeline size:
Measured performance:
$unwind before $match creates unnecessary work
Problem: No schema enforcement - errors undetected
emial instead of email
"123" instead of 123
MongoDB JSON Schema validation:
db.createCollection("bookings", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["booking_reference", "passenger_id",
"flight_id", "price"],
properties: {
booking_reference: {
bsonType: "string",
pattern: "^[A-Z]{3}[0-9]{3}$",
description: "Format: ABC123"
},
passenger_id: {
bsonType: "int",
minimum: 1
},
price: {
bsonType: "decimal",
minimum: 0,
maximum: 10000
},
passenger: {
bsonType: "object",
properties: {
email: {
bsonType: "string",
pattern: "^.+@.+\\..+$"
}
}
}
}
}
},
validationAction: "error" // or "warn"
})
Validation behavior:
Valid insert succeeds:
db.bookings.insertOne({
booking_reference: "ABC123",
passenger_id: 123,
flight_id: 456,
price: NumberDecimal("450.00")
})
// Success
Invalid insert rejected:
db.bookings.insertOne({
booking_reference: "INVALID", // Wrong pattern
passenger_id: "123", // Wrong type
price: -50.00 // Below minimum
})
// Error: Document failed validation
Validation options:
validationAction: "error" - Reject invalid documentsvalidationAction: "warn" - Log warning, allow insertvalidationLevel: "moderate" - Validate only new/updated documentsTrade-off:
Validation provides safety without rigid schema
Updating nested fields:
// Update single nested field
db.bookings.updateOne(
{booking_id: 12345},
{$set: {"passenger.email": "alice.new@example.com"}}
)
// Only passenger.email changes, rest untouched
Updating array elements:
// Update element by position
db.flights.updateOne(
{flight_id: 789},
{$set: {"stops.1.departure": ISODate("2025-02-15T11:30:00Z")}}
)
// stops[1].departure updated
// Update element by condition ($ positional operator)
db.flights.updateOne(
{flight_id: 789, "stops.airport": "DEN"},
{$set: {"stops.$.departure": ISODate("2025-02-15T11:30:00Z")}}
)
// Updates first matching element
Update operators:
Atomicity guarantees:
Single document updates are atomic:
// Atomic: Decrement seats and add booking
db.flights.updateOne(
{flight_id: 456, available_seats: {$gt: 0}},
{
$inc: {available_seats: -1},
$push: {recent_bookings: booking_id}
}
)
// Both operations succeed or both fail
Cross-document operations not atomic (without transactions):
// NOT atomic without transaction
db.flights.updateOne({flight_id: 456}, {$inc: {available_seats: -1}})
db.bookings.insertOne({flight_id: 456, ...})
// Second operation can fail, leaving inconsistent state
MongoDB 4.0+: Multi-document transactions
session = db.getMongo().startSession()
session.startTransaction()
try {
session.getDatabase("airline").flights.updateOne(...)
session.getDatabase("airline").bookings.insertOne(...)
session.commitTransaction()
} catch (error) {
session.abortTransaction()
}
Performance: Transactions add overhead (~2-3× slower)
Single-document atomicity sufficient for most embedded designs
MongoDB document constraints:
When embedding breaks:
Flight with all bookings embedded:
{
"flight_id": 456,
"flight_number": "UA123",
"departure_airport": "LAX",
"bookings": [
{...}, // 1KB per booking
{...}, // 200 bookings = 200KB
{...} // Still under 16MB
]
}
Problems at scale:
When to reference instead of embed:
Reference pattern (normalized approach):
// Flight document (small, stable)
{
"flight_id": 456,
"flight_number": "UA123",
"departure_airport": "LAX",
"arrival_airport": "JFK",
"scheduled_departure": ISODate("2025-02-15T14:30:00Z"),
"seats_available": 12,
"total_seats": 180
}
// Size: 1KB
// Booking documents (separate collection)
{
"booking_id": 789,
"flight_id": 456, // Reference
"passenger_id": 123,
"seat": "12A"
}
// 180 separate documentsQuery patterns:
// Get flight: 1KB fetch
db.flights.findOne({flight_id: 456})
// Get bookings: Separate query
db.bookings.find({flight_id: 456})
// Add booking: Small update
db.bookings.insertOne({...})
db.flights.updateOne(
{flight_id: 456},
{$inc: {seats_available: -1}}
)
Reference when size or update patterns break embedding
Single server limitations:
Replica set architecture:
Write path:
// Application writes to primary
db.bookings.insertOne({...})
// Primary writes to oplog (operation log)
// Secondaries tail oplog and apply operations
// Replication asynchronous by default
Write concern controls durability:
// Wait for majority acknowledgment (2 of 3 nodes)
db.bookings.insertOne(
{...},
{writeConcern: {w: "majority", wtimeout: 5000}}
)
// Returns when 2 nodes confirm write
// If timeout: write may succeed but client gets error
Replication lag:

Problem: Analytics across referenced data
Using normalized reference pattern:
// Flights collection
{flight_id: 456, departure_airport: "LAX", ...}
// Bookings collection (separate)
{booking_id: 789, flight_id: 456, passenger_id: 123, price: 450}
// Passengers collection (separate)
{passenger_id: 123, email: "alice@example.com", ...}
Query: “Total revenue by departure airport”
Relational approach:
SELECT f.departure_airport, SUM(b.price)
FROM flights f
JOIN bookings b ON f.flight_id = b.flight_id
GROUP BY f.departure_airport;
-- Single query, executes in ~50ms
MongoDB $lookup (LEFT JOIN):
db.flights.aggregate([
{$lookup: {
from: "bookings",
localField: "flight_id",
foreignField: "flight_id",
as: "bookings"
}},
{$unwind: "$bookings"},
{$group: {
_id: "$departure_airport",
total_revenue: {$sum: "$bookings.price"}
}}
])
// Executes in ~450ms vs ~50ms embedded
Performance difference:
$lookup implementation:
Nested loop join (no hash join optimization)
For each flight, scan bookings collection
10K flights × 100K bookings = 1B comparisons
No query optimizer reordering
Workaround: Denormalize
// Store departure_airport in booking
{
booking_id: 789,
flight_id: 456,
departure_airport: "LAX", // Duplicated
price: 450
}
// Now simple aggregation
db.bookings.aggregate([
{$group: {
_id: "$departure_airport",
total_revenue: {$sum: "$price"}
}}
])
// Executes in ~25ms
Cost of denormalization:
Reference patterns sacrifice cross-collection query performance
Problem: Seat inventory management
Relational ACID transaction:
BEGIN TRANSACTION;
-- Check seat available
SELECT available_seats FROM flights
WHERE flight_id = 456 FOR UPDATE;
-- available_seats = 1
-- Book seat
INSERT INTO bookings (flight_id, passenger_id, seat)
VALUES (456, 123, '12A');
-- Decrement inventory
UPDATE flights SET available_seats = available_seats - 1
WHERE flight_id = 456;
COMMIT;
Two concurrent transactions:
FOR UPDATE lock prevents race
MongoDB without transactions:
// Check availability (race condition)
flight = db.flights.findOne({flight_id: 456})
if (flight.available_seats > 0) {
// Another booking can execute here
db.bookings.insertOne({flight_id: 456, ...})
db.flights.updateOne(
{flight_id: 456},
{$inc: {available_seats: -1}}
)
}
Race condition: Both bookings succeed, seats go negative
Solution: Atomic update with condition
result = db.flights.updateOne(
{flight_id: 456, available_seats: {$gt: 0}},
{
$inc: {available_seats: -1},
$push: {recent_bookings: booking_id}
}
)
if (result.modifiedCount === 1) {
// Success - seat was available and decremented
db.bookings.insertOne({...})
} else {
// Failed - no seats available
}
Works for single-document atomicity, but:
Multi-document transactions (MongoDB 4.0+):
session.startTransaction()
db.flights.updateOne({...}, {$inc: ...}, {session})
db.bookings.insertOne({...}, {session})
session.commitTransaction()
Performance cost:
Document model assumes independent documents - cross-document consistency expensive
Sparse data problem:
Wide-column approach:
Examples: Apache Cassandra, HBase, Google Bigtable
Flight 12345 (delay event):
row_key: "flight_12345"
standard: flight_number="UA123", departure="LAX", arrival="JFK"
events: delay_reason="weather", delay_minutes=45
Flight 12346 (emergency event):
row_key: "flight_12346"
standard: flight_number="AA456", departure="JFK", arrival="SFO"
events: emergency_type="medical", diverted_airport="DEN"
Different columns per row, no NULL storage

Storage: 75% reduction by eliminating NULL columns
Three-level hierarchy:
1. Row key:
sensor:42000:2025-01-15
2. Column family:
metadata, readings, events
3. Column:
readings:temperature=22.5
Physical storage example:
row_key="sensor:42000:2025-01-15T10:00"
metadata:location = "Building A, Floor 3"
metadata:sensor_type = "indoor_temp"
readings:temperature = 22.5
readings:humidity = 45.0
readings:timestamp = "2025-01-15T10:00:00Z"
Different sensor might have:
row_key="sensor:42001:2025-01-15T10:00"
metadata:location = "Building B, Floor 1"
metadata:sensor_type = "motion"
readings:x_accel = 0.05
readings:y_accel = 0.12
readings:z_accel = 9.81

Column families provide structure, columns provide flexibility
Scenario: Server monitoring metrics
Data volume:
Row key design: server_id:timestamp
server_042:2025-01-15T10:30:00
server_042:2025-01-15T10:30:10
server_042:2025-01-15T10:30:20
Lexicographic sort → time-adjacent data stored adjacently
Query: “Server 42, last hour”
Start: server_042:2025-01-15T09:30:00
End: server_042:2025-01-15T10:30:00
Sequential read of 360 consecutive row keys
Column families:
system: cpu_percent=45.2, memory_mb=8192, uptime_sec=345600
disk: read_iops=120, write_iops=80, queue_depth=3
(only when disk active)
network: rx_bytes=1048576, tx_bytes=524288, connections=42

Time-ordered row keys enable efficient range queries
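A small sketch of how the start and end row keys for that range scan are built; zero-padding the server ID is an assumption needed so that lexicographic order matches numeric order.

```python
# Sketch: row keys whose lexicographic order matches time order (illustrative)
from datetime import datetime, timedelta

def row_key(server_id: int, ts: datetime) -> str:
    # Zero-padded ID plus ISO timestamp sorts correctly as a plain string
    return f"server_{server_id:03d}:{ts:%Y-%m-%dT%H:%M:%S}"

now = datetime(2025, 1, 15, 10, 30)
start_key = row_key(42, now - timedelta(hours=1))  # server_042:2025-01-15T09:30:00
end_key = row_key(42, now)                         # server_042:2025-01-15T10:30:00
# A scan over [start_key, end_key) touches only the 360 consecutive rows
# for server 42; other servers' rows sort outside this range.
```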
Storage efficiency (1M samples): Relational with NULLs: 1000 MB, Wide-Column sparse: 400 MB (60% reduction)
Peer-to-peer: All nodes equal
Partitioning:
Write path:
Client sends write to any node (coordinator)
Coordinator determines replicas from hash(row_key)
Write sent to N=3 replica nodes
Coordinator waits based on consistency level:
Write durability on each node:
Write arrives
↓
Commit log (sequential append, durable)
↓
Memtable (in-memory, fast)
↓
[Background: Flush memtable → SSTable on disk]
Commit log ensures no data loss if node crashes
Read path:

Hash determines primary node, N consecutive nodes store replicas
Primary key = (partition key, clustering columns)
Partition key: Determines node placement (hash distribution)
Clustering columns: Sort order within partition (range queries)
Table definition:
CREATE TABLE sensor_readings (
sensor_id INT,
timestamp TIMESTAMP,
temperature FLOAT,
humidity FLOAT,
PRIMARY KEY (sensor_id, timestamp)
);
sensor_id = partition key (which node)
timestamp = clustering column (sort order)
Physical layout on Node 3:
Partition: sensor_id=42000
timestamp=2025-01-15T10:00:00, temp=22.5, humidity=45
timestamp=2025-01-15T10:01:00, temp=22.6, humidity=46
timestamp=2025-01-15T10:02:00, temp=22.4, humidity=45
... (sorted by timestamp)
Partition: sensor_id=42001
timestamp=2025-01-15T10:00:00, temp=19.5, humidity=50
...
Efficient query (uses partition key + clustering):
SELECT * FROM sensor_readings
WHERE sensor_id = 42000
AND timestamp >= '2025-01-15T10:00:00'
AND timestamp < '2025-01-15T11:00:00';
Single node, sequential read of 60 rows
Cassandra has no JOIN operation
Data for query must exist in single partition
Relational approach (JOINs):
-- users table
CREATE TABLE users (
user_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
-- activities table
CREATE TABLE activities (
activity_id INT PRIMARY KEY,
user_id INT,
activity_type VARCHAR(50),
timestamp TIMESTAMP,
details TEXT
);
-- Query with JOIN
SELECT a.*, u.name
FROM activities a
JOIN users u ON a.user_id = u.user_id
WHERE a.timestamp > NOW() - INTERVAL '24 hours';
Cassandra approach (denormalized):
Query:
Single partition, no JOIN
Update cost:

Acceptable when reads >> writes (activity log read 1000×, name changes rarely)
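A sketch of what the denormalized access might look like from Python, assuming the cassandra-driver and a hypothetical activity_by_user table keyed by user_id with the user's name duplicated into every row.

```python
# Sketch: denormalized activity feed (illustrative; activity_by_user and its
# columns are assumptions, not defined in the schema above)
from datetime import datetime, timedelta
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("app")

# Write: duplicate the user's name into the activity row
session.execute(
    "INSERT INTO activity_by_user (user_id, ts, activity_type, user_name, details)"
    " VALUES (%s, %s, %s, %s, %s)",
    (123, datetime.utcnow(), "login", "Alice Chen", "{}"))

# Read: last 24 hours from a single partition, name included, no JOIN
rows = session.execute(
    "SELECT ts, activity_type, user_name, details FROM activity_by_user"
    " WHERE user_id = %s AND ts > %s",
    (123, datetime.utcnow() - timedelta(hours=24)))
```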
Cannot optimize single table for multiple partition keys
Solution: Duplicate data in tables with different primary keys
Query pattern 1: Posts by user (timeline)
CREATE TABLE posts_by_user (
user_id INT,
post_timestamp TIMESTAMP,
post_id UUID,
content TEXT,
likes INT,
PRIMARY KEY (user_id, post_timestamp)
) WITH CLUSTERING ORDER BY (post_timestamp DESC);
-- Efficient: Single partition
SELECT * FROM posts_by_user
WHERE user_id = 12345
LIMIT 20;
Query pattern 2: Posts by tag (search)
CREATE TABLE posts_by_tag (
tag TEXT,
post_timestamp TIMESTAMP,
post_id UUID,
user_id INT, -- Duplicated
content TEXT, -- Duplicated
likes INT, -- Duplicated
PRIMARY KEY (tag, post_timestamp)
) WITH CLUSTERING ORDER BY (post_timestamp DESC);
-- Efficient: Single partition
SELECT * FROM posts_by_tag
WHERE tag = 'nosql'
LIMIT 20;
Write path:
Storage: 6× duplication for 5-tag post

Trade: Storage and write complexity for query performance
Time-series data has natural lifecycle
Old data less valuable, must be removed
Manual deletion: Expensive
Scans entire table, expensive on billions of rows
TTL (time to live): Automatic expiration
INSERT INTO sensor_readings
(sensor_id, timestamp, temperature, humidity)
VALUES (42000, '2025-01-15T10:00:00', 22.5, 45)
USING TTL 7776000; -- 90 days in seconds
-- After 90 days: Row automatically deleted
Implementation:
Use cases:

Storage calculation:
Social network queries require expensive self-JOINs:
-- Friends-of-friends
SELECT DISTINCT u.name
FROM connections c1
JOIN connections c2 ON c1.friend_id = c2.user_id
JOIN users u ON c2.friend_id = u.id
WHERE c1.user_id = 123;
Graph model stores relationships directly:
Example:
Native graph databases store adjacency (pointers between nodes), not foreign keys

Query: Find friends-of-friends who live in Boston
Cypher (Neo4j query language):
MATCH (me:Person {id: 123})-[:KNOWS]->(friend)
-[:KNOWS]->(fof:Person)
WHERE fof.city = 'Boston'
AND fof.id <> 123
RETURN fof.name, fof.city
Pattern matching: (node)-[edge]->(node) describes path structure
Relational equivalent:
SELECT DISTINCT u.name, u.city
FROM connections c1
JOIN connections c2 ON c1.friend_id = c2.user_id
JOIN users u ON c2.friend_id = u.id
WHERE c1.user_id = 123
AND u.city = 'Boston'
AND c2.friend_id <> 123;
Performance difference:
For a node with 100 friends, each of whom has 100 friends: the traversal follows roughly 100 × 100 = 10,000 pointers regardless of total graph size, while the SQL self-JOIN must match against the entire connections table.

Graph complexity depends on local structure (node degree), not global table size
Query: How many hops separate two people? Find shortest path.
Cypher:
MATCH path = shortestPath(
(a:Person {id: 123})-[:KNOWS*]-(b:Person {id: 789})
)
RETURN length(path)
[:KNOWS*] matches 1 or more KNOWS edges
Return path details:
RETURN [node IN nodes(path) | node.name] AS connection_path
-- Result: ["Alice", "Bob", "Carol", "Dave"]SQL equivalent:
WITH RECURSIVE paths AS (
SELECT user_id, friend_id, 1 AS depth,
ARRAY[user_id, friend_id] AS path
FROM connections
WHERE user_id = 123
UNION ALL
SELECT p.user_id, c.friend_id, p.depth + 1,
p.path || c.friend_id
FROM paths p
JOIN connections c ON p.friend_id = c.user_id
WHERE p.depth < 6
AND NOT c.friend_id = ANY(p.path)
)
SELECT MIN(depth) FROM paths WHERE friend_id = 789;
Use case: LinkedIn shows connection path and degree of separation
Problem: Recommend products based on friend purchases
Query: Products bought by friends-of-friends but not by me or my direct friends
MATCH (me:Person {id: 123})-[:KNOWS*1..2]->(friend)
MATCH (friend)-[:PURCHASED]->(product:Product)
WHERE NOT (me)-[:PURCHASED]->(product)
AND NOT (me)-[:KNOWS]->()-[:PURCHASED]->(product)
RETURN product.name, COUNT(friend) as recommendation_score
ORDER BY recommendation_score DESC
LIMIT 10
Query breakdown:
[:KNOWS*1..2] = 1 or 2 hops (friends + friends-of-friends)
Performance:
Relational equivalent:

Results:
Relational requires 3 self-JOINs + product JOIN + filtering subqueries
Movie database with heterogeneous relationships:
Schema (property graph model):
Nodes:
Person (name, birth_year)
Movie (title, year, rating)
Genre (name)
Edges:
(Person)-[:ACTED_IN {role: "Batman"}]->(Movie)
(Person)-[:DIRECTED]->(Movie)
(Movie)-[:BELONGS_TO_GENRE]->(Genre)
Query: Actors who worked with directors who directed Sci-Fi movies
MATCH (actor:Person)-[:ACTED_IN]->(m1:Movie)
<-[:DIRECTED]-(director:Person)
MATCH (director)-[:DIRECTED]->(m2:Movie)
-[:BELONGS_TO_GENRE]->(g:Genre {name: 'Sci-Fi'})
WHERE m1 <> m2
RETURN DISTINCT actor.name, director.name, m2.title
Query breakdown:
Relational equivalent:

Heterogeneous graphs: Multiple node types, multiple edge types with different semantics
Native graph databases store direct pointers between nodes:
Conceptual storage structure:
Node 123 (Alice):
properties: {name: "Alice", city: "Boston"}
outgoing_edges: [ptr→456, ptr→789]
incoming_edges: [ptr→234]
Node 456 (Bob):
properties: {name: "Bob", city: "NYC"}
outgoing_edges: [ptr→123, ptr→999]
incoming_edges: [ptr→123, ptr→555]
Traversal cost:
Relational comparison:
Physical layout:

Graph databases optimize for traversal, relational optimizes for joins
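The pointer-chasing idea can be illustrated with an in-memory adjacency list: friends-of-friends is two dictionary hops, with cost proportional to node degree rather than to total graph size. A plain-Python illustration, not Neo4j's actual storage engine; IDs are made up.

```python
# Illustration: friends-of-friends via pointer chasing on adjacency lists
graph = {
    123: [456, 789],   # Alice knows Bob, Carol
    456: [123, 999],   # Bob knows Alice, Dave
    789: [555],        # Carol knows Eve
    999: [],
    555: [],
}

def friends_of_friends(node: int) -> set:
    direct = set(graph.get(node, []))
    result = set()
    for friend in direct:                        # O(degree) hops
        result.update(graph.get(friend, []))     # O(degree) per friend
    return result - direct - {node}              # cost ~ degree^2, not graph size

print(friends_of_friends(123))   # {999, 555}
```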
Fast queries (local traversal):
Performance independent of graph size, depends on local structure:
“Friends of person X”: O(degree)
“Friends-of-friends of X”: O(degree²)
“Shortest path between A and B”: O(degree × hops)
Slow queries (global aggregation):
Must scan entire graph:
Performance example (1M node graph):
When to use graph databases:

Graph databases excel at local traversal, not global aggregation
Scenario: Credit card fraud detection - identify accounts sharing resources
Data model:
Nodes:
Account (account_id, status)
Device (device_fingerprint)
IPAddress (ip)
PhysicalAddress (street, city)
Edges:
(Account)-[:USED_DEVICE]->(Device)
(Account)-[:ORIGINATED_FROM]->(IPAddress)
(Account)-[:REGISTERED_AT]->(PhysicalAddress)
Fraud detection query:
Find accounts connected to flagged account through shared resources (2 hops):
MATCH (suspicious:Account {flagged: true})
MATCH (suspicious)-[:USED_DEVICE|ORIGINATED_FROM
|REGISTERED_AT*1..2]-(related:Account)
WHERE related.account_id <> suspicious.account_id
RETURN related.account_id,
COUNT(*) as connection_strength
ORDER BY connection_strength DESC
Pattern detected:
Timing:

Relational approach:
Inefficient without indexes:
Global aggregations require full graph scan:
// Count all users - O(n)
MATCH (u:User) RETURN COUNT(u)
// Average friend count - O(n + m)
MATCH (u:User)-[:KNOWS]->(f)
WITH u, COUNT(f) AS friend_count
RETURN AVG(friend_count)
// Find by property without index - O(n)
MATCH (u:User {email: 'alice@example.com'})
RETURN u
Without property index: scans all User nodes
Large traversal results:
Small world problem - 6 degrees reaches most of graph:
// All people within 6 hops - returns millions
MATCH (me:User {id: 123})-[:KNOWS*1..6]-(connected)
RETURN COUNT(DISTINCT connected)
Kevin Bacon number: 6 hops connects to entire Hollywood
Limited transactions:
Write hotspots:
Celebrity node with 10M followers:

Graph databases optimize for relationship traversal, not aggregation
Traditional architecture: Single PostgreSQL database
Handles everything:
Works at small scale (thousands of users)
Observed failure at 100K users:
Not a scale problem - a workload mismatch problem:
PostgreSQL optimized for transactions, not for all four patterns simultaneously

Query patterns interfere when forced through single storage model
Production system: 500K users, 50K orders/day
User completes checkout:
PostgreSQL (ACID transaction):
BEGIN;
INSERT INTO orders (order_id, user_id, total, timestamp)
VALUES (12345, 789, 99.99, NOW());
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (12345, 101, 2, 49.99);
UPDATE products SET inventory = inventory - 2
WHERE product_id = 101 AND inventory >= 2;
-- If inventory check fails, entire transaction rolls back
COMMIT;
Latency: 50ms | Consistency: Immediate (ACID)
Redis (ephemeral data):
Latency: 1ms | Consistency: Immediate (single key)
Neo4j (graph relationship - async):
// Seconds later, from background worker
MATCH (u:User {id: 789}), (p:Product {id: 101})
CREATE (u)-[:PURCHASED {timestamp: 1234567890}]->(p)
Latency: 20ms | Consistency: Eventual (5 second delay acceptable)

Application routes operations to appropriate database
Problem: Product pages receive 5K views/minute
Each view queries PostgreSQL:
SELECT p.name, p.price, p.inventory,
c.name as category,
array_agg(i.url) as images
FROM products p
JOIN categories c ON p.category_id = c.id
JOIN product_images i ON p.id = i.product_id
WHERE p.id = 12345
GROUP BY p.id, c.name;
Load: 5K queries/minute = 83 queries/second to PostgreSQL
Product data changes rarely (once per day)
Read-through cache pattern:
def get_product_details(product_id):
cache_key = f"product:{product_id}"
cached = redis.get(cache_key)
if cached:
return json.loads(cached) # 1ms cache hit
# Cache miss - query database
product = postgres.query(query_string, product_id) # 50ms
# Cache for 1 hour
redis.setex(cache_key, 3600, json.dumps(product))
return product
Measured performance:

Analytics queries run on read replica:
Problem: User changes email address
Must update:
No distributed transaction across databases
Approach 1: Dual write
def update_email(user_id, new_email):
# PostgreSQL first (source of truth)
postgres.execute(
"UPDATE users SET email=? WHERE id=?",
new_email, user_id
)
# Neo4j second
try:
neo4j.run(
"MATCH (u:User {id: $uid}) SET u.email = $email",
uid=user_id, email=new_email
)
except Exception as e:
log.error(f"Neo4j update failed: {e}")
retry_queue.push({
"user_id": user_id,
"email": new_email
})
Failure scenario:
Approach 2: Event stream
# Application writes to PostgreSQL only
def update_email(user_id, new_email):
postgres.execute("UPDATE users SET email=? WHERE id=?")
kafka.publish("user.updated", {
"user_id": user_id, "email": new_email
})
# Separate consumer updates Neo4j
for event in kafka.consume("user.updated"):
neo4j.run("MATCH (u:User {id: $uid}) SET u.email = $email",
uid=event["user_id"], email=event["email"])
Operational complexity:
Social e-commerce platform: 500K users, 50K orders/day
PostgreSQL (ACID transactions):
Redis (ephemeral cache):
Neo4j (graph traversal):
PostgreSQL read replica (analytics):

Trade-off measured:
Migration pattern observed:
Not about database choice - about workload isolation
DynamoDB: AWS-managed database service that follows the wide-column data model
Amazon’s fully managed database service:
Production usage:
Trade-off vs Cassandra:

Cassandra: Operator manages cluster. DynamoDB: AWS manages distribution.
Primary key structure determines data distribution and query patterns
Partition key (required):
Sort key (optional):
Composite primary key = (partition key, sort key)
Sensor monitoring table:
{
"TableName": "sensor_readings",
"KeySchema": [
{"AttributeName": "sensor_id", "KeyType": "HASH"},
{"AttributeName": "timestamp", "KeyType": "RANGE"}
],
"AttributeDefinitions": [
{"AttributeName": "sensor_id", "AttributeType": "N"},
{"AttributeName": "timestamp", "AttributeType": "N"}
]
}
Items for sensor_id=42000:
{"sensor_id": 42000, "timestamp": 1704034800, "temperature": 22.5}
{"sensor_id": 42000, "timestamp": 1704034810, "temperature": 22.6}
{"sensor_id": 42000, "timestamp": 1704034820, "temperature": 22.4}All stored in same partition, sorted by timestamp

Partition key determines storage location, sort key supports range queries
Items in same table can have different attributes
Wide-column stores only non-NULL values. DynamoDB implements this - no wasted storage on missing attributes.
Flight events with varying attributes:
Delay event:
{
"flight_id": 12345,
"event_time": 1704034800,
"event_type": "delay",
"delay_minutes": 45,
"delay_reason": "weather"
}
Emergency event:
{
"flight_id": 12346,
"event_time": 1704038400,
"event_type": "emergency",
"diverted_airport": "DEN",
"emergency_type": "medical"
}
Normal departure (no special attributes):
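A minimal sketch of such an event, carrying only the key attributes and event type (values are illustrative):
{
  "flight_id": 12347,
  "event_time": 1704042000,
  "event_type": "departure"
}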
Storage efficiency measurement (1M flight events):
Relational approach: 20 event columns × 1M rows = 20M cells
DynamoDB: Only stores present attributes

Sparse storage eliminates NULL overhead
Query performance depends on partition key specification
Like Cassandra, DynamoDB requires a partition key for efficient queries. Without one, an expensive full-table scan is required.
Efficient query (partition key specified):
from boto3.dynamodb.conditions import Key
response = table.query(
KeyConditionExpression=
Key('sensor_id').eq(42000) &
Key('timestamp').between(1704034800, 1704038400)
)
Execution:
Inefficient scan (no partition key):
from decimal import Decimal
from boto3.dynamodb.conditions import Attr
response = table.scan(
    FilterExpression=Attr('temperature').gt(Decimal('25.0'))  # boto3 requires Decimal, not float
)
Execution:

Partition key specification: 300× latency difference, 2800× items read
Problem: Table partitioned by sensor_id cannot efficiently query by other attributes
sensor_readings table:
Solution: Global Secondary Index (GSI)
GSI creates alternate partition key on different attribute:
table.update(
GlobalSecondaryIndexUpdates=[{
'Create': {
'IndexName': 'battery-status-index',
'KeySchema': [
{'AttributeName': 'battery_status', 'KeyType': 'HASH'},
{'AttributeName': 'sensor_id', 'KeyType': 'RANGE'}
],
'Projection': {'ProjectionType': 'ALL'}
}
}]
)
GSI structure:
Query using GSI:
response = table.query(
IndexName='battery-status-index',
KeyConditionExpression=Key('battery_status').eq('LOW')
)
Returns all low-battery sensors in single partition access

GSI maintained automatically, doubles write cost
Query requires partition key, scan does not
Setup:
import boto3
from decimal import Decimal  # DynamoDB numeric attributes must be Decimal, not float
from boto3.dynamodb.conditions import Key, Attr
dynamodb = boto3.resource('dynamodb', region_name='us-west-2')
table = dynamodb.Table('sensor_readings')
Query with partition key and sort key range:
import time
current_time = int(time.time())
one_hour_ago = current_time - 3600
response = table.query(
KeyConditionExpression=
Key('sensor_id').eq(42000) &
Key('timestamp').gt(one_hour_ago)
)
items = response['Items']
# Returns: List of items matching condition
# Example: 360 items (1 per 10 seconds for 1 hour)
Performance: Single partition access, 8ms latency
Query with additional filter (applied after retrieval):
response = table.query(
    KeyConditionExpression=Key('sensor_id').eq(42000),
    FilterExpression=Attr('temperature').gt(Decimal('25.0'))  # Decimal, not float
)
Execution:
FilterExpression reduces network transfer, not storage reads or cost
Scan entire table (avoid in production):
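A sketch of such a scan (filter attribute and threshold are illustrative; table and Attr come from the setup above):
response = table.scan(
    FilterExpression=Attr('battery_level').lt(10)  # applied only after every item is read
)
low_battery_items = response['Items']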
Execution on 1M item table:
Pagination for large result sets:
items = []
last_key = None
while True:
    if last_key:
        response = table.query(
            KeyConditionExpression=Key('sensor_id').eq(42000),
            ExclusiveStartKey=last_key
        )
    else:
        response = table.query(
            KeyConditionExpression=Key('sensor_id').eq(42000)
        )
    items.extend(response['Items'])
    last_key = response.get('LastEvaluatedKey')
    if not last_key:
        break
DynamoDB returns maximum 1 MB per request. Use LastEvaluatedKey to continue.
Performance comparison (1M item table):
Put item (insert or replace):
table.put_item(
    Item={
        'sensor_id': 42000,
        'timestamp': 1704034800,
        'temperature': Decimal('22.5'),  # non-integer numbers must be Decimal
        'humidity': Decimal('45.0'),
        'battery_level': 87
    }
)
If an item with the same primary key already exists, it is replaced entirely (not merged).
Get item by primary key:
response = table.get_item(
    Key={
        'sensor_id': 42000,
        'timestamp': 1704034800
    }
)
if 'Item' in response:
    item = response['Item']
    print(f"Temperature: {item['temperature']}")
else:
    print("Item not found")
get_item returns a single item, not a list. Reads are eventually consistent by default; pass ConsistentRead=True for a strongly consistent read at twice the read cost.
Update specific attributes:
table.update_item(
    Key={
        'sensor_id': 42000,
        'timestamp': 1704034800
    },
    UpdateExpression='SET temperature = :temp, humidity = :hum',
    ExpressionAttributeValues={
        ':temp': Decimal('23.1'),
        ':hum': Decimal('46.5')
    }
)
Updates only the specified attributes, leaving others unchanged.
Conditional update (optimistic locking):
table.update_item(
Key={
'sensor_id': 42000,
'timestamp': 1704034800
},
UpdateExpression='SET battery_level = :new_level',
ConditionExpression='battery_level > :threshold',
ExpressionAttributeValues={
':new_level': 85,
':threshold': 80
}
)
Update succeeds only if battery_level > 80; raises ConditionalCheckFailedException if the condition is false.
Delete item:
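A minimal example; delete requires the full primary key (partition key and sort key):
table.delete_item(
    Key={
        'sensor_id': 42000,
        'timestamp': 1704034800
    }
)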
Batch operations (up to 25 items per request):
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(
            Item={
                'sensor_id': 42000,
                'timestamp': 1704034800 + i * 10,
                'temperature': Decimal('22.0') + i * Decimal('0.1')
            }
        )
batch_writer automatically handles batching, retries, and backoff. Reduces network round trips for bulk writes.
Cost: Same per-item cost as individual operations. Network latency reduction proportional to batch size.
List tables in region:
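For example, using the region from the setup above:
aws dynamodb list-tables --region us-west-2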
Output:
Describe table structure:
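For example:
aws dynamodb describe-table \
  --table-name sensor_readings \
  --region us-west-2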
Returns:
Get specific item:
aws dynamodb get-item \
--table-name sensor_readings \
--key '{
"sensor_id": {"N": "42000"},
"timestamp": {"N": "1704034800"}
}' \
--region us-west-2
Note: CLI requires explicit type tags (N, S, B)
Query with CLI:
aws dynamodb query \
--table-name sensor_readings \
--key-condition-expression "sensor_id = :sid AND #ts > :time" \
--expression-attribute-names '{"#ts": "timestamp"}' \
--expression-attribute-values '{
":sid": {"N": "42000"},
":time": {"N": "1704034800"}
}' \
--region us-west-2
expression-attribute-names is required when an attribute name is a reserved word (timestamp).
Output (truncated):
{
"Items": [
{
"sensor_id": {"N": "42000"},
"timestamp": {"N": "1704034810"},
"temperature": {"N": "22.5"},
"humidity": {"N": "45.0"}
},
...
],
"Count": 360,
"ScannedCount": 360
}
Scan with filter:
aws dynamodb scan \
--table-name sensor_readings \
--filter-expression "battery_level < :threshold" \
--expression-attribute-values '{":threshold": {"N": "10"}}' \
--region us-west-2
Count: items returned. ScannedCount: items examined. With a filter, ScannedCount ≥ Count, and you are charged for ScannedCount.
On-demand mode: Pay per request
Pricing (us-west-2):
Capacity unit definition:
Write request unit: one write of an item up to 1 KB
Read request unit: one strongly consistent read of an item up to 4 KB (or two eventually consistent reads)
Example cost calculation:
When to use on-demand:
Provisioned mode: Reserve capacity
Specify capacity units:
Read Capacity Units (RCU): Strongly consistent reads/sec of items up to 4 KB
Write Capacity Units (WCU): Writes/sec of items up to 1 KB
Pricing:
Provisioned capacity example:
Reserve 10 RCU + 5 WCU:
If traffic exceeds provisioned capacity: ProvisionedThroughputExceededException (throttling). Application must retry with exponential backoff.
Auto-scaling:
table.update(
BillingMode='PROVISIONED',
ProvisionedThroughput={
'ReadCapacityUnits': 10,
'WriteCapacityUnits': 5
}
)
# Enable auto-scaling (via AWS Application Auto Scaling)
# Target: 70% utilization
# Min: 5 RCU/WCU
# Max: 100 RCU/WCU
Auto-scaling adjusts capacity based on utilization, responds within 2-5 minutes.
Break-even analysis:

Provisioned cheaper for sustained traffic, on-demand cheaper for spiky/low-volume
No server-side aggregations
Cannot compute:
# Invalid - no aggregation support
SELECT AVG(temperature) FROM sensor_readings WHERE sensor_id = 42000
Must retrieve all items and compute in application:
response = table.query(
KeyConditionExpression=Key('sensor_id').eq(42000)
)
temperatures = [item['temperature'] for item in response['Items']]
avg_temp = sum(temperatures) / len(temperatures)
Cost: Charged for reading all items, even though only computing single aggregate value.
No joins
Cannot combine tables:
# Invalid - no JOIN support
SELECT s.*, m.location
FROM sensor_readings s
JOIN sensor_metadata m ON s.sensor_id = m.sensor_id
Must query both tables separately:
# Query 1: Get readings
readings = table_readings.query(
KeyConditionExpression=Key('sensor_id').eq(42000)
)
# Query 2: Get metadata
metadata = table_metadata.get_item(Key={'sensor_id': 42000})
# Application joins results
for reading in readings['Items']:
    reading['location'] = metadata['Item']['location']
Alternative: Denormalize location into the readings table (duplicates data; acceptable when location rarely changes).
Query flexibility limited
Must specify partition key:
# Invalid - no partition key
table.query(
KeyConditionExpression=Key('timestamp').gt(1704034800)
)
# Error: Query key condition not supported
Without partition key, must use scan (reads entire table).
Item size limit: 400 KB
Cannot store:
# Invalid - item exceeds 400 KB
table.put_item(Item={
'sensor_id': 42000,
'timestamp': 1704034800,
'readings': [list of 10,000 measurements] # Too large
})
Must split large items or store in S3 with reference in DynamoDB.
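A minimal sketch of the S3-offload pattern (bucket name and helper function are hypothetical; table is the sensor_readings resource from the setup above):
import json
import boto3

s3 = boto3.client('s3')
ARCHIVE_BUCKET = 'sensor-archive-example'  # hypothetical bucket

def put_large_reading(sensor_id, timestamp, readings):
    # Store the oversized payload in S3...
    s3_key = f"readings/{sensor_id}/{timestamp}.json"
    s3.put_object(Bucket=ARCHIVE_BUCKET, Key=s3_key, Body=json.dumps(readings))
    # ...and keep only a small pointer item in DynamoDB
    table.put_item(Item={
        'sensor_id': sensor_id,
        'timestamp': timestamp,
        's3_key': s3_key
    })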
Transaction limitations
TransactWriteItems supports up to 25 items:
table.meta.client.transact_write_items(
TransactItems=[
{'Put': {'TableName': 'sensor_readings', 'Item': {...}}},
{'Update': {'TableName': 'sensor_metadata', 'Key': {...}, ...}},
# ... up to 25 operations total
]
)
When to use a relational database instead:
Requirements:
Table design:
{
"TableName": "sensor_readings",
"KeySchema": [
{"AttributeName": "sensor_id", "KeyType": "HASH"},
{"AttributeName": "timestamp", "KeyType": "RANGE"}
],
"AttributeDefinitions": [
{"AttributeName": "sensor_id", "AttributeType": "N"},
{"AttributeName": "timestamp", "AttributeType": "N"},
{"AttributeName": "temp_bucket", "AttributeType": "S"}
],
"GlobalSecondaryIndexes": [{
"IndexName": "temperature-index",
"KeySchema": [
{"AttributeName": "temp_bucket", "KeyType": "HASH"},
{"AttributeName": "timestamp", "KeyType": "RANGE"}
]
}]
}
Rationale:
Query pattern 1 (100/sec): Partition key = sensor_id
Query pattern 2 (1/min): GSI on temperature bucket
Query pattern 3 (1/day): Export to S3, aggregate with Athena
SELECT sensor_id,
DATE(from_unixtime(timestamp)) as date,
AVG(temperature) as avg_temp
FROM sensor_readings_export
GROUP BY sensor_id, DATE(from_unixtime(timestamp))
Alternative (wrong) design:
Partition key = date, Sort key = sensor_id + timestamp
Problem: Query pattern 1 requires scanning entire day partition
Design principle: Partition key must match highest-frequency query pattern. GSI handles secondary patterns at write cost increase.
Cost calculation:
Provisioned mode: (200 × $0.47) + (200 × $2.35) = $94 + $470 = $564/month
On-demand mode: 259M reads × $1.25/M + 259M writes × $6.25/M = $324 + $1619 = $1943/month
Provisioned mode 3.4× cheaper for sustained traffic.
Advanced pattern: Store multiple entity types in single table
Wide-column model supports heterogeneous rows. DynamoDB extends this - different entity types in same table using composite key structure.
Multi-table approach (relational thinking):
users_table: PK: user_id
orders_table: PK: order_id
products_table: PK: product_id
order_items_table: PK: order_id, SK: product_id
Query “user profile + recent 10 orders + line items for each order”:
Single-table approach:
application_table:
PK: USER#alice, SK: PROFILE → user data
PK: USER#alice, SK: ORDER#2025-01-15 → order summary
PK: ORDER#12345, SK: METADATA → order details
PK: ORDER#12345, SK: ITEM#widget → line item
Query “user profile + recent orders”:
Returns user profile + all orders in single request (11 items), 8ms latency, 3 RCU
Trade-off:

Single query retrieves related entities, no GSI overhead
Concrete example: E-commerce order system
Table structure with composite keys encoding entity type and relationships:
User entity:
{
"PK": "USER#alice",
"SK": "PROFILE",
"email": "alice@example.com",
"name": "Alice Chen",
"created": "2024-01-15"
}
User’s orders (sorted by date):
{
"PK": "USER#alice",
"SK": "ORDER#2025-01-15#12345",
"order_id": "12345",
"total": 125.50,
"status": "shipped"
}
Order metadata (alternate access pattern):
{
"PK": "ORDER#12345",
"SK": "METADATA",
"user_id": "alice",
"order_date": "2025-01-15",
"shipping_address": "123 Main St"
}
Order line items:
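A line item might look like this (product, quantity, and price are illustrative):
{
  "PK": "ORDER#12345",
  "SK": "ITEM#widget",
  "product_name": "widget",
  "quantity": 2,
  "unit_price": 25.00
}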
Access pattern 1: User profile + recent orders
response = table.query(
KeyConditionExpression=
Key('PK').eq('USER#alice') &
Key('SK').begins_with('ORDER#')
)
Returns:
Access pattern 2: Order details + line items
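A sketch of this pattern, querying the order’s partition directly (same Key helper as above):
response = table.query(
    KeyConditionExpression=Key('PK').eq('ORDER#12345')
)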
Returns:
Performance comparison:
Multi-table design:
Single-table design:
Sort key ordering (ORDER#2025-01-15#12345) provides chronological sorting without additional index. Latest orders retrieved first by reversing sort direction.
Constraint: Cannot efficiently query “all orders across all users” (no partition key). Single-table design optimizes for known access patterns, not ad-hoc queries.
DynamoDB complements relational databases
PostgreSQL RDS (relational):
DynamoDB (wide-column):
S3 (object storage):
Data flow example (IoT sensor platform):

Each storage system handles workload it optimizes for
Social Connection Queries in Relational Models
Schema:
Query: “Friends of friends who live in Boston”
Cost: 2 self-JOINs, O(n²) join complexity
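A sketch of this query under an assumed schema users(user_id, name, city) and friendships(user_id, friend_id), starting from an illustrative user 123:
SELECT DISTINCT u.user_id, u.name
FROM friendships f1
JOIN friendships f2 ON f2.user_id = f1.friend_id   -- friend of a friend
JOIN users u ON u.user_id = f2.friend_id
WHERE f1.user_id = 123
  AND u.city = 'Boston'
  AND f2.friend_id <> f1.user_id;                  -- exclude the starting user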
3-hop traversal (friends-of-friends-of-friends)?
Arbitrary depth?
Recursive CTEs:
Complex, performance unpredictable
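A sketch of the recursive form under the same assumed schema, with depth capped so the traversal terminates:
WITH RECURSIVE reachable(user_id, depth) AS (
    SELECT friend_id, 1
    FROM friendships
    WHERE user_id = 123                -- illustrative starting user
    UNION
    SELECT f.friend_id, r.depth + 1
    FROM reachable r
    JOIN friendships f ON f.user_id = r.user_id
    WHERE r.depth < 3                  -- cap traversal depth
)
SELECT DISTINCT u.user_id, u.name
FROM reachable r
JOIN users u ON u.user_id = r.user_id
WHERE u.city = 'Boston';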