
EE 547 - Unit 6
Fall 2025
Software engineering fundamental: Separating concerns through interfaces
Within a single application:
# User management module
def create_user(email, password):
    user_id = generate_id()
    hash_pwd = hash_password(password)
    store_user(user_id, email, hash_pwd)
    return user_id
# Booking module
def create_booking(user_id, flight_id):
    user = get_user(user_id)  # Function call
    if user.is_active:
        return store_booking(user_id, flight_id)

Module boundaries provide:
- Clear contracts: get_user(user_id) → User defines expectations

Single process limitation:

Function calls couple modules in same process
Moving from modules to separate processes
Same code, different execution model:
# User service (separate process)
# Listens on port 8001
@app.route('/users', methods=['POST'])
def create_user():
    email = request.json['email']
    password = request.json['password']
    user_id = generate_id()
    hash_pwd = hash_password(password)
    store_user(user_id, email, hash_pwd)
    return {'user_id': user_id}
# Booking service (separate process)
# Listens on port 8002
@app.route('/bookings', methods=['POST'])
def create_booking():
    user_id = request.json['user_id']
    flight_id = request.json['flight_id']
    # HTTP request instead of function call
    response = requests.get(f'http://localhost:8001/users/{user_id}')
    user = response.json()
    if user['is_active']:
        return store_booking(user_id, flight_id)

Why separate processes:

Process boundaries isolate failures
API: Application Programming Interface - contract for communication
Function call contract:
def get_user(user_id: int) -> User:
    """
    Contract:
    - Input: user_id (integer)
    - Output: User object with fields: id, email, is_active
    - Raises: UserNotFoundError if user_id doesn't exist
    """
    pass

HTTP API contract for same operation:
Request: GET /users/123 on host user-service:8001
Success response: HTTP 200 OK
Not found response: HTTP 404 Not Found
API contract specifies:
- GET /users/123 identifies resource and operation
- Success body: user_id, email, is_active fields (status 200)
- Failure body: error message (status 404)

Why explicit contracts matter:
Different teams can work independently:
- Consumers rely only on documented fields such as is_active

API documentation as contract:
GET /users/:user_id — Retrieve user by ID
Parameters:
- user_id (integer, path, required): User identifier

Responses:
- 200: user_id (integer), email (string), is_active (boolean)
- 404: error (string), user_id (integer)

Contract enforcement:
APIs make implicit function contracts explicit and enforceable
Scenario: User service needs to add email verification
Version 1 response: GET /users/123
Version 2 - Adding fields (backward compatible)
{
  "user_id": 123,
  "email": "alice@example.com",
  "is_active": true,
  "email_verified": true,            // New field
  "verification_date": "2025-01-15"  // New field
}

Backward compatible change:
- Existing clients still read user_id, email, is_active as before

Version 2 - Breaking change (not compatible)
{
  "user_id": 123,
  "email": "alice@example.com",
  "account_status": "active_verified"  // Replaced is_active
}

Problem: Booking service still reads is_active field
- Depending on the client, the missing field evaluates to false or crashes the request

Version management strategies:
URL-based versioning:
GET /v1/users/123 → Old response (includes is_active)
GET /v2/users/123 → New response (includes account_status)
Booking service continues using /v1/users
New services can use /v2/users
User service maintains both versions temporarily
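Sketched below is how a single service might serve both contract versions during the migration window. The render_user helper and the row's status field are hypothetical; a real service would attach these branches to its /v1 and /v2 routes.

```python
# Hypothetical sketch: one service answering both response shapes
# while clients migrate. render_user and the row's 'status' field
# are illustrative, not from the actual user service.

def render_user(row, version):
    """Serialize a stored user row for the requested API version."""
    if version == 1:
        # v1 clients still expect the boolean is_active field
        return {"user_id": row["id"], "email": row["email"],
                "is_active": row["status"] == "active"}
    # v2 exposes the richer account_status string instead
    return {"user_id": row["id"], "email": row["email"],
            "account_status": row["status"]}

row = {"id": 123, "email": "alice@example.com", "status": "active"}
v1_body = render_user(row, 1)  # keeps is_active for /v1 clients
v2_body = render_user(row, 2)  # account_status for /v2 clients
```

Both versions read the same stored record, so retiring v1 later is a code deletion, not a data migration.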
Version distribution (airline system, 45 days after v2 launch):
Cannot remove v1 until 100% migrated
Why versioning needed:
APIs enable independent deployment through versioning
API serves multiple independent consumers
Four clients calling GET /users/123:
- Booking service checks is_active before creating a booking
- Notification service reads user['email']
- External partners call the public API (https://api.airline.com)

All four clients depend on same contract
Client code example:
response = requests.get('http://user-service:8001/users/123')
user = response.json()
if user['is_active']:
    create_booking(...)

Internal change in user service:
# Original: Users stored in PostgreSQL
def get_user(user_id):
    row = db.query("SELECT * FROM users WHERE id = ?", user_id)
    return {
        'user_id': row['id'],
        'email': row['email'],
        'is_active': row['active']
    }

# New: Users moved to Redis cache (performance improvement)
def get_user(user_id):
    cached = redis.get(f'user:{user_id}')
    if cached:
        return json.loads(cached)
    # Fallback to database...

Impact on clients: None
Contract violation example:
User service developer changes field name:
# Accidentally changed field name
return {
    'user_id': row['id'],
    'email_address': row['email'],  # Was 'email'
    'is_active': row['active']
}

Cascading failures:
- Notification code raises KeyError: 'email' when sending confirmation
- 4 clients break simultaneously from single field rename
Why contracts matter with multiple clients:
APIs require stability when serving multiple independent clients
HTTP request anatomy:
GET /users/123 HTTP/1.1
Host: user-service.airline.com
Authorization: Bearer eyJhbGc...
Accept: application/json
User-Agent: booking-service/2.1.0

Request line components:
- GET - what operation to perform
- /users/123 - which resource to access
- HTTP/1.1 - version of HTTP

Request headers (metadata):
- Host: Which server to route to (required in HTTP/1.1)
- Authorization: Credentials for authentication
- Accept: What response format client understands
- User-Agent: Identifies client making request

Headers are key-value pairs: Header-Name: value
Empty line separates headers from body
Requests without body (GET, DELETE) end after headers
Measured request size:

Request sent as plain text over TCP
HTTP response anatomy:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 145
Cache-Control: max-age=300
Date: Wed, 15 Jan 2025 14:30:00 GMT

{
  "user_id": 123,
  "email": "alice@example.com",
  "is_active": true,
  "created_at": "2024-01-10T08:00:00Z"
}

Status line components:
- HTTP/1.1 - protocol version
- 200 - numeric result indicator
- OK - human-readable description

Response headers:
- Content-Type: Format of response body (JSON, HTML, etc)
- Content-Length: Body size in bytes
- Cache-Control: How long response can be cached
- Date: When response was generated

Response body:
- Format declared by the Content-Type header
- Empty line separates headers from body (same as request)

Response mirrors request structure
Status code tells client what happened and what to do next
response = requests.get('http://user-service/users/123')

if response.status_code == 200:
    user = response.json()   # Success - process data
elif response.status_code == 404:
    return None              # User doesn't exist - normal case
elif response.status_code == 401:
    refresh_token()          # Get new auth token
    retry_request()          # Try again
elif response.status_code == 503:
    time.sleep(5)            # Service down
    retry_request()          # Retry with backoff
elif response.status_code >= 500:
    alert_ops_team()         # Server problem
    return fallback_response()

Different codes require different handling:
2xx: Process response
4xx: Fix request or handle business logic
5xx: Retry or use fallback
Common status codes in production:
200 OK — Request succeeded
Return data in response body
201 Created — Resource created
Location header has new resource URL
204 No Content — Success, no data
DELETE succeeded, nothing to return
400 Bad Request — Malformed request
Invalid JSON, missing required field
401 Unauthorized — No valid auth
Token expired or missing
403 Forbidden — Not allowed
Valid auth but wrong permissions
404 Not Found — Resource missing
Normal for checking existence
429 Too Many Requests — Rate limited
Check Retry-After header
500 Internal Server Error — Bug
Unhandled exception in server
503 Service Unavailable — Overloaded
Retry with exponential backoff
4xx = Your request has a problem
Response: 400 Bad Request
{
  "errors": [
    {"field": "email", "message": "Invalid email format"},
    {"field": "age", "message": "Must be integer"}
  ]
}

Client must fix the request:
5xx = Server has a problem
# Server code with bug
@app.route('/users/<id>')
def get_user(id):
    user = db.query(f"SELECT * FROM users WHERE id = {id}")
    return user.to_dict()  # Crashes if user is None

Response: 500 Internal Server Error
Client should retry (server might recover):

Retry strategies differ:
4xx errors: Don’t retry same request
5xx errors: Retry might work
429 Too Many Requests — You’re sending too fast
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1697299200
Retry-After: 60

Client must slow down:
if response.status_code == 429:
    retry_after = response.headers.get('Retry-After', 60)
    time.sleep(int(retry_after))
    # Or queue request for later

503 Service Unavailable — Server overloaded
Server is temporarily unable to handle requests:
Different causes, different handling:
429 = Rate limiting (intentional)
503 = Overload (unintentional)
Exponential backoff pattern:
def retry_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        response = func()
        if response.status_code == 503:
            wait = 2 ** attempt  # 1, 2, 4, 8, 16
            time.sleep(wait)
        else:
            return response
    raise Exception("Max retries exceeded")

Circuit breaker pattern:
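A minimal sketch of a circuit breaker, assuming consecutive-failure counting and a fixed reset timeout (the thresholds and the CircuitBreaker class itself are illustrative, not production-ready):

```python
import time

class CircuitBreaker:
    """After `failure_threshold` consecutive failures the circuit
    opens and calls fail fast for `reset_timeout` seconds, giving
    the downstream service time to recover."""

    def __init__(self, failure_threshold=3, reset_timeout=30):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open - failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
            raise
        self.failures = 0  # success resets the count
        return result
```

Unlike backoff (which slows one client down), the open circuit stops sending traffic to a struggling service entirely until the reset timeout elapses.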
HTTP methods specify the operation type
GET — Read data
Returns user 123’s data. No changes to server state.
POST — Create new
Creates new user. Server assigns ID.
PUT — Replace entirely
Replaces ALL fields of user 123.
PATCH — Update partially
Updates ONLY email, leaves other fields unchanged.
DELETE — Remove
Removes user 123 from system.
Critical property: Idempotency
Idempotent = Same result from multiple identical calls
| Method | Idempotent | Safe | Use Case |
|---|---|---|---|
| GET | Yes | Yes | Read data |
| POST | No | No | Create new |
| PUT | Yes | No | Replace all |
| PATCH | No | No | Update some |
| DELETE | Yes | No | Remove |
Why idempotency matters:
Network fails after server processes but before client gets response.
Idempotent (PUT, DELETE): safe to retry — repeating the request yields the same final state
Not idempotent (POST): retry may create a duplicate resource
Safe = No server state changes
Only GET is safe (can cache, prefetch)
POST - Server assigns identifier
# Client doesn't know ID yet
POST /users
{
  "email": "alice@example.com",
  "name": "Alice"
}

# Server response
201 Created
Location: /users/456
{
  "id": 456,  # Server assigned
  "email": "alice@example.com",
  "name": "Alice",
  "created_at": "2024-01-15T10:30:00Z"
}

PUT - Client specifies identifier
# Client knows ID (e.g., using email as ID)
PUT /users/alice@example.com
{
  "name": "Alice",
  "role": "admin"
}

# Server response
200 OK  # Or 201 if newly created
{
  "id": "alice@example.com",
  "name": "Alice",
  "role": "admin"
}

POST is not idempotent:
When to use each:
Use POST when:
Use PUT when:
Real examples:
GitHub:
POST /repos/owner/repo/issues
# Creates issue, GitHub assigns number
PUT /repos/owner/repo/contents/README.md
# Creates/replaces file at exact path

AWS S3:
Idempotency in practice:
PUT replaces entire resource
# Current user state
{
  "id": 123,
  "email": "alice@example.com",
  "name": "Alice",
  "role": "user",
  "is_active": true
}

# PUT request (missing fields)
PUT /users/123
{
  "email": "alice@example.com",
  "name": "Alice Updated"
}

# Result - other fields lost/defaulted
{
  "id": 123,
  "email": "alice@example.com",
  "name": "Alice Updated",
  "role": null,       # Lost!
  "is_active": false  # Lost!
}

PATCH updates only specified fields
Common PATCH formats:
JSON Merge Patch (simple):
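A JSON Merge Patch body follows RFC 7386 semantics: null deletes a key, nested objects merge recursively, and any other value replaces. A minimal sketch of those rules (the json_merge_patch helper is illustrative):

```python
def json_merge_patch(target, patch):
    """Apply RFC 7386 merge-patch rules: null deletes a key,
    nested objects merge recursively, other values replace."""
    if not isinstance(patch, dict):
        return patch  # scalar/array patch replaces the target wholesale
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this key"
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result

user = {"id": 123, "name": "Alice", "role": "user"}
patched = json_merge_patch(user, {"name": "Alice Updated"})
# id and role are untouched; only name changes
```

This is exactly the behavior that makes PATCH safe for single-field updates where PUT would wipe the unspecified fields.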
JSON Patch (RFC 6902):
[
  {"op": "replace", "path": "/name", "value": "New Name"},
  {"op": "add", "path": "/tags/0", "value": "premium"},
  {"op": "remove", "path": "/temp_field"}
]

When to use each:
PUT:
PATCH:
Common mistake: Using PUT for single field update loses data
Safe methods can be called without side effects
# Safe to retry, cache, prefetch
GET /users/123
GET /users/123 # Same result
GET /users/123 # Same result
# Browser/proxy can cache
Cache-Control: max-age=300

Unsafe methods change server state
# DELETE is idempotent but unsafe
DELETE /users/123 # Returns 204 No Content
DELETE /users/123 # Returns 404 Not Found
DELETE /users/123 # Returns 404 Not Found
# Final state same, but state did change
# POST is neither safe nor idempotent
POST /orders # Creates order 1
POST /orders # Creates order 2 (duplicate!)
POST /orders # Creates order 3 (duplicate!)

Network failure handling:

Retry safety:
Always safe: GET
Safe if idempotent: PUT, DELETE
Dangerous: POST, PATCH
Need idempotency keys for POST/PATCH
Creating new booking via POST:
POST /bookings HTTP/1.1
Host: booking-service.airline.com
Content-Type: application/json
Content-Length: 215
Authorization: Bearer eyJhbGc...

{
  "user_id": 123,
  "flight_id": 456,
  "seat": "12A",
  "payment": {
    "method": "credit_card",
    "amount": 450.00,
    "currency": "USD"
  },
  "notifications": {
    "email": true,
    "sms": false
  }
}

Additional headers for body:
- Content-Type: Specifies body format (JSON, XML, form data)
- Content-Length: Exact body size in bytes

Server response:
201 Created status indicates:
Location header provides URL to access new resource
POST request includes data in body
HTTP runs over TCP connection:
1. TCP handshake (connection establishment):
Client Server
| |
|--- SYN -------->| (50ms)
|<-- SYN-ACK -----| (50ms)
|--- ACK -------->| (50ms)
| |
[TCP established]

2. HTTP request/response over established connection:
3. Connection close:
Total measured latency for single request:
Geographic impact (measurements):

3-way handshake before HTTP request
Problem: Creating new TCP connection for each request is expensive
Without keep-alive (HTTP/1.0 default):
Request 1:
TCP handshake: 150ms
HTTP request/response: 100ms
Close connection
Total: 250ms
Request 2:
TCP handshake: 150ms (again!)
HTTP request/response: 100ms
Close connection
Total: 250ms
Request 3:
TCP handshake: 150ms (again!)
HTTP request/response: 100ms
Close connection
Total: 250ms
Total for 3 requests: 750ms

With keep-alive (HTTP/1.1 default):
Request 1:
TCP handshake: 150ms
HTTP request/response: 100ms
Keep connection open
Total: 250ms
Request 2:
HTTP request/response: 100ms
(reuse connection)
Total: 100ms
Request 3:
HTTP request/response: 100ms
(reuse connection)
Total: 100ms
Total for 3 requests: 450ms

40% latency reduction by reusing connection
Keep-alive headers:
Request includes: Connection: keep-alive Response includes: Connection: keep-alive and Keep-Alive: timeout=5, max=1000
Keep-alive parameters:
- timeout=5: Server keeps connection open for 5 seconds idle
- max=1000: Maximum 1000 requests on this connection

Connection pooling in practice:
import requests

# Creates connection pool (default 10 connections)
session = requests.Session()

# All requests reuse connections from pool
for user_id in range(100):
    response = session.get(f'http://user-service:8001/users/{user_id}')
# Connections automatically returned to pool

Measured improvement (100 requests):
Connection reuse critical for performance
Problem: Service needs to handle many concurrent requests
Single connection serves requests sequentially:
Connection pool serves requests in parallel:
Connection 1: [Req1]->[Resp1] [Req4]->[Resp4]
Connection 2: [Req2]->[Resp2] [Req5]->[Resp5]
Connection 3: [Req3]->[Resp3]

Connection pool implementation:
from urllib3 import PoolManager

# Create pool with size limits
pool = PoolManager(
    num_pools=10,  # Max 10 different hosts
    maxsize=20,    # Max 20 connections per host
    block=True     # Wait if pool exhausted
)

# Connections managed automatically
response = pool.request('GET', 'http://api/users/123')
# Connection returned to pool after response read

Pool sizing considerations:

Pool exhaustion behavior:
# Pool size: 2, but 3 concurrent requests
pool = PoolManager(maxsize=2)
# Thread 1: Gets connection
# Thread 2: Gets connection
# Thread 3: Blocks waiting for available connection
# Thread 1 completes: Connection returned to pool
# Thread 3: Gets recycled connection

Real scenarios requiring pools:
HTTP headers determine how services process requests
Four critical functions in distributed systems:
1. Authentication/Authorization
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Service validates identity and permissions before processing

2. Content Negotiation
Content-Type: application/json; charset=utf-8
Accept: application/json
Ensures correct parsing and response format

3. Request Correlation
X-Request-ID: 7f3c6b2a-5d9e-4f8b-a1c3-9e8d7c6b5a4f
Traces requests across multiple services for debugging

4. Service Metadata
User-Agent: booking-service/2.1.0
X-API-Version: 2
Enables version-specific handling and deprecation
What happens without proper headers:
Missing Authorization → 401 Unauthorized
Wrong Content-Type → Data corruption
No X-Request-ID → Can’t trace failures
Invalid Accept → Client can’t parse response
Headers every request needs:
- Authorization — Identity and permissions
- Content-Type — How to parse body
- Accept — What format you want back

Headers for debugging:
- X-Request-ID — Correlation across services
- User-Agent — Which client sent this

Headers in responses:
- X-RateLimit-Remaining — Quota status
- Cache-Control — Can this be cached?

Headers are contracts between services
Problem: Request fails somewhere in chain of services
User reports “booking failed” - what actually happened?
# Three services, thousands of concurrent requests
[14:23:01.123] booking-service: Processing request
[14:23:01.234] user-service: Database query failed
[14:23:01.345] payment-service: Processing payment
[14:23:01.456] booking-service: Request failed
# Which events are related to the user's failure?

Solution: Thread request ID through all services
# Generate ID at entry point
@app.before_request
def assign_request_id():
    request_id = request.headers.get('X-Request-ID',
                                     str(uuid.uuid4()))
    g.request_id = request_id

# Forward to downstream services
headers = {
    'X-Request-ID': g.request_id,
    'Authorization': get_token()
}
response = requests.get(user_service_url, headers=headers)

# Include in every log message
logger.info(f"[{g.request_id}] Processing user {user_id}")

JWT in Authorization header identifies service and permissions
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Decoded JWT contains:
{
  "sub": "booking-service",  // Who is calling
  "scopes": [                // What they can do
    "read:users",
    "write:bookings"
  ],
  "exp": 1697295600,         // When token expires
  "iat": 1697292000          // When token was issued
}

Server validates every request:
def validate_request(request):
    auth_header = request.headers.get('Authorization')
    if not auth_header or not auth_header.startswith('Bearer '):
        return 401  # No identity provided
    token = auth_header[7:]  # Remove 'Bearer ' prefix
    try:
        payload = jwt.decode(token, PUBLIC_KEY, algorithms=['RS256'])
        if 'read:users' not in payload.get('scopes', []):
            return 403  # Identity valid, permission denied
        return None  # Success
    except jwt.ExpiredSignatureError:
        return 401  # Identity expired

401 vs 403 - Critical distinction:
401 Unauthorized — Identity problem
403 Forbidden — Permission problem
Token expiration creates problems:
Long-running operation starts with valid token
Token expires during operation
Operation fails partway through
Common patterns:
This is why short expiration times matter
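One common pattern is to refresh the token and retry once when a 401 comes back mid-operation. A minimal sketch, where do_request(token) and refresh_token() are hypothetical helpers standing in for real HTTP calls:

```python
def call_with_refresh(do_request, refresh_token, token, max_refreshes=1):
    """Retry a request after refreshing an expired token.

    do_request(token) -> HTTP status code, refresh_token() -> new token
    are hypothetical stand-ins for real HTTP client calls.
    """
    for _ in range(max_refreshes + 1):
        status = do_request(token)
        if status != 401:
            return status
        token = refresh_token()  # identity expired mid-flight: renew
    return 401  # refresh didn't help; surface the auth failure
```

Capping the number of refreshes prevents a misconfigured credential from retrying forever; after one renewal a persistent 401 is a real auth problem, not expiration.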
Content-Type tells server how to parse request body
POST /models/123/predict
Content-Type: application/json; charset=utf-8
Accept: application/json
{"features": [1.2, 3.4, 5.6], "threshold": 0.8}

Server uses Content-Type to route parsing:
@app.route('/models/<id>/predict', methods=['POST'])
def predict(id):
    content_type = request.headers.get('Content-Type', '')
    if 'application/json' in content_type:
        data = request.get_json()  # JSON parser
    elif 'application/x-www-form-urlencoded' in content_type:
        data = request.form        # Form parser
    elif 'multipart/form-data' in content_type:
        data = request.files       # File parser
    else:
        return {'error': 'Unsupported Content-Type'}, 415

    # Check Accept header for response format
    accept = request.headers.get('Accept', 'application/json')
    if 'application/json' not in accept:
        return {'error': 'Cannot produce requested format'}, 406

    result = model.predict(data)
    return jsonify(result), 200

How wrong Content-Type corrupts data:
# Client sends JSON with wrong header
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
body = json.dumps({'key': 'value'})

# Server parses as form, gets garbage
request.form = {'{"key": "value"}': ''}

Content-Type controls parsing:
application/json → JSON parser
application/x-www-form-urlencoded → Form parser
multipart/form-data → File upload parser
application/octet-stream → Raw bytes
Why explicit headers matter:
Every request should include:
Version headers enable gradual migration
Server handles multiple versions simultaneously:
api_version = request.headers.get('X-API-Version', '1')

if api_version == '1':
    # Old clients expect this format
    return {'user': user_id, 'active': True}
elif api_version == '2':
    # New clients get additional fields
    return {'user_id': user_id, 'is_active': True,
            'created_at': timestamp}

Signal deprecation to old clients:
if api_version == '1':
    response.headers['Sunset'] = 'Tue, 31 Dec 2024 23:59:59 GMT'
    response.headers['Deprecation'] = 'version="1"'
    # Client knows to migrate before sunset date

Track usage to know when safe to remove:
Version migration reality:
Week 1: Release v2, most clients still on v1
Week 4: Send migration reminders
Week 8: Add deprecation headers
Week 12: Still have clients on v1
Cannot remove v1 until all clients migrate
Some clients never update:
User-Agent reveals problem clients:
mobile-app/1.0 — High error rate
batch-processor/1.5 — Still on v1
web-app/2.3 — Successfully migrated
Without version headers:
Version headers enable controlled evolution
REST: Representational State Transfer
Architectural style, not a protocol or standard
Coined by Roy Fielding (2000 dissertation) based on HTTP design principles
Core idea: Resources identified by URLs, manipulated via standard HTTP methods
What REST is NOT:
What REST provides:
REST vs other approaches:
- RPC style: /createUser, /getUser, /deleteUser (verbs in URLs)
- REST style: POST /users, GET /users/123, DELETE /users/123 (resources + methods)

REST treats everything as a resource accessible via URL

REST uses resource URLs + HTTP methods
REST principle: URLs identify resources (things), methods specify operations
Resource hierarchy in airline API:
User resources:
- /users — Collection of all users
- /users/123 — Specific user
- /users/123/bookings — User's bookings (sub-collection)
- /users/123/bookings/789 — Specific booking

Flight resources:
- /flights — Collection of all flights
- /flights/456 — Specific flight
- /flights/456/seats — Available seats

Airport resources:
- /airports — Collection of airports
- /airports/LAX — Specific airport
- /airports/LAX/flights — Flights from LAX

URL structure conventions:
- Plural nouns: /users not /user
- Hyphens for readability: /frequent-flyers not /frequentFlyers
- Nesting shows ownership: /users/123/bookings

Operations via HTTP methods:
- GET /users — Get all users
- GET /users/123 — Get specific user
- POST /users — Create new user (body: email, password)
- PUT /users/123 — Update user (body: complete resource)
- DELETE /users/123 — Delete user

Nested resources show relationships:
GET /users/123/bookings returns array of user’s bookings:
[
  {"booking_id": 789, "flight_id": 456, "seat": "12A", ...},
  {"booking_id": 790, "flight_id": 457, "seat": "14B", ...}
]

GET /users/123/bookings/789 returns specific booking via user path:
GET /bookings/789 returns same booking via direct path:

Design choice: Provide both paths when resource makes sense independently
- /users/123/bookings — User-centric view (all bookings for user)
- /bookings/789 — Booking-centric view (single booking)

Different access patterns for different use cases
GET retrieves resource without modification
Request targets specific resource by ID:
Server returns resource representation:
{
  "user_id": 456,
  "email": "carol@example.com",
  "name": "Carol Chen",
  "is_active": true,
  "created_at": "2025-01-15T14:30:00Z"
}

GET characteristics:
GET on collections returns multiple resources:
DELETE removes resource
Request targets specific resource:
Server removes resource, returns minimal response:
DELETE characteristics:
Subsequent DELETE returns 404:
First delete:
DELETE /users/456 → 204 No Content (deleted)
Second delete:
DELETE /users/456 → 404 Not Found (already gone)
Final state identical: User 456 doesn’t exist
Both methods are idempotent:
Idempotency enables safe retries on network failures
POST creates new resource
Request sent to collection URL:
Server assigns ID and creates resource:
{
  "user_id": 456,
  "email": "carol@example.com",
  "name": "Carol Chen",
  "created_at": "2025-01-15T14:30:00Z"
}

POST characteristics:
- Sent to collection URL (/users not /users/456)
- Location header contains new resource URL

Why not idempotent:
POST /users {"email": "test@example.com"} → 201 Created, user_id=456
POST /users {"email": "test@example.com"} → 201 Created, user_id=789 (different resource!)
PUT replaces entire resource
Request sent to specific resource URL:
Server replaces resource completely:
{
  "user_id": 456,
  "email": "carol.new@example.com",
  "name": "Carol Chen",
  "is_active": false,
  "updated_at": "2025-01-20T10:00:00Z"
}

PUT characteristics:
- Sent to specific resource URL (/users/456)

PUT replaces entirely:
Missing fields in request are removed:
PUT /users/456 {"email": "new@example.com"}
Result: name field removed (entire resource replaced, not email alone)
Use PATCH for partial updates instead
Query parameters modify which resources are returned
Example: GET /flights?departure_airport=LAX
Path /flights identifies collection, departure_airport=LAX filters results
Query parameter syntax:
- Begins after ? in URL
- key=value pairs joined with &
- Spaces become %20, special characters escaped

Filtering examples:
Single filter: GET /flights?departure_airport=LAX → Returns only flights departing from LAX
Multiple filters: GET /flights?departure_airport=LAX&arrival_airport=JFK&date=2025-02-15 → Returns LAX→JFK flights on specific date
Three-way filter: GET /flights?departure_airport=LAX&status=scheduled&aircraft_type=737 → Returns scheduled 737 flights from LAX
All filters are AND conditions - flight must match all criteria
Parameter validation returns 400 Bad Request:
Invalid value:
Server validates parameters before database query
Sorting with parameters:
- GET /flights?sort=departure_time — Ascending order (default)
- GET /flights?sort=-departure_time — Descending order (minus prefix)
- GET /flights?sort=departure_airport,departure_time — Multiple fields (comma-separated)

Last example sorts flights alphabetically by airport (JFK before LAX), then by time within each airport
Combining filters and sorting:
GET /flights?departure_airport=LAX&status=scheduled&sort=-departure_time
Returns scheduled LAX flights, most recent first
Query parameters keep URL structure clean while enabling flexible filtering
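The minus-prefix, comma-separated sort syntax above can be parsed and applied in a few lines. A server-side sketch (parse_sort and apply_sort are illustrative helpers, not part of any framework):

```python
def parse_sort(sort_param):
    """Turn 'departure_airport,-departure_time' into
    (field, descending) pairs, matching the minus-prefix convention."""
    specs = []
    for field in sort_param.split(','):
        if field.startswith('-'):
            specs.append((field[1:], True))   # descending
        else:
            specs.append((field, False))      # ascending
    return specs

def apply_sort(rows, sort_param):
    """Sort dicts by multiple keys: apply right-to-left so the
    leftmost field dominates (Python's sort is stable)."""
    for field, desc in reversed(parse_sort(sort_param)):
        rows = sorted(rows, key=lambda r: r[field], reverse=desc)
    return rows
```

Applying the rightmost key first and relying on stable sorting is the standard trick for multi-field ordering without building a single composite key.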
Problem: Collection with 2,500 flights too large for single response
Without pagination: GET /flights → Returns 2,500 flights, 4MB response, 8 second load time
With pagination: GET /flights?limit=50&offset=0 → Returns 50 flights, 80KB response, 150ms load time
Offset-based pagination:
limit controls page size, offset controls starting position
- Page 1: GET /flights?limit=50&offset=0
- Page 2: GET /flights?limit=50&offset=50
- Page 3: GET /flights?limit=50&offset=100

Formula (0-indexed pages): offset = page_number × limit
Pagination metadata in response:
{
  "flights": [...50 flight objects...],
  "pagination": {
    "limit": 50,
    "offset": 0,
    "total": 2500,
    "next": "/flights?limit=50&offset=50",
    "prev": null
  }
}

Response includes links to next/previous pages
Alternative pagination strategies:
Cursor-based (for frequently updated data):
GET /flights?limit=50&after=flight_xyz
Next page: GET /flights?limit=50&after=flight_abc
Cursor identifies position in result set (not numeric offset)
Advantages over offset:
Disadvantage: Cannot jump to arbitrary page
Page-based (simpler API):
GET /flights?page=1&per_page=50 and GET /flights?page=2&per_page=50
Server calculates offset internally: offset = (page - 1) × per_page
Pagination with filters:
GET /flights?departure_airport=LAX&limit=50&offset=0 → First 50 LAX flights GET /flights?departure_airport=LAX&limit=50&offset=50 → Next 50 LAX flights
Filters applied before pagination
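Putting the pieces together, a server-side sketch of offset pagination over an already-filtered result set, producing the metadata shape shown earlier (the paginate helper is illustrative):

```python
def paginate(rows, limit=50, offset=0, base='/flights'):
    """Slice a filtered result set and build next/prev links."""
    total = len(rows)
    page = rows[offset:offset + limit]
    next_url = (f'{base}?limit={limit}&offset={offset + limit}'
                if offset + limit < total else None)   # None on last page
    prev_url = (f'{base}?limit={limit}&offset={max(offset - limit, 0)}'
                if offset > 0 else None)               # None on first page
    return {'flights': page,
            'pagination': {'limit': limit, 'offset': offset,
                           'total': total, 'next': next_url,
                           'prev': prev_url}}
```

Clients then follow the next/prev links rather than computing offsets themselves, which keeps the pagination scheme a server-side implementation detail.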
Measured performance (2,500 flight collection):
Idempotent operation: Multiple identical requests have same effect as single request
GET - Idempotent and safe:
# Call once
response1 = requests.get('http://api/users/123')
user1 = response1.json()  # {"user_id": 123, "email": "alice@..."}

# Call again
response2 = requests.get('http://api/users/123')
user2 = response2.json()  # {"user_id": 123, "email": "alice@..."}

# Same result, no side effects
assert user1 == user2

PUT - Idempotent but not safe:
# Call once
requests.put('http://api/users/123',
             json={"email": "alice.new@example.com", "is_active": True})
# Result: email changed to alice.new@example.com

# Call again with same data
requests.put('http://api/users/123',
             json={"email": "alice.new@example.com", "is_active": True})
# Result: email still alice.new@example.com (no additional change)
# Multiple calls → same final state

DELETE - Idempotent:
POST - Not idempotent:
# Call once
response1 = requests.post('http://api/users',
                          json={"email": "bob@example.com"})
# Response: 201 Created, user_id=456

# Call again with same data
response2 = requests.post('http://api/users',
                          json={"email": "bob@example.com"})
# Response: 201 Created, user_id=789 (different user!)
# Two users created - NOT idempotent

Idempotency matters for retries:
Network timeout scenario:
try:
    response = requests.post('http://api/bookings',
                             json={...},
                             timeout=5)
except requests.Timeout:
    # Did booking succeed or fail? Unknown!
    # Retry risks duplicate booking
    pass

Idempotency key pattern:
# Client generates unique request ID
idempotency_key = str(uuid.uuid4())

response = requests.post('http://api/bookings',
                         json={...},
                         headers={'Idempotency-Key': idempotency_key})

# If timeout, retry with same key
# Server sees duplicate key, returns original response
# Safe to retry POST operations

Server implementation:
if idempotency_key in cache:
    return cache[idempotency_key]  # Return cached response
else:
    result = create_booking(...)
    cache[idempotency_key] = result
    return result

Idempotency enables safe retry logic
REST constraint: Each request contains all information needed to process it
Stateful approach (violates REST):
# Login creates server-side session
POST /login
Body: {"email": "alice@example.com", "password": "..."}
Response:
HTTP/1.1 200 OK
Set-Cookie: session_id=abc123
# Server stores:
sessions['abc123'] = {
    'user_id': 123,
    'email': 'alice@example.com',
    'logged_in_at': '2025-01-15T10:00:00Z'
}

# Subsequent requests reference session
GET /bookings
Cookie: session_id=abc123
# Server looks up session['abc123'] to get user_id

Problems with server-side sessions:
Stateless approach (REST-compliant):
Stateless request:
# Every request includes complete authentication
GET /bookings
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
# Server decodes token to get user_id
# No session lookup needed

JWT (JSON Web Token) structure:
Header:
{
"alg": "HS256",
"typ": "JWT"
}
Payload:
{
"user_id": 123,
"email": "alice@example.com",
"exp": 1705324800, # Expiration timestamp
"iat": 1705321200 # Issued at timestamp
}
Signature:
HMACSHA256(
base64(header) + "." + base64(payload),
server_secret_key
)
Final token:
base64(header).base64(payload).signature

Benefits of stateless design:
Token expiration:
Statelessness enables unlimited horizontal scaling
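The header.payload.signature construction above can be reproduced with just the standard library. A sketch of HS256 signing and verification — production services should use a vetted library (e.g. PyJWT) rather than hand-rolled code:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode()

def sign_jwt(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature with HMAC-SHA256 (HS256)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_jwt(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    header, body, sig = token.split('.')
    expected = b64url(hmac.new(secret, f"{header}.{body}".encode(),
                               hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)
```

Because the signature covers header and payload, any tampering with the claims invalidates the token — which is what lets every server instance validate requests without a shared session store.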
Python web frameworks for APIs:
EE 547 uses Flask
Minimal abstractions make core concepts visible. Patterns transfer to FastAPI and Django REST.
Framework-agnostic concepts covered:

Framework sits between HTTP server and handler code
Client
↓ HTTP Request
HTTP Server (gunicorn)
↓ WSGI
Flask Framework
↓ Calls
Handler Function
↓ Returns
Flask Framework
↓ WSGI
HTTP Server
↓ HTTP Response
Client
What Flask does:
Handler implementation:

Connecting a URL to a function
from flask import Flask

app = Flask(__name__)

@app.route('/health')
def health_check():
    return {'status': 'healthy'}

What happens:
- @app.route('/health') registers the route
- Client sends GET /health
- /health matches registered route
- Flask calls the health_check() function

Response:
HTTP/1.1 200 OK
Content-Type: application/json
{"status": "healthy"}
Flask automatically:

Restricting which HTTP methods a route accepts
@app.route('/models', methods=['GET'])
def list_models():
    return {'models': [...]}

@app.route('/models', methods=['POST'])
def create_model():
    return {'id': 123}, 201

Same URL, different methods:
- GET /models → calls list_models()
- POST /models → calls create_model()
- PUT /models → 405 Method Not Allowed

Why separate by method:
Default is GET only:

Capturing values from the URL
@app.route('/models/<model_id>')
def get_model(model_id):
return {'id': model_id}
URL: GET /models/42 Result: model_id = "42" (string)
Type conversion:
@app.route('/models/<int:model_id>')
def get_model(model_id):
return {'id': model_id}
URL: GET /models/42 Result: model_id = 42 (integer)
URL: GET /models/abc Result: 404 Not Found (can’t convert to int)
Multiple parameters:
@app.route('/models/<int:model_id>/predictions/<pred_id>')
def get_prediction(model_id, pred_id):
return {'model': model_id, 'prediction': pred_id}
URL: GET /models/42/predictions/xyz Result: model_id = 42, pred_id = "xyz"

Reading JSON from request body
from flask import request
@app.route('/predict', methods=['POST'])
def predict():
data = request.json
# data is dict: {'features': [1, 2, 3]}
features = data['features']
result = model.predict(features)
return {'prediction': float(result)}
Client sends:
POST /predict
Content-Type: application/json
{"features": [1.2, 3.4, 5.6]}
Flask automatically:
parses the JSON body into a dict, available as request.json
Safe access with get():
data.get('features')  # returns None instead of raising KeyError when the key is missing

Reading parameters from URL query string
@app.route('/models')
def list_models():
# GET /models?limit=10&status=trained
limit = request.args.get('limit', 100, type=int)
# limit = 10 (converted to int)
status = request.args.get('status')
# status = "trained"
models = fetch_models(limit=limit, status=status)
return {'models': models}
Query string after ? in URL:
key=value pairs
& separator between pairs
Example: /models?limit=10&status=trained
request.args.get() parameters:
type=int: convert to integer
Without default:
With default:
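Under the hood this is standard query-string parsing; a stdlib sketch of the same behavior (get_arg is a hypothetical helper mimicking request.args.get, with cast playing the role of Flask's type= parameter):

```python
from urllib.parse import parse_qs, urlsplit

def get_arg(query, key, default=None, cast=None):
    # Mimics request.args.get(): first value wins; a failed cast falls back to default
    values = parse_qs(query).get(key)
    if not values:
        return default
    try:
        return cast(values[0]) if cast else values[0]
    except ValueError:
        return default

query = urlsplit('/models?limit=10&status=trained').query
limit = get_arg(query, 'limit', 100, cast=int)   # 10
status = get_arg(query, 'status')                # 'trained'
offset = get_arg(query, 'offset', 0, cast=int)   # missing -> default 0
```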

Reading HTTP headers
@app.route('/predict', methods=['POST'])
def predict():
# Authorization header
auth = request.headers.get('Authorization')
# "Bearer eyJhbGci..."
# Custom headers
request_id = request.headers.get('X-Request-ID')
# Content type
content_type = request.headers.get('Content-Type')
# Validate token
if not auth:
return {'error': 'Missing authorization'}, 401
if not validate_token(auth):
return {'error': 'Invalid token'}, 401
# Process request
return {'prediction': 0.87}
Common headers:
Authorization: Auth tokens
Content-Type: Body format
X-Request-ID: Request tracking
User-Agent: Client information
Headers case-insensitive:

Return dict → Flask converts to JSON
@app.route('/predict', methods=['POST'])
def predict():
result = model.predict(request.json['features'])
return {'prediction': float(result)}
Response Flask generates:
HTTP/1.1 200 OK
Content-Type: application/json
{"prediction": 0.87}
Flask automatically:
This is the most common pattern:

Return tuple: (data, status_code)
@app.route('/models', methods=['POST'])
def create_model():
model_id = save_model(request.json)
return {'id': model_id}, 201
Response:
HTTP/1.1 201 Created
Content-Type: application/json
{"id": 42}
When to use different status codes:
201 Created - Resource successfully created (POST)
204 No Content - Success but no data to return (DELETE)
404 Not Found - Resource doesn’t exist
422 Unprocessable Entity - Validation failed

Return tuple: (data, status, headers)
@app.route('/models', methods=['POST'])
def create_model():
model_id = save_model(request.json)
return {'id': model_id}, 201, {
'Location': f'/models/{model_id}',
'X-Request-ID': request.headers.get('X-Request-ID')
}
Response:
HTTP/1.1 201 Created
Content-Type: application/json
Location: /models/42
X-Request-ID: abc-123
{"id": 42}
Common response headers:
Location - URL of newly created resource
X-Request-ID - Echo back for tracking
Cache-Control - Control caching

Development server not for production
Problems:
Example:
@app.route('/predict')
def predict():
time.sleep(2) # Prediction takes 2 seconds
return {'result': 0.87}
With flask run:
Production needs:

Single process means:
Gunicorn - Production WSGI server
What this does:
Worker calculation:
workers = (CPU cores × 2) + 1
2-core machine → 5 workers
4-core machine → 9 workers
Same 2-second prediction with 4 workers:
4× improvement for concurrent requests
Configuration file:
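A minimal gunicorn.conf.py sketch along these lines (the bind address, timeout, and log settings are illustrative values, not course-mandated ones):

```python
# gunicorn.conf.py -- illustrative values, tune per deployment
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # (cores x 2) + 1 rule of thumb
timeout = 30          # restart workers stuck longer than 30 seconds
accesslog = "-"       # access log to stdout
```

Launched with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` names the module and Flask application object.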

Multiple workers = concurrent processing
Each worker is independent process
Problem: Flask serves files synchronously - blocks workers
What happens:
Solution 1: Nginx serves static files
Nginx handles /static/* directly
Flask never sees these requests
Workers free for API calls
Solution 2: S3 redirect pattern
@app.route('/models/<model_id>/download')
def download_model(model_id):
# Generate temporary S3 URL (expires in 1 hour)
s3_url = generate_presigned_url(
bucket='models',
key=f'{model_id}.pkl',
expires_in=3600
)
return redirect(s3_url)
Flow:
Use S3 redirect for: Large files (>10MB), model weights, datasets, user uploads

OpenAPI defines API structure in machine-readable format
Specification written in YAML or JSON, describes:
Example specification for user endpoint:
openapi: 3.0.0
info:
title: User Service API
version: 2.1.0
paths:
/users/{userId}:
get:
parameters:
- name: userId
in: path
required: true
schema:
type: integer
minimum: 1
responses:
'200':
description: User found
content:
application/json:
schema:
$ref: '#/components/schemas/User'
'404':
description: User not found
components:
schemas:
User:
type: object
required: [user_id, email, is_active]
properties:
user_id: {type: integer}
email: {type: string, format: email}
is_active: {type: boolean}
engagement_score: {type: number, minimum: 0, maximum: 100}
Specification enforces contract between API provider and consumers
Specification serves multiple purposes:
1. Documentation source - Swagger UI generates interactive docs - Always synchronized with implementation - Developers explore API without writing code
2. Validation layer - Request validation against schema - Response validation before sending - Type checking and constraint enforcement
3. Code generation - Server stubs with routing - Client SDKs in multiple languages - Type-safe API calls
4. Contract testing - Verify implementation matches spec - Detect breaking changes - Test compliance automatically
Specification-first development:
Write spec → Generate code → Implement handlers
Ensures API design considered before implementation details
Alternative: Code-first
Write code → Generate spec from annotations
Easier to start, harder to maintain consistency
OpenAPI schemas define data structures with constraints
ML prediction endpoint schema:
paths:
/models/{modelId}/predict:
post:
parameters:
- name: modelId
in: path
schema: {type: string, pattern: '^[a-z0-9-]+$'}
requestBody:
required: true
content:
application/json:
schema:
type: object
required: [features, model_version]
properties:
features:
type: array
items: {type: number}
minItems: 10
maxItems: 10
model_version:
type: string
enum: [v1.0, v1.1, v2.0]
threshold:
type: number
minimum: 0.0
maximum: 1.0
default: 0.5
responses:
'200':
content:
application/json:
schema:
type: object
required: [prediction, confidence]
properties:
prediction: {type: number}
confidence: {type: number, minimum: 0, maximum: 1}
Schema constraints validated automatically:
Invalid requests rejected before processing:
Missing required field:
POST /models/classifier-v2/predict
{"features": [1.2, 3.4, 5.6, 7.8, 9.0, 1.1, 2.2, 3.3, 4.4, 5.5]}
Response: 400 Bad Request
{
"error": "Validation failed",
"details": [{
"field": "model_version",
"message": "Required property missing"
}]
}
Wrong array length:
POST /models/classifier-v2/predict
{"features": [1, 2, 3], "model_version": "v2.0"}
Response: 400 Bad Request
{
"details": [{
"field": "features",
"message": "Array must contain 10 items, found 3"
}]
}
Invalid enum value:
{"features": [...], "model_version": "v3.0"}
Response: 400 Bad Request
{
"details": [{
"field": "model_version",
"message": "Value must be one of: v1.0, v1.1, v2.0"
}]
}
Validation prevents:
Request rejected at API boundary: 1-2ms
Request failing during processing: 50-500ms wasted
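In practice a library such as jsonschema performs these checks directly from the spec; a hand-rolled sketch of the same rules (required fields, array length, enum) makes the reject-at-the-boundary flow concrete. validate_prediction_request is a hypothetical helper, not part of any framework:

```python
def validate_prediction_request(body):
    """Return a list of {field, message} errors; an empty list means valid."""
    errors = []
    # required: [features, model_version]
    for field in ('features', 'model_version'):
        if field not in body:
            errors.append({'field': field, 'message': 'Required property missing'})
    # minItems: 10, maxItems: 10
    features = body.get('features')
    if isinstance(features, list) and len(features) != 10:
        errors.append({'field': 'features',
                       'message': f'Array must contain 10 items, found {len(features)}'})
    # enum: [v1.0, v1.1, v2.0]
    version = body.get('model_version')
    if version is not None and version not in ('v1.0', 'v1.1', 'v2.0'):
        errors.append({'field': 'model_version',
                       'message': 'Value must be one of: v1.0, v1.1, v2.0'})
    return errors

# A 400-style rejection happens before any model work is done
errors = validate_prediction_request({'features': [1, 2, 3], 'model_version': 'v2.0'})
```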
Single OpenAPI specification generates multiple artifacts
1. Interactive documentation (Swagger UI)
Browsable interface with:
Developers test endpoints without writing client code
2. Server stubs
Generated code includes:
# Generated from OpenAPI spec
@app.route('/models/<model_id>/predict', methods=['POST'])
def predict_model(model_id: str):
# Request already validated against schema
body = request.json # Type: PredictionRequest
# Implement business logic here
result = run_prediction(model_id, body['features'])
# Response validated before sending
return {'prediction': result, 'confidence': 0.87}
3. Client SDKs
Type-safe client libraries:
# Generated Python client
from api_client import UserServiceClient
client = UserServiceClient(base_url='https://api.example.com')
# Method signatures from spec
user = client.get_user(user_id=123) # Type: User
print(user.email) # IDE autocomplete knows fields
# Type checker catches errors
client.get_user(user_id="abc") # Error: expected int
4. Request validation middleware
Automatically generated validators:
# Validates before handler executes
def validate_request(spec):
def decorator(f):
def wrapper(*args, **kwargs):
# Check request matches spec
errors = validate_against_schema(
request,
spec['paths'][request.path]
)
if errors:
return {'error': errors}, 400
return f(*args, **kwargs)
return wrapper
return decorator
5. Mock servers
Generate mock API from specification:
Code generation tools:
Specification as single source of truth:
Change spec → Regenerate all artifacts
Documentation, validation, and client code stay synchronized
Manual maintenance alternative:
Machine-readable specification prevents divergence
APIs evolve but clients update slowly
Version placement options:
URL path versioning (most common):
GET /v1/users/123
GET /v2/users/123
Advantages: - Version immediately visible in URL - Easy to route in load balancer - Clear in logs and monitoring
Disadvantages: - URL changes with version - The “same” user resource has different URLs across versions
Header versioning:
GET /users/123
Accept: application/vnd.api.v1+json
GET /users/123
Accept: application/vnd.api.v2+json
Advantages: - URLs remain stable - Content negotiation pattern
Disadvantages: - Version not visible in URL - Harder to test in browser - Requires header inspection
Custom header:
GET /users/123
API-Version: 1
GET /users/123
API-Version: 2
Similar trade-offs to Accept header
Query parameter (not recommended):
GET /users/123?version=1
GET /users/123?version=2
Disadvantages: - Mixes version with filtering parameters - Caching issues (query params affect cache key)
Version granularity:
Major versions (breaking changes): - v1 → v2: Field removed or renamed - v2 → v3: Response structure changed - Requires separate implementation
Minor versions (additions): - v2.0 → v2.1: New optional field added - v2.1 → v2.2: New endpoint added - Backward compatible within major version
Semantic versioning pattern:
MAJOR.MINOR.PATCH
When to increment major version:
Backward compatible additions:
Parallel version support:
Both versions active simultaneously:
@app.route('/v1/users/<id>')
def get_user_v1(id):
user = fetch_user(id)
return {'user': id, 'active': user.is_active}
@app.route('/v2/users/<id>')
def get_user_v2(id):
user = fetch_user(id)
return {
'user_id': id, # Renamed
'is_active': user.is_active,
'created_at': user.created_at # New field
}
Maintains compatibility while evolving API
Breaking change: Modification that causes existing clients to fail
Common breaking changes:
Field removal:
// v1 response
{"user_id": 123, "email": "alice@example.com", "phone": "+1-555-0100"}
// v2 response
{"user_id": 123, "email": "alice@example.com"}
// phone field removed
Client code accessing response['phone'] raises KeyError
Field rename:
Client parsing created field receives KeyError
Type change:
Client expecting string, performs string operations on number → TypeError
New required field:
// v1 request
POST /bookings
{"flight_id": 456, "user_id": 123}
// v2 request (requires seat_class)
POST /bookings
{"flight_id": 456, "user_id": 123, "seat_class": "economy"}
Old clients missing seat_class → 400 Bad Request
Status code change:
// v1: Returns 200 OK when user not found (empty result)
// v2: Returns 404 Not Found when user not found
Client checking status == 200 for success misses 404 case
Non-breaking changes (backward compatible):
Adding optional field to response:
// v1 response
{"user_id": 123, "email": "alice@example.com"}
// v2 response
{"user_id": 123, "email": "alice@example.com",
"created_at": "2024-01-15"}
Old clients ignore unknown created_at field
Adding optional request parameter:
// v1: GET /flights?departure=LAX
// v2: GET /flights?departure=LAX&max_price=500
Old clients don’t send max_price, server uses default behavior
Adding new endpoint:
// v1: GET /users, POST /users
// v2: GET /users, POST /users, GET /users/search
Old clients unaware of /users/search, continue using existing endpoints
Adding new HTTP method to existing endpoint:
// v1: GET /users/123
// v2: GET /users/123, PATCH /users/123
Old clients only use GET, PATCH addition doesn’t affect them
Deprecation headers indicate future removal:
HTTP/1.1 200 OK
Deprecation: true
Sunset: Wed, 31 Dec 2025 23:59:59 GMT
Link: </v2/users/123>; rel="successor-version"
Clients warned field or endpoint will be removed
Contract testing prevents breaking changes:
def test_v1_user_response_format():
"""Verify v1 response format unchanged"""
response = api_client.get_user_v1(123)
assert 'user_id' in response
assert 'email' in response
assert isinstance(response['user_id'], int)
assert isinstance(response['email'], str)
Test fails if response structure changes, preventing accidental breaking changes
Migrating clients from v1 to v2 takes months
Typical timeline:
Week 0: v2 deployed, v1 maintained
Both versions handle requests: - v1: Existing clients continue working - v2: New clients adopt new features - Server runs both implementations
Week 4: Monitor adoption
SELECT version, COUNT(*) as requests
FROM api_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY version;
-- v1: 234,567 requests (65%)
-- v2: 126,433 requests (35%)
Week 8: Begin deprecation warnings
Add headers to v1 responses:
HTTP/1.1 200 OK
Deprecation: true
Sunset: Mon, 15 Sep 2025 23:59:59 GMT
Link: </docs/v2-migration>; rel="deprecation-policy"
Week 12: Active migration outreach
Contact clients still on v1: - Email with migration guide - Breaking change documentation - Code examples for common patterns - Offer support for migration issues
Week 16: Check adoption progress
Still 25% on v1, cannot remove yet
Week 20: Gradual enforcement
Make v1 read-only: - GET requests: Continue working - POST/PUT/DELETE: Return 410 Gone with migration instructions
Week 24: Final adoption check
Identify remaining v1 clients:
SELECT client_id, COUNT(*) as requests
FROM api_logs
WHERE version = 'v1'
AND timestamp > NOW() - INTERVAL '7 days'
GROUP BY client_id
ORDER BY requests DESC;
-- batch-job-1: 8,234 requests (automated, no owner)
-- mobile-app: 2,109 requests (old app version)
-- partner-api: 1,876 requests (quarterly release cycle)
-- unknown: 234 requests (API key lost)
Week 26-28: Final client migration
Contact remaining clients directly
Week 30: v1 shutdown
Return 410 Gone for all v1 requests:
HTTP/1.1 410 Gone
{
"error": "API v1 has been retired",
"shutdown_date": "2025-09-15",
"migration_guide": "/docs/v1-to-v2",
"support": "api-support@example.com"
}
Cost of parallel versions:
Estimated 1.8× development cost during overlap period
Structured errors provide actionable information
Basic error response:
Detailed validation errors:
{
"error": {
"code": "VALIDATION_ERROR",
"message": "Request validation failed",
"details": [
{
"field": "features[3]",
"value": "NaN",
"constraint": "type",
"message": "Must be a number"
},
{
"field": "threshold",
"value": 1.5,
"constraint": "maximum",
"message": "Must be at most 1.0"
}
]
}
}
Rate limit error with retry information:
{
"error": {
"code": "RATE_LIMIT_EXCEEDED",
"message": "API rate limit exceeded",
"limit": 1000,
"remaining": 0,
"reset_at": "2025-01-15T15:00:00Z",
"retry_after": 600
}
}
Resource not found with suggestions:
Error response components:
1. Machine-readable code
Enables programmatic handling:
if response.status_code == 400:
error = response.json()['error']
if error['code'] == 'VALIDATION_ERROR':
# Fix validation issues
for detail in error['details']:
log.warning(f"Field {detail['field']}: {detail['message']}")
elif error['code'] == 'RATE_LIMIT_EXCEEDED':
# Wait and retry
time.sleep(error['retry_after'])
2. Human-readable message
For developer debugging and logs
3. Context-specific details
Field-level errors for validation failures
4. Actionable information
Rate limits include reset time and retry delay
5. Request correlation
Include in support tickets for log correlation
6. Documentation links
Error code categories:
VALIDATION_ERROR: Client sent invalid data
AUTHENTICATION_ERROR: Token missing or invalid
AUTHORIZATION_ERROR: Valid token, insufficient permissions
RATE_LIMIT_EXCEEDED: Too many requests
RESOURCE_NOT_FOUND: Requested resource doesn’t exist
CONFLICT: Operation conflicts with current state
SERVER_ERROR: Internal server failure
Consistent error structure across all endpoints
Large collections require pagination
Collection with 2,500 users:
Without pagination: GET /users - Returns all 2,500 users - Response size: 3.8 MB - Load time: 6-8 seconds - Client memory: Entire collection
With pagination: GET /users?limit=50&offset=0 - Returns 50 users - Response size: 76 KB (50× smaller) - Load time: 120ms (50× faster) - Client memory: Current page only
Offset-based pagination parameters:
limit: Number of items per page (page size) offset: Number of items to skip (starting position)
Fetching pages:
Page 1 (users 1-50):
GET /users?limit=50&offset=0
Page 2 (users 51-100):
GET /users?limit=50&offset=50
Page 3 (users 101-150):
GET /users?limit=50&offset=100
Formula: offset = (page_number - 1) × limit
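The offset arithmetic and page metadata can be sketched as a small helper (paginate is a hypothetical function; a real API pushes the slicing into the database query rather than holding the collection in memory):

```python
def paginate(items, limit, offset, base_url):
    """Offset pagination over an in-memory list, returning page plus metadata."""
    page = items[offset:offset + limit]
    total = len(items)

    def link(off):
        return f'{base_url}?limit={limit}&offset={off}'

    return {
        'items': page,
        'pagination': {
            'limit': limit,
            'offset': offset,
            'total': total,
            'has_more': offset + limit < total,
            'next': link(offset + limit) if offset + limit < total else None,
            'previous': link(max(offset - limit, 0)) if offset > 0 else None,
        },
    }

users = list(range(1, 2501))   # stand-in for 2,500 user records
page2 = paginate(users, limit=50, offset=50, base_url='/users')
```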
Pagination metadata in response:
{
"users": [
{"user_id": 1, "email": "alice@example.com", ...},
{"user_id": 2, "email": "bob@example.com", ...},
...
],
"pagination": {
"limit": 50,
"offset": 0,
"total": 2500,
"has_more": true,
"next": "/users?limit=50&offset=50",
"previous": null
}
}
Response includes links to next/previous pages
Offset pagination with filters:
GET /users?status=active&limit=50&offset=0
Filter applied before pagination:
1. Query users where status=‘active’ (1,200 matching)
2. Skip first 0 users
3. Return next 50 users
{
"users": [...50 active users...],
"pagination": {
"limit": 50,
"offset": 0,
"total": 1200, // Total matching filter
"next": "/users?status=active&limit=50&offset=50"
}
}
Offset pagination advantages:
Offset pagination limitations:
1. Performance degrades with large offsets
Database query: SELECT * FROM users LIMIT 50 OFFSET 10000
Must scan 10,000 rows before returning 50
2. Inconsistent results during modifications
Client requests page 1 (users 1-50)
User 25 gets deleted
Client requests page 2 (offset=50)
Receives users 51-100 (previously users 52-101)
User 51 never seen by client
3. Duplicate results with insertions
Client requests page 1 (users 1-50)
New user inserted at position 10
Client requests page 2 (offset=50)
Receives users 51-100 (previously users 50-99)
User 50 appears on both pages
Cursor-based pagination solves these issues
Cursor encodes position in result set
Instead of numeric offset, use opaque cursor token
Initial request:
GET /users?limit=50
Response with cursor:
{
"users": [
{"user_id": 1, ...},
{"user_id": 2, ...},
...
{"user_id": 50, ...}
],
"pagination": {
"limit": 50,
"next_cursor": "eyJ1c2VyX2lkIjo1MH0=",
"has_more": true
}
}Next page request:
GET /users?limit=50&cursor=eyJ1c2VyX2lkIjo1MH0=
Cursor is base64-encoded JSON: {"user_id": 50}
Database query using cursor:
-- Without cursor (first page)
SELECT * FROM users ORDER BY user_id LIMIT 50;
-- With cursor (subsequent pages)
SELECT * FROM users
WHERE user_id > 50
ORDER BY user_id
LIMIT 50;
No OFFSET clause - uses indexed WHERE condition
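The opaque-cursor round trip can be sketched with stdlib base64 and json (encode_cursor, decode_cursor, and fetch_page are hypothetical helpers; fetch_page filters an in-memory list where a real service would use the WHERE clause above):

```python
import base64
import json

def encode_cursor(position):
    # Opaque to clients, so the server can change the format freely
    return base64.urlsafe_b64encode(json.dumps(position).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))

def fetch_page(rows, limit, cursor=None):
    """Cursor pagination over rows sorted by user_id (SQL: WHERE user_id > ?)."""
    after = decode_cursor(cursor)['user_id'] if cursor else 0
    page = [r for r in rows if r['user_id'] > after][:limit]
    # A full page may have more rows behind it; a short page is the last one
    next_cursor = (encode_cursor({'user_id': page[-1]['user_id']})
                   if len(page) == limit else None)
    return page, next_cursor

rows = [{'user_id': i} for i in range(1, 121)]
page1, cursor = fetch_page(rows, limit=50)
page2, _ = fetch_page(rows, limit=50, cursor=cursor)
```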
Cursor for different sort orders:
Sort by created_at descending:
Decoded: {"created_at": "2025-01-15T10:30:00Z", "user_id": 50}
Include user_id for tie-breaking when timestamps equal
Cursor pagination advantages:
1. Consistent performance
Direct index lookup, no scanning:
2. Stable results during modifications
Client requests page 1 with cursor
User 25 gets deleted
Client requests page 2 using cursor
Cursor points to user_id > 50, deletion of user 25 doesn’t affect next page
3. No duplicate results from insertions
Cursor maintains position relative to sorted order, new insertions don’t cause duplicates
Cursor pagination limitations:
Cannot jump to arbitrary page
No “go to page 50” - must traverse sequentially
Cannot display total page count
Computing total requires full count query (expensive)
Cursor must be opaque to client
// Bad: Exposing internal structure
GET /users?after_id=50
// Good: Opaque cursor
GET /users?cursor=eyJ1c2VyX2lkIjo1MH0=
Allows server to change cursor format without breaking clients
When to use each approach:
Offset pagination: - Need page numbers (UI with page selector) - Need total count - Data rarely changes - Small to medium collections
Cursor pagination: - Large collections (millions of rows) - Data frequently updated - Mobile apps (efficient, consistent) - Infinite scroll UX
Many APIs support both: limit/offset for random access, limit/cursor for efficient traversal
June 2012: 6.5 million LinkedIn password hashes stolen
What LinkedIn did:
-- Stored password hashes (not plaintext) ✓
SELECT user_id, email, password_hash FROM users;
-- But used SHA-1 without salt ✗
password_hash = SHA1(password)
What attackers did:
common_hashes = {
SHA1("password123"): "password123",
SHA1("123456"): "123456",
# ... 10 million entries
}
Result: 90% of passwords cracked within 72 hours
Why it failed:
Same password “123456” compromised 753,000 accounts simultaneously
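A toy reconstruction of the precomputed-dictionary attack (the four-entry dictionary and the sample accounts are invented for illustration; real attacks use lists of millions of common passwords):

```python
import hashlib

def sha1_hex(password):
    return hashlib.sha1(password.encode()).hexdigest()

# Attacker precomputes hashes for common passwords once...
dictionary = ['password', '123456', 'secret123', 'qwerty']
precomputed = {sha1_hex(p): p for p in dictionary}

# ...then cracks every matching account with a dictionary lookup
stolen = {
    'alice@example.com': sha1_hex('123456'),
    'bob@example.com':   sha1_hex('123456'),
    'carol@example.com': sha1_hex('horse-battery-staple'),
}
cracked = {email: precomputed[h] for email, h in stolen.items() if h in precomputed}
# alice and bob fall together: identical password, identical unsalted hash
```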

LinkedIn’s failure shows why hashing alone isn’t enough.
LinkedIn’s breach shows authentication failures cascade across systems
ML API requires authentication to prevent unauthorized access:
Every API request needs to answer two questions:
In a single process, identity is implicit:
def delete_file(filepath):
# Running as OS user 'alice'
# OS checks if alice can delete filepath
os.remove(filepath)
In distributed systems, identity must be explicit:
def handle_delete_request(request):
# Who sent this HTTP request?
user = authenticate(request) # Extract identity
# Can they delete this file?
if not authorize(user, 'delete', filepath):
return 403
delete_file(filepath)
HTTP is stateless - no memory between requests:
Three approaches to maintaining identity across requests:
Each approach makes different trade-offs between security, scalability, and complexity.

Identity must be explicitly established in every HTTP request.
Authentication transforms a secret into verified identity
Step 1: User provides credentials
Step 2: Server verifies against stored credentials
def authenticate(email, password):
user = db.query("SELECT * FROM users WHERE email = ?", email)
if verify_password(password, user.password_hash):
return user.id # Identity established
return None
Step 3: Server issues proof of authentication
# Option A: Server-side session
session_id = generate_random_id()
sessions[session_id] = user_id
return {"session_id": session_id}
# Option B: Cryptographic token
token = jwt.encode({"user_id": user_id, "exp": time() + 3600})
return {"token": token}
Password storage determines breach impact:
Never store plaintext passwords:
-- CATASTROPHIC: Database breach exposes all passwords
SELECT * FROM users WHERE password = 'secret123'
Store cryptographic hashes instead:

Authentication converts credentials to identity proof.
LinkedIn used SHA-1 hashing - why wasn’t that enough?
First, understand why plaintext is catastrophic:
Database breach with plaintext passwords:
SELECT email, password FROM users LIMIT 3;
-- alice@example.com | secret123
-- bob@example.com | secret123
-- carol@example.com | password1
All accounts immediately compromised.
Hash functions provide one-way transformation:
Cannot reverse: hash → original password (computationally infeasible)
Asymmetry favors attackers:
Legitimate use: Verify one password for one user
Attack: Try millions of passwords against all users
# Attacker with stolen hash database
common_passwords = ["password", "123456", "secret123", ...]
for password in common_passwords: # 10 million
test_hash = hash(password)
for stored_hash in database: # 100,000 users
if test_hash == stored_hash:
compromised.append(...)
Solution: Make hashing deliberately slow
This is why LinkedIn’s passwords fell in 72 hours - SHA-1 allowed rapid dictionary attacks.
Slow hashing flips the asymmetry to favor defenders: a legitimate user pays the cost once per login, while an attacker pays it for every guess.

Time cost makes brute force attacks impractical.
LinkedIn’s second mistake: No salt
Even with slow hashing, common passwords create identical hashes:
Without salt, all users with “password123” have same hash:
hash("password123") → "ef92b778bafe771e89245b89ecb..."
# Database search finds 1,847 users with this hash
# All compromised with single hash computation
Salt: Random value unique to each user
def create_user(email, password):
salt = generate_random_bytes(16) # Unique per user
password_hash = hash(salt + password)
db.store(email, salt, password_hash)
Now identical passwords produce different hashes:
# User 1
salt1 = "a1b2c3d4..."
hash("a1b2c3d4..." + "password123") → "7f3c6b2a..."
# User 2
salt2 = "e5f6g7h8..."
hash("e5f6g7h8..." + "password123") → "92a8b7c6..."
# Different hashes despite same password
Impact on attack strategy:
Without salt: One computation compromises all instances
target_hash = "ef92b778..."
if computed_hash == target_hash:
# Found password for ALL users with this hash
With salt: Must attack each user individually
for user in users:
for password in dictionary:
if hash(user.salt + password) == user.hash:
# Found password for ONE user only
Salt is not secret - stored with the hash, it prevents mass attacks, not targeted ones
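The per-user salt idea in runnable form (SHA-256 is used here only to isolate the salting mechanism; production code pairs the salt with a slow hash such as bcrypt or PBKDF2):

```python
import hashlib
import os

def hash_password(password, salt=None):
    salt = salt or os.urandom(16)          # unique random salt per user
    digest = hashlib.sha256(salt + password.encode()).hexdigest()
    return salt, digest                    # salt is stored alongside the hash

def verify_password(password, salt, stored_digest):
    return hashlib.sha256(salt + password.encode()).hexdigest() == stored_digest

salt1, hash1 = hash_password('password123')
salt2, hash2 = hash_password('password123')
# Same password, different salts, therefore different hashes
```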
With salt, LinkedIn’s 753,000 “123456” users would each need individual attacks

Salt forces individual attacks per user.
Combining defenses: Slow hashing + Salt + Adaptive work factor
bcrypt’s configurable work factor scales with hardware improvements:
# Work factor determines iteration count: 2^factor
bcrypt.gensalt(10) # 2^10 = 1,024 iterations (2010)
bcrypt.gensalt(12) # 2^12 = 4,096 iterations (2020)
bcrypt.gensalt(14) # 2^14 = 16,384 iterations (2030)
Each increment doubles computation time:
| Factor | Iterations | Time/Hash | Passwords/Day |
|---|---|---|---|
| 10 | 1,024 | 50ms | 1.7M |
| 11 | 2,048 | 100ms | 864K |
| 12 | 4,096 | 200ms | 432K |
| 13 | 8,192 | 400ms | 216K |
| 14 | 16,384 | 800ms | 108K |
Balancing security and usability:
def choose_work_factor():
# Target: 250ms computation time
test_password = b"benchmark"
for factor in range(10, 15):
start = time.time()
bcrypt.hashpw(test_password, bcrypt.gensalt(factor))
duration = time.time() - start
if duration > 0.250: # 250ms target
return factor
return 14 # Maximum reasonable factor
Moore’s Law compensation:
Security parameter improves over time without code changes
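The same adjustable work-factor idea exists in the standard library through PBKDF2's iteration count (pbkdf2_hash is a hypothetical wrapper; the iteration counts are illustrative):

```python
import hashlib
import time

def pbkdf2_hash(password, salt, iterations):
    # dklen defaults to the hash's digest size (32 bytes for SHA-256)
    return hashlib.pbkdf2_hmac('sha256', password.encode(), salt, iterations)

salt = b'per-user-random-salt'

# Doubling the iteration count roughly doubles the cost, like bcrypt's 2^factor
start = time.perf_counter()
pbkdf2_hash('benchmark', salt, 200_000)
elapsed = time.perf_counter() - start   # grows linearly with the iteration count
```

Raising the iteration count as hardware improves is the PBKDF2 analogue of incrementing bcrypt's work factor.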

Work factor increases maintain security despite hardware improvements.
Server sessions: Centralized state
# Login creates session in shared store
session_id = generate_uuid()
redis.set(f"session:{session_id}", {
"user_id": 123,
"created": timestamp,
"permissions": ["read", "write"]
})
response.set_cookie("session_id", session_id)
# Every request requires lookup
def handle_request(request):
session_id = request.cookies.get("session_id")
session = redis.get(f"session:{session_id}") # Network call
if not session:
return 401
Tokens: Distributed state
# Login creates self-contained token
payload = {
"user_id": 123,
"exp": timestamp + 3600,
"permissions": ["read", "write"]
}
token = jwt.encode(payload, SECRET_KEY)
return {"token": token}
# Every request validates locally
def handle_request(request):
token = request.headers["Authorization"].split(" ")[1]
payload = jwt.decode(token, SECRET_KEY) # CPU only
# No network call required
Trade-offs in practice:
| Aspect | Sessions | Tokens |
|---|---|---|
| Revocation | Immediate | At expiration |
| Scaling | Requires shared store | Linear |
| Network calls | Every request | None |
| State size | Server: O(users) | Server: O(1) |
| Client complexity | Simple cookie | Header management |

Sessions require coordination; tokens are independent.
Authentication establishes identity; authorization determines capabilities
def process_request(request):
# Step 1: Who are you? (Authentication)
user_id = validate_token(request.headers['Authorization'])
if not user_id:
return 401 # Unauthorized - don't know who you are
# Step 2: What can you do? (Authorization)
resource = request.path # e.g., /models/123
action = request.method # e.g., DELETE
if not has_permission(user_id, resource, action):
return 403 # Forbidden - know who you are, can't do this
# Step 3: Execute
return perform_action(resource, action)
Three authorization models:
1. Role-Based (RBAC): Users have roles, roles have permissions
user.roles = ["developer", "viewer"]
role_permissions = {
"developer": ["read", "write", "deploy"],
"viewer": ["read"],
"admin": ["read", "write", "deploy", "delete"]
}
# Can user deploy? Check if any role has permission
2. Attribute-Based (ABAC): Decisions based on attributes
can_access = (
user.department == resource.department and
user.clearance_level >= resource.sensitivity and
current_time in user.work_hours
)
3. Resource-Based: Users own resources

Authorization determines what authenticated users can do.
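The RBAC check from model 1 in runnable form (ROLE_PERMISSIONS mirrors the mapping above; has_permission is a hypothetical helper):

```python
ROLE_PERMISSIONS = {
    'developer': {'read', 'write', 'deploy'},
    'viewer':    {'read'},
    'admin':     {'read', 'write', 'deploy', 'delete'},
}

def has_permission(roles, action):
    # User may act if ANY of their roles grants the permission
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)

user_roles = ['developer', 'viewer']
can_deploy = has_permission(user_roles, 'deploy')   # granted via 'developer'
can_delete = has_permission(user_roles, 'delete')   # denied: only 'admin' may delete
```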
Tokens can’t be recalled after issuing:
Once issued, JWT remains valid until expiration:
token_payload = {
"user_id": 123,
"exp": time() + 3600, # Valid for 1 hour
"scopes": ["read:data", "write:data", "delete:data"]
}
Employee terminated at 2:00 PM:
Three approaches to bounded revocation:
1. Short-lived access tokens (15 minutes)
access_token = create_token(expires_in=15*60)
refresh_token = create_token(expires_in=30*24*60*60)
# After termination, refresh fails
def refresh():
if user_terminated(refresh_token.user_id):
return 401 # No new access token
return create_access_token()
2. Blacklist critical tokens
# Maintain revoked-token list (small subset, only for terminated users)
# On termination: redis.sadd("revoked_jti", token.jti)
def validate_token(token):
if redis.sismember("revoked_jti", token.jti): # Quick membership check
return None
return decode_token(token)
3. Version-based invalidation
# User has token_version in database
user.token_version = 2 # Increment on revocation
# Token includes version
token.version = 1
# Validation checks version
if token.version < user.token_version:
return 401 # Token outdated
Trade-off: Security (short expiry) vs Performance (fewer refreshes)
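Version-based invalidation end to end (an in-memory dict stands in for the user table's token_version column; all names are illustrative):

```python
import time

# Stand-in for the user table's token_version column
user_token_version = {123: 1}

def issue_token(user_id, ttl=3600):
    return {'user_id': user_id,
            'version': user_token_version[user_id],
            'exp': time.time() + ttl}

def validate(token):
    if time.time() > token['exp']:
        return None                                    # expired
    if token['version'] < user_token_version[token['user_id']]:
        return None                                    # revoked: version was bumped
    return token['user_id']

token = issue_token(123)
user_token_version[123] += 1    # revoke everything issued so far
```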

Shorter tokens increase security but require more refreshes.
Session-based scaling requires coordination
Adding servers with sessions:
# Server 1 has session for User A
# Server 2 has session for User B
# Load balancer must remember routing (sticky sessions)
# OR share sessions via Redis (single point of failure)
Measured impact with 1000 requests/second:
Token-based scaling is trivial
Adding servers with tokens:
Measured impact with 1000 requests/second:
Deployment advantages:
| Operation | Sessions | Tokens |
|---|---|---|
| Add server | Update session store | Add server |
| Remove server | Migrate sessions | Remove server |
| Deploy update | Coordinate session drain | Rolling update |
| Region failover | Replicate sessions | No change |
Cost at scale (10K concurrent users):
Stateless architecture enables linear scaling without coordination overhead.

Tokens enable horizontal scaling without coordination.
JSON Web Tokens encode identity without server state
JWT structure: Three Base64-encoded parts separated by dots
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.
eyJ1c2VyX2lkIjoxMjMsImVtYWlsIjoiYWxpY2VAZXhhbXBsZS5jb20iLCJleHAiOjE3MDUzMjQ4MDB9.
SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
Part 1: Header (Algorithm and type)
Part 2: Payload (Claims about user)
{
"user_id": 123,
"email": "alice@example.com",
"exp": 1705324800, // Expires: Unix timestamp
"iat": 1705321200, // Issued at
"scopes": ["read", "write"]
}
Part 3: Signature (Prevents tampering)
HMACSHA256(
base64(header) + "." + base64(payload),
server_secret_key
)
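The sign-and-verify cycle can be reproduced with stdlib hmac and base64 (sign and verify are hypothetical helpers implementing HS256 by hand to show the mechanics; real services use a maintained JWT library):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode()

def sign(payload, secret):
    header = b64url(json.dumps({'alg': 'HS256', 'typ': 'JWT'}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = b64url(hmac.new(secret, f'{header}.{body}'.encode(), hashlib.sha256).digest())
    return f'{header}.{body}.{sig}'

def verify(token, secret):
    header, body, sig = token.split('.')
    expected = b64url(hmac.new(secret, f'{header}.{body}'.encode(),
                               hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None                                   # tampered or wrong key
    padded = body + '=' * (-len(body) % 4)            # restore Base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))

secret = b'server-secret-abc123'
token = sign({'user_id': 123}, secret)
```

Changing a single payload byte, or verifying with a different key, invalidates the signature.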
Critical properties:

JWT structure enables stateless authentication.
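The two data parts are plain base64url, so they can be inspected with the standard library alone (decoding is not verification; only the signature protects integrity). A minimal sketch using the sample token above:

```python
import base64
import json

def b64url_decode(segment: str) -> bytes:
    """Base64url-decode a JWT segment, restoring the stripped '=' padding."""
    padding = '=' * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

token = ("eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
         "eyJ1c2VyX2lkIjoxMjMsImVtYWlsIjoiYWxpY2VAZXhhbXBsZS5jb20iLCJleHAiOjE3MDUzMjQ4MDB9."
         "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c")

header_b64, payload_b64, signature_b64 = token.split('.')
header = json.loads(b64url_decode(header_b64))
payload = json.loads(b64url_decode(payload_b64))
print(header)   # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)  # {'user_id': 123, 'email': 'alice@example.com', 'exp': 1705324800}
```

Anyone holding a token can read its claims this way, which is why secrets never belong in the payload.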
Signature prevents token forgery
Server creates token with secret:
import jwt
secret_key = "server-secret-abc123" # Only server knows
payload = {
"user_id": 123,
"email": "alice@example.com",
"exp": time.time() + 3600
}
token = jwt.encode(payload, secret_key, algorithm="HS256")
# Result: eyJhbGciOiJIUzI1NiIs...
Client cannot modify token:
# Attacker tries to change user_id
decoded = base64.decode(token.split('.')[1])
decoded['user_id'] = 999 # Change to admin
fake_payload = base64.encode(decoded)
# But cannot generate valid signature without secret
fake_token = header + "." + fake_payload + "." + random_signature
# Server will reject: Invalid signature
Server validates with same secret:
def validate_token(token):
try:
payload = jwt.decode(token, secret_key, algorithms=["HS256"])
# Signature valid, token not expired
return payload
except jwt.InvalidSignatureError:
return None # Tampered token
except jwt.ExpiredSignatureError:
return None # Token too old
Symmetric (HS256) vs Asymmetric (RS256):

Only server with secret can create valid tokens.
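What `jwt.encode`/`jwt.decode` do for HS256 can be sketched with only the standard library; this is illustrative, not a replacement for the `jwt` package (which also enforces `exp`, `nbf`, and algorithm pinning):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode()

def sign_hs256(payload: dict, secret: str) -> str:
    # The signature covers base64(header) + "." + base64(payload)
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    sig = hmac.new(secret.encode(), f"{header}.{body}".encode(),
                   hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_hs256(token: str, secret: str) -> bool:
    header, body, sig = token.split('.')
    expected = hmac.new(secret.encode(), f"{header}.{body}".encode(),
                        hashlib.sha256).digest()
    # Constant-time comparison prevents timing attacks on the signature
    return hmac.compare_digest(b64url(expected), sig)

token = sign_hs256({"user_id": 123}, "server-secret-abc123")
assert verify_hs256(token, "server-secret-abc123")
assert not verify_hs256(token, "wrong-secret")  # different key, invalid signature
```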
Standard claims provide common functionality
Registered claims (predefined meanings):
{
"iss": "https://auth.company.com", // Issuer
"sub": "user:123", // Subject (user)
"aud": "https://api.company.com", // Audience (recipient)
"exp": 1705324800, // Expiration time
"nbf": 1705321200, // Not before
"iat": 1705321200, // Issued at
"jti": "a1b2c3d4" // JWT ID (unique)
}
Time-based validation:
current_time = 1705323000 # Unix timestamp
# Token not yet valid (nbf = not before)
if current_time < token['nbf']:
return "Token not yet valid"
# Token expired
if current_time > token['exp']:
return "Token expired"
# Valid time window: nbf <= current_time <= exp
Custom claims for application data:
{
// Standard claims
"exp": 1705324800,
"iat": 1705321200,
// Custom application claims
"user_id": 123,
"email": "alice@example.com",
"roles": ["developer", "reviewer"],
"department": "engineering",
"permissions": {
"models": ["read", "write"],
"data": ["read"]
}
}
Token size considerations:

Claims carry both metadata and application data.
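Every custom claim travels with every request, so payload size is worth measuring. A rough stdlib sketch (exact numbers depend on the claims and JSON formatting):

```python
import base64
import json

def token_payload_size(claims: dict) -> int:
    """Bytes the payload segment adds to each request (base64 expands ~4/3)."""
    raw = json.dumps(claims, separators=(',', ':')).encode()
    return len(base64.urlsafe_b64encode(raw).rstrip(b'='))

lean = {"sub": "user:123", "exp": 1705324800}
heavy = {**lean,
         "roles": ["developer", "reviewer"],
         "permissions": {"models": ["read", "write"], "data": ["read"]}}
print(token_payload_size(lean), token_payload_size(heavy))
```

Keeping claims lean matters most when tokens ride in headers on every API call.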
Short access tokens + long refresh tokens minimize risk
Dual token pattern:
def login(email, password):
if authenticate(email, password):
# Short-lived for API calls
access_token = create_jwt(
user_id=123,
expires_in=15*60 # 15 minutes
)
# Long-lived for obtaining new access tokens
refresh_token = create_jwt(
user_id=123,
token_type="refresh",
expires_in=30*24*60*60 # 30 days
)
# Store refresh token for revocation
db.store_refresh_token(refresh_token)
return {
"access_token": access_token,
"refresh_token": refresh_token,
"expires_in": 900
}
Token refresh flow:
def refresh_access_token(refresh_token):
# Validate refresh token
payload = jwt.decode(refresh_token, secret_key, algorithms=["HS256"])
# Check if revoked (requires DB check)
if is_revoked(refresh_token):
return 401 # Revoked
# Issue new access token
new_access = create_jwt(
user_id=payload['user_id'],
expires_in=15*60
)
return {"access_token": new_access}
Security boundaries:

Refresh tokens enable short access tokens without constant re-authentication.
OAuth allows third-party access without sharing passwords
OAuth solves password sharing with third parties:
# WITHOUT OAuth (dangerous):
# GitHub analyzer needs your Google Drive files
analyzer.login(
google_email="alice@gmail.com",
google_password="secret123" # Giving password to third party!
)
OAuth authorization flow:
Step 1: User authorizes at provider
Browser → https://accounts.google.com/oauth/authorize?
client_id=github-analyzer&
redirect_uri=https://analyzer.com/callback&
scope=drive.readonly&
response_type=code
Step 2: Provider redirects with authorization code
Browser ← https://analyzer.com/callback?code=abc123
Step 3: Exchange code for token (backend)
# Server-to-server, not visible to browser
response = requests.post('https://oauth2.googleapis.com/token', {
'code': 'abc123',
'client_id': 'github-analyzer',
'client_secret': 'secret-key-xyz', # Proves identity
'grant_type': 'authorization_code'
})
tokens = response.json()
# {
# "access_token": "ya29.a0ARrdaM...",
# "token_type": "Bearer",
# "expires_in": 3600,
# "scope": "drive.readonly"
# }
Key principles:

OAuth enables access without sharing credentials.
Scopes limit what applications can access
Requesting specific permissions:
# Application requests only what it needs
auth_url = "https://github.com/login/oauth/authorize?" + urlencode({
"client_id": "ml-trainer-app",
"scope": "repo:read user:email", # Specific permissions
"redirect_uri": "https://mlapp.com/callback"
})
User sees requested permissions:
ML Trainer App wants to access your GitHub account:
✓ Read access to repositories
- View code, issues, pull requests
- View repository metadata
✓ Read user email addresses
- View primary email
- View verified status
✗ Will NOT be able to:
- Write to repositories
- Delete anything
- Access billing information
[Authorize] [Deny]
Token contains granted scopes:
{
"access_token": "gho_16C7e42F292c6912E7710c838347Ae178B4a",
"token_type": "bearer",
"scope": "repo:read user:email", // What was actually granted
"expires_in": 28800
}
Common scope patterns:
| Provider | Scope | Permission |
|---|---|---|
| GitHub | repo | Full repository access |
| GitHub | repo:status | Only commit status |
| Google | drive.readonly | Read files only |
| Google | drive.file | Only files created by app |
| Slack | chat:write | Post messages |
| Slack | users:read | View user information |
Principle of least privilege: Request minimum necessary scope

Scopes provide granular access control.
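Checking a granted scope string is simple because OAuth scope strings are space-delimited; a minimal sketch (the scope syntax itself, such as `repo:read`, is provider-specific):

```python
def has_scope(granted: str, required: str) -> bool:
    """OAuth scope strings are space-delimited; check for an exact scope."""
    return required in granted.split()

granted = "repo:read user:email"
assert has_scope(granted, "repo:read")
assert not has_scope(granted, "repo")  # repo:read does not grant full repo access
```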
OAuth defines multiple flows for different scenarios
1. Authorization Code (web apps with backend)
# Most secure: Code exchanged server-to-server
# Frontend never sees client_secret
flow = "user → provider → code → backend → token"
2. Client Credentials (service-to-service)
# No user involved, service authenticates directly
response = requests.post('https://oauth2.provider.com/token', {
'grant_type': 'client_credentials',
'client_id': 'batch-processor',
'client_secret': 'secret-xyz',
'scope': 'data.process'
})
# Used for: Cron jobs, backend services, APIs calling APIs
3. Implicit Flow (deprecated, was for SPAs)
// Token returned directly in URL fragment
// Insecure: Token visible in browser history
// Replaced by: Authorization Code + PKCE
4. Password Grant (deprecated, legacy systems)
# User gives password to application directly
# Defeats purpose of OAuth
# Only use: Migrating legacy systems
Modern standard: Authorization Code + PKCE
# PKCE (Proof Key for Code Exchange) adds security
code_verifier = generate_random_string(128)
code_challenge = sha256(code_verifier)
# Include challenge in authorization request
# Include verifier in token exchange
# Prevents code interception attacks
Grant type selection:

Different flows optimize for security and usability.
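The PKCE verifier/challenge pair described above can be generated with the standard library (S256 method from RFC 7636):

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Verifier stays client-side; only its SHA-256 challenge goes in the URL."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b'=').decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b'=').decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
# challenge goes in the /authorize request; verifier only in the token exchange
```

An attacker who intercepts the authorization code cannot redeem it without the verifier, which never left the client.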
Where and how to store tokens determines security
Browser storage options:
// localStorage - Persistent but vulnerable to XSS
localStorage.setItem('token', jwt_token);
// ⚠️ Any JavaScript can read: <script>alert(localStorage.token)</script>
// sessionStorage - Per-tab, still XSS vulnerable
sessionStorage.setItem('token', jwt_token);
// httpOnly cookie - Not accessible to JavaScript
// ✓ XSS protected, ✗ CSRF vulnerable
Set-Cookie: token=jwt_token; HttpOnly; Secure; SameSite=Strict
// Memory only - Most secure but lost on refresh
const token = jwt_token; // JavaScript variable
Mobile app storage:
# iOS Keychain / Android Keystore (encrypted)
keychain.set('access_token', token, accessible=WHEN_UNLOCKED)
# SharedPreferences / UserDefaults (not encrypted)
# ⚠️ Accessible if device rooted/jailbroken
Token transmission:
# Always use Authorization header
headers = {'Authorization': f'Bearer {token}'}
# Never in URL parameters (logged, cached, shared)
# ✗ GET /api/data?token=jwt_token # Appears in logs!
# Never in request body for GET (non-standard)
# ✗ GET /api/data {"token": "jwt_token"}
Security checklist:

Choose storage based on threat model.
Evolution of authorization complexity
Level 1: Binary access (all or nothing)
if authenticated:
return FULL_ACCESS
else:
return NO_ACCESS
# Problem: Every authenticated user can do everything
Level 2: Resource ownership
def can_access(user_id, resource):
if resource.owner_id == user_id:
return True
return False
# Problem: No sharing, no admin access
Level 3: Role-based (RBAC)
user_roles = ["developer"]
role_permissions = {
"viewer": ["read"],
"developer": ["read", "write"],
"admin": ["read", "write", "delete"]
}
# Problem: Roles are coarse-grained
Level 4: Attribute-based (ABAC)
def can_access(user, resource, action, context):
return (
user.department == resource.department and
action in user.permissions and
resource.sensitivity <= user.clearance_level and
context.time in user.work_hours and
context.location in user.allowed_locations
)
# Fine-grained but complex
Real systems use hybrid approaches:

More complex models provide finer control.
Users control resources they create
Database schema enforces ownership:
CREATE TABLE models (
id INTEGER PRIMARY KEY,
owner_id INTEGER NOT NULL,
name VARCHAR(255),
created_at TIMESTAMP,
is_public BOOLEAN DEFAULT FALSE,
FOREIGN KEY (owner_id) REFERENCES users(id)
);
CREATE TABLE model_shares (
model_id INTEGER,
user_id INTEGER,
permission VARCHAR(20), -- 'read', 'write'
PRIMARY KEY (model_id, user_id)
);
Authorization logic:
def get_permission(user_id, model_id):
model = db.query("SELECT * FROM models WHERE id = ?", model_id)
# Owner has full control
if model.owner_id == user_id:
return ["read", "write", "delete", "share"]
# Check explicit shares
share = db.query("""
SELECT permission FROM model_shares
WHERE model_id = ? AND user_id = ?
""", model_id, user_id)
if share:
return share.permission.split(",")
# Public resources allow read
if model.is_public:
return ["read"]
return [] # No access
Common patterns:

Ownership provides natural access boundaries.
Users have roles, roles have permissions
Three-level hierarchy:
# 1. Users are assigned roles
user_roles = {
123: ["developer", "reviewer"],
456: ["viewer"],
789: ["admin", "developer"]
}
# 2. Roles define permissions
role_permissions = {
"viewer": {
"models": ["read"],
"data": ["read"]
},
"developer": {
"models": ["read", "write", "execute"],
"data": ["read", "write"],
"compute": ["submit"]
},
"reviewer": {
"models": ["read", "approve"],
"audit": ["read"]
},
"admin": {
"models": ["read", "write", "delete"],
"data": ["read", "write", "delete"],
"compute": ["submit", "cancel"],
"users": ["read", "write"]
}
}
# 3. Check if any role grants permission
def has_permission(user_id, resource_type, action):
user_role_list = user_roles.get(user_id, [])
for role in user_role_list:
permissions = role_permissions.get(role, {})
allowed_actions = permissions.get(resource_type, [])
if action in allowed_actions:
return True
return False
RBAC advantages:
RBAC limitations:

RBAC separates users from permissions via roles.
Access decisions based on multiple attributes
Attributes from multiple sources:
# User attributes
user = {
"id": 123,
"department": "ml_research",
"clearance_level": 3,
"location": "us-west",
"employee_type": "full_time",
"projects": ["alpha", "beta"]
}
# Resource attributes
resource = {
"id": 456,
"type": "model",
"classification": "confidential",
"department": "ml_research",
"project": "alpha",
"created_date": "2024-01-15",
"tags": ["experimental", "gpu_required"]
}
# Environment attributes
context = {
"time": "14:30",
"day": "weekday",
"ip_address": "10.0.1.5",
"network": "corporate",
"request_type": "api"
}
# Action being requested
action = "write"
Policy evaluation:
def evaluate_access(user, resource, action, context):
# Rule 1: Department match required
if user["department"] != resource["department"]:
return False
# Rule 2: Clearance level check
clearance_required = {
"public": 1,
"internal": 2,
"confidential": 3,
"secret": 4
}
if user["clearance_level"] < clearance_required[resource["classification"]]:
return False
# Rule 3: Time-based access
if resource["classification"] == "secret":
hour = int(context["time"].split(":")[0])
if not (9 <= hour <= 17) or context["day"] == "weekend":
return False
# Rule 4: Project membership for write
if action == "write":
if resource["project"] not in user["projects"]:
return False
return True
ABAC evaluates multiple attributes for decisions.
Permissions flow down resource hierarchies
Resource hierarchy:
Organization
├── Projects
│ ├── Models
│ │ ├── Versions
│ │ └── Deployments
│ └── Datasets
│ ├── Training
│ └── Validation
└── Teams
└── Members
Permission inheritance:
class ResourceHierarchy:
def __init__(self):
self.permissions = {} # resource_id -> {user_id: permissions}
self.parents = {} # resource_id -> parent_id
def get_effective_permissions(self, user_id, resource_id):
"""Get permissions including inherited from parents"""
permissions = set()
# Walk up the hierarchy
current = resource_id
while current:
# Get direct permissions at this level
if current in self.permissions:
user_perms = self.permissions[current].get(user_id, [])
permissions.update(user_perms)
# Move to parent
current = self.parents.get(current)
return list(permissions)
def check_permission(self, user_id, resource_id, action):
perms = self.get_effective_permissions(user_id, resource_id)
# Check for explicit deny (overrides allow)
if f"deny:{action}" in perms:
return False
# Check for allow
return action in perms or "*" in perms
Example scenario:

Permissions cascade down hierarchy unless overridden.
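A hypothetical scenario, condensing the `ResourceHierarchy` class above so the snippet runs standalone: Alice gets read at the organization, write on one project, and an explicit deny on a single model.

```python
class ResourceHierarchy:
    """Condensed version of the class above."""
    def __init__(self):
        self.permissions = {}  # resource_id -> {user_id: [permissions]}
        self.parents = {}      # resource_id -> parent_id

    def grant(self, resource, user, perms):
        self.permissions.setdefault(resource, {})[user] = perms

    def effective(self, user, resource):
        perms = set()
        while resource:                       # walk up toward the root
            perms.update(self.permissions.get(resource, {}).get(user, []))
            resource = self.parents.get(resource)
        return perms

    def check(self, user, resource, action):
        perms = self.effective(user, resource)
        if f"deny:{action}" in perms:         # explicit deny wins
            return False
        return action in perms or "*" in perms

# Hypothetical hierarchy: org -> project-alpha -> model-7
h = ResourceHierarchy()
h.parents = {"project-alpha": "org", "model-7": "project-alpha"}
h.grant("org", "alice", ["read"])             # read inherited everywhere
h.grant("project-alpha", "alice", ["write"])  # write within the project
h.grant("model-7", "alice", ["deny:write"])   # explicit deny on one model

print(h.check("alice", "model-7", "read"))    # True (inherited from org)
print(h.check("alice", "model-7", "write"))   # False (deny overrides inherited allow)
```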
Allowing users to act on behalf of others
Delegation: Temporary permission transfer
class DelegationManager:
def create_delegation(self, from_user, to_user, resource,
permissions, expires_at):
"""User explicitly grants subset of their permissions"""
# Verify delegator has permissions to delegate
delegator_perms = get_permissions(from_user, resource)
if not all(p in delegator_perms for p in permissions):
raise Error("Cannot delegate permissions you don't have")
delegation = {
"id": generate_id(),
"from_user": from_user,
"to_user": to_user,
"resource": resource,
"permissions": permissions,
"expires_at": expires_at,
"created_at": now()
}
db.store_delegation(delegation)
audit_log("DELEGATION_CREATED", delegation)
def check_permission(self, user_id, resource, action):
# Check direct permissions
if has_direct_permission(user_id, resource, action):
return True
# Check delegated permissions
delegations = db.query("""
SELECT * FROM delegations
WHERE to_user = ? AND resource = ?
AND expires_at > NOW()
""", user_id, resource)
for delegation in delegations:
if action in delegation.permissions:
audit_log("DELEGATED_ACCESS", {
"user": user_id,
"delegator": delegation.from_user,
"action": action
})
return True
return False
Service impersonation:
# Service account acts as user for background tasks
def run_as_user(user_id, task):
"""Service executes task with user's permissions"""
# Verify service account is authorized
if not is_service_account(current_identity):
raise Error("Only service accounts can impersonate")
# Create impersonation context
with impersonate(user_id) as context:
# All authorization checks use user_id's permissions
# But audit logs show both identities
result = execute_task(task, context)
return result
Delegation and impersonation enable flexible access.
Express authorization rules as policies, not procedures
Traditional procedural approach:
def can_access(user, resource, action):
if user.role == "admin":
return True
if resource.owner == user.id:
return True
if user.department == resource.department:
if action == "read":
return True
if action == "write" and user.seniority > 2:
return True
# Complex nested logic continues...
return FalsePolicy as code (declarative):
# policies.yaml
policies:
- id: admin-full-access
description: "Admins can do anything"
effect: allow
subjects: ["role:admin"]
actions: ["*"]
resources: ["*"]
- id: owner-full-access
description: "Owners control their resources"
effect: allow
subjects: ["user:*"]
actions: ["*"]
resources: ["*"]
condition: "resource.owner == subject.id"
- id: department-read-access
description: "Same department can read"
effect: allow
subjects: ["user:*"]
actions: ["read"]
resources: ["*"]
condition: "resource.department == subject.department"
- id: senior-write-access
description: "Senior staff can write in department"
effect: allow
subjects: ["user:*"]
actions: ["write"]
resources: ["*"]
condition: |
resource.department == subject.department AND
subject.seniority > 2
Policy evaluation engine:
class PolicyEngine:
def evaluate(self, subject, action, resource, context):
applicable_policies = self.find_matching_policies(
subject, action, resource
)
# Explicit deny overrides allow
for policy in applicable_policies:
if policy.effect == "deny":
return False, f"Denied by {policy.id}"
# Any allow grants access
for policy in applicable_policies:
if policy.effect == "allow":
return True, f"Allowed by {policy.id}"
# Default deny
return False, "No matching allow policy"
Declarative policies separate logic from implementation.
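A runnable miniature of the same idea, with conditions as plain predicate functions instead of parsed expressions (policy names mirror the YAML above; the evaluation order — explicit deny, then allow, then default deny — matches the engine):

```python
# Policies as data; the engine stays generic
POLICIES = [
    {"id": "admin-full-access", "effect": "allow",
     "match": lambda s, a, r: "admin" in s["roles"]},
    {"id": "owner-full-access", "effect": "allow",
     "match": lambda s, a, r: r["owner"] == s["id"]},
    {"id": "department-read-access", "effect": "allow",
     "match": lambda s, a, r: a == "read" and r["department"] == s["department"]},
]

def evaluate(subject, action, resource):
    matched = [p for p in POLICIES if p["match"](subject, action, resource)]
    if any(p["effect"] == "deny" for p in matched):
        return False  # explicit deny overrides any allow
    return any(p["effect"] == "allow" for p in matched)  # default deny

alice = {"id": 1, "roles": [], "department": "ml"}
model = {"owner": 2, "department": "ml"}
print(evaluate(alice, "read", model))   # True  (same department)
print(evaluate(alice, "write", model))  # False (no matching allow policy)
```

Adding a rule means appending a policy record, not editing nested conditionals.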
AWS IAM: Policy-based with principals and resources
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"AWS": "arn:aws:iam::123456:user/alice"},
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": "arn:aws:s3:::ml-models/*",
"Condition": {
"IpAddress": {"aws:SourceIp": "10.0.0.0/8"},
"DateGreaterThan": {"aws:CurrentTime": "2024-01-01"}
}
}]
}
GCP IAM: Role bindings at resource level
# Binding roles to identities on resources
bindings:
- role: roles/storage.objectViewer
members:
- user:alice@example.com
- serviceAccount:ml-trainer@project.iam
resource: projects/my-project/buckets/ml-models
- role: roles/ml.modelUser
members:
- group:ml-team@example.com
condition:
expression: request.time > timestamp("2024-01-01")
Azure RBAC: Scope-based assignments
# Role assignment at different scopes
assignment = {
"roleDefinitionId": "/subscriptions/sub123/providers/Microsoft.Authorization/roleDefinitions/contributor",
"principalId": "user-guid-123",
"scope": "/subscriptions/sub123/resourceGroups/ml-resources"
}
# Permissions inherit down: Subscription → Resource Group → Resource
Common patterns across providers:

Cloud providers use variations of policy-based access.
Different clients need different data from same resources
REST endpoint returns fixed structure:
GET /api/users/123
{
"user_id": 123,
"email": "alice@example.com",
"name": "Alice Chen",
"profile_image": "base64...[2MB]",
"preferences": {...50 fields...},
"activity_history": [...200 entries...],
"connected_devices": [...},
"subscription": {...},
"recommendations": [...}
}Each client uses different subset:
Mobile app needs:
name, profile_image (thumbnail)
Admin dashboard needs:
email, subscription, activity_history
Analytics service needs:
user_id, preferences.language
REST over-fetches:
REST solutions are inadequate:
/users/123?fields=name,email (non-standard)
/users/123/mobile, /users/123/admin (proliferation)
Different clients require different subsets of data.
GraphQL lets clients specify exactly what data they need
Instead of multiple REST calls:
# REST: 3 requests, 3 round trips
user = GET('/users/123')
posts = GET('/users/123/posts?limit=5')
for post in posts:
comments = GET(f'/posts/{post.id}/comments?limit=2')
Single GraphQL query:
query GetUserWithPosts {
user(id: 123) {
name
email
posts(limit: 5) {
title
createdAt
comments(limit: 2) {
text
author {
name
}
}
}
}
}
Response matches query structure exactly:
{
"data": {
"user": {
"name": "Alice Chen",
"email": "alice@example.com",
"posts": [
{
"title": "GraphQL Benefits",
"createdAt": "2024-01-15",
"comments": [
{
"text": "Great post!",
"author": {"name": "Bob"}
}
]
}
]
}
}
}
Key differences from REST:
POST /graphql for everything
GraphQL fetches related data in single request.
Everything in GraphQL has a type
Schema definition:
type User {
id: ID! # ! means non-null
name: String!
email: String!
posts: [Post!]! # Array of Posts (never null)
friendCount: Int
accountType: AccountType! # Enum type
}
type Post {
id: ID!
title: String!
content: String
author: User! # Relationship to User
comments: [Comment!]!
likes: Int!
}
enum AccountType {
FREE
PREMIUM
ENTERPRISE
}
type Query {
user(id: ID!): User # Can return null if not found
users(limit: Int = 10): [User!]!
}
type Mutation {
createUser(input: CreateUserInput!): User!
deleteUser(id: ID!): Boolean!
}
Type system provides:
Query validation example:

Type system provides safety and tooling.
GraphQL separates reads from writes explicitly
Query: Read operations (no side effects)
Mutation: Write operations (changes state)
mutation CreatePost {
createPost(input: {
title: "GraphQL Benefits"
content: "..."
authorId: 123
}) {
id # Return created post
title
publishedAt
author {
name
}
}
}
Serial execution prevents race conditions:
mutation TransferFunds {
# These execute in order, not parallel
withdraw(account: "A", amount: 100) { balance }
deposit(account: "B", amount: 100) { balance }
}
Convention: Mutations return the modified object so client can update its cache without refetching.

Queries parallelize; mutations serialize.
GraphQL’s flexibility creates performance challenges
Query requests users and their posts:
Naive resolver implementation:
def resolve_users(limit):
# 1 query
return db.query("SELECT * FROM users LIMIT ?", limit)
def resolve_posts(user):
# Called for EACH user (N queries)
return db.query("SELECT * FROM posts WHERE user_id = ?", user.id)
# Total: 1 + 100 = 101 database queries!
Problem scales with nesting:
Solution: DataLoader pattern (batching)
# Collects all user IDs, makes single query
post_loader = DataLoader(batch_load_posts)
def batch_load_posts(user_ids):
# Single query for all users
posts = db.query(
"SELECT * FROM posts WHERE user_id IN (?)",
user_ids
)
# Group by user_id and return in order
return group_by_user(posts)
# Now: 1 + 1 = 2 queries total
Measured impact:

DataLoader batches queries to prevent N+1 problem.
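A minimal sketch of the batching idea. Real DataLoader implementations schedule the batch automatically on the event loop; here `dispatch()` is called explicitly and `load` returns a thunk, which keeps the example synchronous:

```python
from collections import defaultdict

class DataLoader:
    """Minimal batching loader: collect keys, run one batch query, fan out."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.queue = []

    def load(self, key):
        self.queue.append(key)
        return lambda: self.results[key]  # resolved after dispatch()

    def dispatch(self):
        self.results = self.batch_fn(self.queue)
        self.queue = []

def batch_load_posts(user_ids):
    # Simulates: SELECT * FROM posts WHERE user_id IN (...)  -- one query
    rows = [(1, "post-a"), (1, "post-b"), (2, "post-c")]
    grouped = defaultdict(list)
    for uid, title in rows:
        if uid in user_ids:
            grouped[uid].append(title)
    return {uid: grouped.get(uid, []) for uid in user_ids}

loader = DataLoader(batch_load_posts)
pending = [loader.load(uid) for uid in [1, 2, 3]]
loader.dispatch()                      # single "query" for all three users
print([thunk() for thunk in pending])  # [['post-a', 'post-b'], ['post-c'], []]
```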
GraphQL’s flexibility enables malicious queries
Innocent-looking query with exponential cost:
query MaliciousQuery {
users(limit: 100) {
posts {
comments {
author {
posts {
comments {
author {
posts {
title
}
}
}
}
}
}
}
}
}
Query analysis:
Single query can overwhelm server.
Protection mechanisms:
# Assign cost to each field
complexity = users(100) * 10 + posts * 5 + comments * 2
if complexity > 1000:
return Error("Query too complex")
Nested queries can create exponential load.
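Besides cost scoring, many servers also enforce a maximum query depth. A sketch over a hypothetical nested-dict representation of the query's selection tree:

```python
def max_depth(selection, depth=1):
    """Walk a selection tree (dict of field -> sub-selection or None)."""
    children = [v for v in selection.values() if isinstance(v, dict)]
    if not children:
        return depth
    return max(max_depth(child, depth + 1) for child in children)

# The malicious query above, as a nested structure (hypothetical AST shape)
query = {"users": {"posts": {"comments": {"author": {"posts": {"title": None}}}}}}

MAX_DEPTH = 4
depth = max_depth(query)
print(depth)               # 6
print(depth <= MAX_DEPTH)  # False -> reject before executing
```

Depth limiting is cheap because it inspects the query, never the data.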
REST caching is straightforward
GraphQL breaks traditional caching
All queries go to single endpoint:
POST /graphql
{"query": "{ user(id: 123) { name } }"}
POST /graphql
{"query": "{ user(id: 123) { name email } }"}
Same user, different queries, same URL.
Why POST breaks caching:
GraphQL caching strategies:

POST requests and dynamic queries prevent HTTP caching.
GraphQL changes fundamental assumptions about APIs
Unified query interface:
# Single endpoint handles all queries
POST /graphql
query GetDashboardData {
user(id: 123) {
name
recentPosts(limit: 3) {
title
comments(limit: 1) {
text
}
}
}
}
Contrast with REST equivalent:
GET /users/123 # User data
GET /users/123/posts # User's posts
GET /posts/456/comments # Comments for each post
GET /posts/789/comments
GET /posts/012/comments
Performance characteristics:
GraphQL advantages:
GraphQL costs:
Error handling differences:
REST: HTTP status codes indicate error types
GraphQL: Always returns 200 with error details

Each approach optimizes different aspects of API interaction.
GraphQL’s flexibility creates new complexity
Simple REST endpoint:
@app.route('/users/<int:user_id>')
def get_user(user_id):
user = User.query.get_or_404(user_id)
return jsonify(user.to_dict())
Equivalent GraphQL implementation:
# Schema definition
type_defs = """
type User {
id: ID!
name: String!
posts: [Post!]!
}
type Query {
user(id: ID!): User
}
"""
# Resolver with N+1 protection
def resolve_user(obj, info, id):
return User.query.get(id)
def resolve_posts(user, info):
# Without DataLoader: N+1 problem
# With DataLoader: Complex batching logic
return post_loader.load(user.id)
# Query complexity analysis
def analyze_query_complexity(query_ast):
complexity = 0
for field in query_ast.selections:
complexity += calculate_field_cost(field)
if complexity > MAX_QUERY_COST:
raise GraphQLError("Query too complex")
return complexity
Operational complexity increases:
Monitoring REST:
Monitoring GraphQL:

Flexibility introduces operational complexity.
Network calls introduce unpredictable delays
Single process function call:
Distributed service call:
def calculate_score(data):
response = requests.post('http://ml-service/predict',
json=data) # ??? ms, unpredictable
return response.json()
Sources of unpredictability:
Timeouts cascade through service chains:
Service A calls Service B calls Service C:
# Service A: 30 second timeout
response_b = requests.get(url_b, timeout=30)
# Service B: 30 second timeout
response_c = requests.get(url_c, timeout=30)
# Service C: Takes 25 seconds to respond
What happens:
Timeout strategies must coordinate across service boundaries
Hierarchical timeouts:
# Service A: Generous timeout for user-facing request
timeout_a = 10.0 # 10 seconds
# Service B: Leaves buffer for processing
timeout_b = 8.0 # 8 seconds
# Service C: Tightest timeout for backend
timeout_c = 6.0 # 6 seconds
Each layer reserves time for its own processing.

Coordinated timeouts prevent cascade failures.
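Instead of hard-coding 10/8/6 seconds per service, a common variant propagates a deadline and each hop derives its downstream timeout from the remaining budget. A hypothetical sketch (`PROCESSING_RESERVE` is an assumed per-hop allowance):

```python
import time

PROCESSING_RESERVE = 2.0  # seconds each layer keeps for its own work (assumed)

def downstream_timeout(deadline: float, minimum: float = 0.5) -> float:
    """Timeout to give the next hop: remaining budget minus our reserve."""
    remaining = deadline - time.monotonic()
    if remaining <= minimum:
        raise TimeoutError("not enough budget left to call downstream")
    return max(remaining - PROCESSING_RESERVE, minimum)

# Service A starts with a 10 s overall deadline
deadline = time.monotonic() + 10.0
t_b = downstream_timeout(deadline)  # roughly 8 s for Service B
```

The deadline itself (e.g. as a header) travels with the request, so every hop agrees on when the caller will give up.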
Different phases of network communication have different failure modes
Connection timeout: Establishing TCP connection
import socket
import requests
# Connection timeout: How long to wait for TCP handshake
requests.get('http://api.service.com/data',
timeout=(3, 30)) # (connect, read)
# ↑
# 3 seconds to establish connection
Connection establishment steps:
Typical connection timeout: 3-10 seconds
Read timeout: Waiting for response
# Read timeout: How long to wait for response after connection
requests.get('http://api.service.com/data',
timeout=(3, 30)) # (connect, read)
# ↑
# 30 seconds for complete response
Why separate timeouts matter:
Connection timeout failures indicate:
Read timeout failures indicate:
Retry strategy depends on timeout type:
def call_service(url, data, max_retries=3):
for attempt in range(max_retries):
try:
response = requests.post(url, json=data,
timeout=(3, 30))
return response.json()
except requests.ConnectTimeout:
# Connection failed - service likely down
# Retry immediately (fail fast)
continue
except requests.ReadTimeout:
# Request sent but no response
# Longer backoff (service may be overloaded)
time.sleep(2 ** attempt)
continue
raise ServiceUnavailableError()
Different timeout types indicate different failure modes.
Not all failures should trigger retries
Immediate retry (no backoff):
def immediate_retry(func, max_attempts=3):
for attempt in range(max_attempts):
try:
return func()
except ConnectionError:
# Network connectivity issue - retry immediately
if attempt == max_attempts - 1:
raise
continue
# Use for: Connection failures, DNS timeouts
Exponential backoff with jitter:
import random
import time
def exponential_backoff_retry(func, max_attempts=5):
for attempt in range(max_attempts):
try:
return func()
except (ReadTimeout, ServerError) as e:
if attempt == max_attempts - 1:
raise
# Base delay: 2^attempt seconds
delay = 2 ** attempt
# Add jitter to prevent thundering herd
jitter = random.uniform(0, 0.1 * delay)
total_delay = delay + jitter
time.sleep(total_delay)
continue
# Retry sequence: 1s, 2s, 4s, 8s, 16s (with jitter)
Fixed interval retry:
def fixed_interval_retry(func, interval=5, max_attempts=3):
for attempt in range(max_attempts):
try:
return func()
except ServiceUnavailableError:
if attempt == max_attempts - 1:
raise
time.sleep(interval) # Always wait 5 seconds
# Use for: Known service restart windows
When NOT to retry:
def should_retry(exception, response=None):
# Never retry these conditions
if isinstance(exception, AuthenticationError):
return False # 401 - bad credentials
if isinstance(exception, AuthorizationError):
return False # 403 - insufficient permissions
if response and response.status_code == 400:
return False # Bad request - won't improve
if response and response.status_code == 404:
return False # Not found - resource doesn't exist
# Retry these conditions
if isinstance(exception, (ConnectionError, ReadTimeout)):
return True # Transient network issues
if response and response.status_code in [500, 502, 503, 504]:
return True # Server errors - may recover
return False
Choose retry strategy based on failure type and system constraints.
Retries are only safe when operations are idempotent
Problem: Non-idempotent operations
# Dangerous to retry - could double-charge customer
def charge_credit_card(customer_id, amount):
response = requests.post('https://payments.api/charge', {
'customer_id': customer_id,
'amount': amount,
'currency': 'USD'
})
# Network timeout after sending request
# Did the charge succeed? Unknown - timeout occurred before response
return response.json()
# Retry could result in:
charge_credit_card(123, 50.00) # $50 charged
# Timeout, retry...
charge_credit_card(123, 50.00) # Another $50 charged!
Solution: Idempotency keys
import uuid
def charge_credit_card_safe(customer_id, amount, idempotency_key=None):
if not idempotency_key:
idempotency_key = str(uuid.uuid4())
response = requests.post('https://payments.api/charge', {
'customer_id': customer_id,
'amount': amount,
'currency': 'USD',
'idempotency_key': idempotency_key # Unique per logical operation
})
return response.json()
# Server implementation tracks processed keys:
def process_payment(request):
key = request.get('idempotency_key')
# Check if already processed
existing = db.query("SELECT * FROM payments WHERE idempotency_key = ?", key)
if existing:
return existing.response # Return same result as before
# Process payment
result = charge_card(request)
# Store result with key
db.execute("INSERT INTO payments (idempotency_key, response) VALUES (?, ?)",
key, result)
return result
Idempotency patterns:
# GET requests
user = get_user(123) # Always safe to repeat
# PUT requests (full replacement)
update_user(123, {"name": "Alice", "email": "alice@example.com"})
# Conditional update (compare-and-set)
def update_counter(counter_id, expected_value, new_value):
result = db.execute("""
UPDATE counters
SET value = ?
WHERE id = ? AND value = ?
""", new_value, counter_id, expected_value)
if result.rowcount == 0:
raise ConflictError("Counter was modified")
return new_value
Idempotency keys enable safe retries of financial operations.
Stop calling failing services to prevent resource exhaustion
Problem: Cascading failures
# Service A keeps trying to call failing Service B
def get_user_recommendations(user_id):
for attempt in range(5): # Keep retrying
try:
# Service B is down - this will always fail
response = requests.get(f'http://ml-service/recommend/{user_id}',
timeout=30)
return response.json()
except Exception:
time.sleep(2) # Wasting time and resources
continue
raise ServiceUnavailableError()
# Results in:
# - 5 × 30 second timeouts = 2.5 minutes per user
# - Thread pool exhaustion
# - Memory leak from pending requests
# - Service A becomes unavailable too
Circuit breaker solution:
import time
from enum import Enum
class CircuitState(Enum):
CLOSED = "closed" # Normal operation
OPEN = "open" # Failing fast
HALF_OPEN = "half_open" # Testing recovery
class CircuitBreaker:
def __init__(self, failure_threshold=5, recovery_timeout=60):
self.failure_threshold = failure_threshold
self.recovery_timeout = recovery_timeout
self.failure_count = 0
self.last_failure_time = None
self.state = CircuitState.CLOSED
def call(self, func, *args, **kwargs):
if self.state == CircuitState.OPEN:
if time.time() - self.last_failure_time < self.recovery_timeout:
raise CircuitBreakerOpenError("Service unavailable")
else:
self.state = CircuitState.HALF_OPEN
try:
result = func(*args, **kwargs)
self._on_success()
return result
except Exception as e:
self._on_failure()
raise
def _on_success(self):
self.failure_count = 0
self.state = CircuitState.CLOSED
def _on_failure(self):
self.failure_count += 1
self.last_failure_time = time.time()
if self.failure_count >= self.failure_threshold:
self.state = CircuitState.OPEN
# Usage with circuit breaker
ml_service_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=30)
def get_user_recommendations_safe(user_id):
try:
return ml_service_breaker.call(
lambda: requests.get(f'http://ml-service/recommend/{user_id}',
timeout=5).json()
)
except CircuitBreakerOpenError:
# Return cached recommendations or default
return get_default_recommendations(user_id)
Circuit breaker states:
Metrics for circuit breaker tuning:
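One way to collect such metrics is a small counter object threaded through `_on_success`/`_on_failure` (a sketch; the class and method names are assumptions, not part of the CircuitBreaker above):

```python
import time
from collections import Counter

class BreakerMetrics:
    """Counters useful for tuning failure_threshold and recovery_timeout."""
    def __init__(self):
        self.counts = Counter()       # calls, failures, rejections
        self.opened_at = None         # set while the breaker is open
        self.total_open_seconds = 0.0

    def record_call(self):
        self.counts["calls"] += 1

    def record_failure(self):
        self.counts["failures"] += 1

    def record_rejection(self):
        # A call fast-failed because the breaker was open
        self.counts["rejections"] += 1

    def record_open(self):
        self.opened_at = time.time()

    def record_close(self):
        if self.opened_at is not None:
            self.total_open_seconds += time.time() - self.opened_at
            self.opened_at = None

    def failure_rate(self):
        calls = self.counts["calls"]
        return self.counts["failures"] / calls if calls else 0.0

m = BreakerMetrics()
m.record_call(); m.record_failure()   # one failed call
m.record_call()                        # one successful call
assert m.failure_rate() == 0.5
```

Failure rate informs `failure_threshold`; time spent open and rejection counts inform `recovery_timeout`.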

Circuit breaker prevents cascade failures by failing fast.
Combining timeouts, retries, and circuit breakers
Layered resilience strategy:
import asyncio
import random
import time
import requests
from typing import Optional, Callable, Any
class ResilientServiceClient:
def __init__(self, base_url: str):
self.base_url = base_url
self.circuit_breaker = CircuitBreaker(
failure_threshold=3,
recovery_timeout=30
)
self.session = requests.Session()
# Connection pooling for efficiency
self.session.mount('http://', requests.adapters.HTTPAdapter(
pool_connections=10, pool_maxsize=20
))
async def call_service(self,
endpoint: str,
data: Optional[dict] = None,
max_retries: int = 3) -> dict:
"""
Resilient service call with integrated patterns:
- Hierarchical timeouts
- Exponential backoff retries
- Circuit breaker protection
- Request tracing
"""
request_id = generate_request_id()
for attempt in range(max_retries + 1):
try:
# Circuit breaker check
if self.circuit_breaker.state == CircuitState.OPEN:
raise CircuitBreakerOpenError(
f"Circuit breaker open for {self.base_url}"
)
# Calculate timeout (shorter on retries)
connect_timeout = 3.0
read_timeout = max(10.0 - (attempt * 2), 5.0)
start_time = time.time()
response = await self._make_request(
endpoint, data, request_id,
timeout=(connect_timeout, read_timeout)
)
# Success - reset circuit breaker
self.circuit_breaker._on_success()
# Log success metrics
duration = time.time() - start_time
self._log_request(request_id, endpoint, attempt, duration, "success")
return response.json()
except requests.exceptions.ConnectTimeout:
# Connection timeout - retry immediately
self._log_request(request_id, endpoint, attempt, None, "connect_timeout")
if attempt < max_retries:
continue
raise ServiceUnavailableError("Connection timeout")
except requests.exceptions.ReadTimeout:
# Read timeout - exponential backoff
self.circuit_breaker._on_failure()
self._log_request(request_id, endpoint, attempt, None, "read_timeout")
if attempt < max_retries:
backoff_time = (2 ** attempt) + random.uniform(0, 1)
await asyncio.sleep(backoff_time)
continue
raise ServiceUnavailableError("Read timeout")
except requests.exceptions.HTTPError as e:
if e.response.status_code >= 500:
# Server error - retry with backoff
self.circuit_breaker._on_failure()
if attempt < max_retries:
backoff_time = (2 ** attempt) + random.uniform(0, 1)
await asyncio.sleep(backoff_time)
continue
else:
# Client error - don't retry
raise
raise ServiceUnavailableError(f"Max retries exceeded for {endpoint}")
def _log_request(self, request_id: str, endpoint: str,
attempt: int, duration: Optional[float], status: str):
"""Structured logging for debugging and monitoring"""
logger.info({
"request_id": request_id,
"service": self.base_url,
"endpoint": endpoint,
"attempt": attempt,
"duration_ms": duration * 1000 if duration else None,
"status": status,
"circuit_breaker_state": self.circuit_breaker.state.value
})
Real-world timeout hierarchy example:
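As an illustration (all timeout values here are assumptions), a consistent hierarchy gives each layer a budget strictly smaller than its caller's, so the caller never gives up while work below it could still complete:

```python
# Illustrative timeout budget: each layer finishes before its caller gives up
TIMEOUTS = {
    "client_to_gateway": 30.0,   # browser/mobile request
    "gateway_to_service": 25.0,  # leaves headroom for gateway overhead
    "service_to_db": 10.0,       # single query budget
    "db_statement": 5.0,         # per-statement limit inside the database
}

def validate(t):
    # Each layer's timeout must exceed the layer beneath it
    order = ["client_to_gateway", "gateway_to_service",
             "service_to_db", "db_statement"]
    return all(t[a] > t[b] for a, b in zip(order, order[1:]))

print(validate(TIMEOUTS))  # True
```

If the hierarchy is inverted (e.g. a database statement allowed 60s under a 10s service budget), the caller times out first and the inner work is wasted.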

Integrated patterns provide comprehensive resilience.
Some operations take too long for synchronous HTTP
Typical HTTP request/response works for fast operations:
# Fast operation: 50ms
response = requests.get('https://api.service.com/users/123')
user = response.json() # Works fine
Long-running operations break this model:
# Video transcoding: 5 minutes
response = requests.post('https://api.service.com/transcode',
json={'video_url': 'input.mp4'},
timeout=300) # Wait 5 minutes?
# Problems:
# - Client connection held open entire time
# - Network interruption loses everything
# - No progress visibility
# - Client can't do anything else
Core problem: Need to decouple submission from completion
Three solutions exist, each with different trade-offs:
All three share the same pattern: Submit job → get job_id → retrieve result later. They differ in how the result is retrieved.
Pattern comparison at a glance:
Polling - Simple but wasteful:
job_id = submit_job()
while not done:
status = check_status(job_id) # Repeated HTTP requests
time.sleep(5) # Most return "not done yet"
Webhooks - Efficient but complex setup:
job_id = submit_job(callback_url='https://my-app.com/done')
# Server POSTs result to callback_url when complete
# No wasted requests, but client needs public endpoint
WebSockets - Real-time but resource-intensive:

All three patterns decouple submission from completion.
Polling: Client-driven status checks
Submit once, check repeatedly:
# Submit → get job_id immediately
job_id = submit_job({'operation': 'transcode', 'input': 'video.mp4'})
# Poll until complete
while True:
status = check_status(job_id)
if status['complete']:
return status['result']
time.sleep(5) # Wait and try again
Server tracks job state:
jobs["abc-123"] = {
"status": "processing", # pending → processing → completed/failed
"progress": 45,
"result": None
}
Webhooks: Server-driven notifications
Submit with callback URL:
# Client submits with callback URL
job_id = submit_job({
'operation': 'transcode',
'input': 'video.mp4',
'callback_url': 'https://my-app.com/webhooks/transcode'
})
# Client provides endpoint - server calls this when done
@app.post('/webhooks/transcode')
def handle_complete(request):
data = request.json() # {job_id, status, result}
update_database(data['job_id'], data['result'])
Server notifies client:
# When job completes, POST to client's callback_url
requests.post(callback_url, json={'job_id': job_id, 'result': result})
Trade-offs comparison:
| Aspect | Polling | Webhooks |
|---|---|---|
| Efficiency | Wasteful (most checks return “not ready”) | Efficient (one notification) |
| Latency | poll_interval/2 average | Immediate |
| Client requirements | Simple HTTP client | Public endpoint required |
| Firewall-friendly | Yes (outbound only) | No (needs inbound) |
| Reliability | Client controls retry | Server must retry failed deliveries |
When to use:
Polling - simple clients, mobile or firewalled environments, infrequent updates
Webhooks - backend services with public endpoints where low-latency notification matters
Polling is simple but wasteful; webhooks are efficient but require public endpoints.
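The reliability trade-off means the webhook server must retry failed deliveries. A minimal sketch, with the HTTP transport injected as a function so the retry logic stands alone (all names are assumptions):

```python
import time

def deliver_webhook(post, callback_url, payload, max_attempts=4, sleep=time.sleep):
    """Retry webhook delivery with exponential backoff.

    `post` is injected (e.g. a thin wrapper over requests.post) so the
    retry logic is testable without a network. Sketch only.
    """
    for attempt in range(max_attempts):
        try:
            status = post(callback_url, payload)
            if status < 300:
                return True       # delivered
        except ConnectionError:
            pass                  # treat as a failed attempt
        if attempt < max_attempts - 1:
            sleep(2 ** attempt)   # 1s, 2s, 4s between attempts
    return False  # exhausted: a real system would park this in a dead-letter queue

# Stub transport: unreachable twice, then accepts the delivery
calls = {"n": 0}
def flaky_post(url, payload):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("endpoint unreachable")
    return 200

ok = deliver_webhook(flaky_post, "https://my-app.com/webhooks/transcode",
                     {"job_id": "abc-123"}, sleep=lambda s: None)
assert ok and calls["n"] == 3
```

Because delivery can still fail permanently, clients often pair webhooks with a polling fallback, as shown later in this section.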
Polling and webhooks handle discrete operations
Submit job → wait → get result. One submission, one result.
WebSockets handle continuous streams
# Connection stays open, updates flow continuously
ws.connect("wss://api.service.com/live")
ws.send({"subscribe": "job_updates"})
while True:
update = ws.recv() # Server pushes whenever state changes
# Progress: 25%, 50%, 75%, 100%
The connection itself is the communication channel, not individual HTTP requests.
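The push model can be sketched with only the standard library, using an `asyncio.Queue` as a stand-in for the open WebSocket connection (names and progress values are illustrative):

```python
import asyncio

async def transcode_job(updates: asyncio.Queue):
    # Server side: push progress whenever state changes
    for pct in (25, 50, 75, 100):
        await asyncio.sleep(0)              # stand-in for real work
        await updates.put({"progress": pct})
    await updates.put(None)                 # sentinel: job finished

async def client(updates: asyncio.Queue):
    # Client side: no polling loop - updates arrive as they happen
    received = []
    while (update := await updates.get()) is not None:
        received.append(update["progress"])
    return received

async def main():
    q = asyncio.Queue()
    _, progress = await asyncio.gather(transcode_job(q), client(q))
    return progress

print(asyncio.run(main()))  # [25, 50, 75, 100]
```

The client never asks "are you done yet?"; it simply awaits the channel, which is exactly what a real WebSocket `recv()` loop does.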
Video transcoding (5 minutes)
Discrete: submit → wait → result
Live dashboard (updates every second)
Continuous: constant stream of values
Mobile app vs Backend service
Mobile can’t receive webhooks (no public endpoint):
Mobile apps poll for results or hold a WebSocket connection instead
Backend can expose endpoints:
Backend services register a callback URL and receive webhook POSTs directly
Combining approaches for reliability:
# Webhook with polling fallback
job_id = submit_job(callback='https://my-app.com/webhook')
result = wait_for_webhook(timeout=300) or poll_until_done(job_id)
Webhook efficiency when network is reliable, polling safety when it isn’t.
Problem: API works in Postman, fails in browser
// JavaScript in browser at http://localhost:3000
fetch('http://localhost:5000/predict', {
method: 'POST',
body: JSON.stringify({features: [1, 2, 3]})
})
// Error: CORS policy: No 'Access-Control-Allow-Origin' header
Same-origin policy - Browser security restriction:
Examples:
http://localhost:3000 → http://localhost:5000 Blocked - Different ports
https://app.example.com → https://api.example.com Blocked - Different subdomains
https://app.example.com → https://app.example.com Allowed - Same origin
Not an API problem - browser enforces this
Postman bypasses CORS (not a browser)
curl bypasses CORS (not a browser)
Browser JavaScript cannot bypass CORS

Browser sends preflight OPTIONS request before actual request
OPTIONS /predict HTTP/1.1
Host: localhost:5000
Origin: http://localhost:3000
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Content-Type
Server must respond with permission headers:
HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://localhost:3000
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Max-Age: 3600
Then browser sends actual request:
POST /predict HTTP/1.1
Host: localhost:5000
Origin: http://localhost:3000
Content-Type: application/json
{"features": [1, 2, 3]}
Flask implementation:
from flask import Flask
from flask_cors import CORS
app = Flask(__name__)
CORS(app, origins=['http://localhost:3000'])
# Or manual headers
@app.after_request
def add_cors_headers(response):
response.headers['Access-Control-Allow-Origin'] = 'http://localhost:3000'
response.headers['Access-Control-Allow-Headers'] = 'Content-Type'
return response
Tracing requests across multiple services requires unique identifiers
Three services generating thousands of log entries:
# Gateway logs (10,000 entries)
[14:23:01.123] Processing request
[14:23:01.134] Processing request
[14:23:01.145] Processing request
# User Service logs (5,000 entries)
[14:23:01.234] Database query
[14:23:01.245] Database query
[14:23:01.256] Database query failed
# Payment Service logs (8,000 entries)
[14:23:01.345] Processing payment
[14:23:01.356] Processing payment
Without correlation: Cannot identify which entries belong to the same request
With correlation ID: Thread a unique identifier through all services
# Generate at API entry point
import uuid
from flask import g, request
@app.before_request
def assign_request_id():
request_id = request.headers.get('X-Request-ID', str(uuid.uuid4()))
g.request_id = request_id
# Forward to downstream services
headers = {
'X-Request-ID': g.request_id,
'Authorization': get_token()
}
response = requests.post(user_service_url, headers=headers)
# Include in every log message
logger.info(f"[{g.request_id}] User {user_id} query failed")
Structured logging: JSON format, not text strings
# Bad: Text logs hard to parse
logger.info(f"User {user_id} made prediction, took {duration}ms")
# Good: Structured JSON logs
logger.info(json.dumps({
"timestamp": "2024-01-15T10:30:45Z",
"level": "INFO",
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"user_id": 123,
"endpoint": "POST /predict",
"duration_ms": 247,
"status_code": 200
}))
Why JSON:
Machine-parseable; searchable and filterable with tools like jq
What to log:
request_id - Correlation across services
user_id - Which user affected
endpoint - What operation
duration_ms - How long it took
status_code - Success or failure
error_message - What went wrong (if failed)
What NOT to log:
Passwords, password hashes, auth tokens, API keys, personally identifiable information
API Gateway sits between clients and backend services
Why gateway: Implement cross-cutting concerns once, not in every service
Six core functions:
1. Authentication/Authorization
2. Rate Limiting
3. Request Routing
4. Response Caching
5. Monitoring/Analytics
6. CORS Headers

Without gateway:
Each service implements authentication, rate limiting, and CORS itself - duplicated logic, inconsistent behavior
With gateway:
Cross-cutting concerns handled once at the edge; backend services focus on business logic
AWS API Gateway - Managed service, no servers to run
Endpoint structure:
https://{api-id}.execute-api.{region}.amazonaws.com/{stage}/{resource}
https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict
↑ ↑ ↑ ↑
API ID Region Stage Resource
Configuration components:
Resources - URL paths
/users, /predict, /models/{id}
Methods - HTTP operations per resource
Integration - Backend target
Stages - Environment separation
prod - Production traffic
staging - Pre-production testing
dev - Development environment
Each stage has independent configuration

Usage plans - Rate limits per API key:
# Create usage plan
{
"name": "Basic Plan",
"throttle": {
"rateLimit": 100, # requests/second
"burstLimit": 200 # burst capacity
},
"quota": {
"limit": 10000, # requests
"period": "DAY" # per day
}
}
Pricing:
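A quick sanity check on the usage plan above: at the full throttle rate, the daily quota, not the rate limit, bounds sustained traffic.

```python
rate_limit = 100          # requests/second (throttle rateLimit)
daily_quota = 10_000      # requests/day (quota limit)

# Time until a client at the maximum sustained rate hits the quota
seconds_to_exhaust = daily_quota / rate_limit
print(seconds_to_exhaust)  # 100.0 -> under 2 minutes of max-rate traffic
```

The throttle protects the backend from bursts; the quota governs total daily consumption (and typically billing tier).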
Complete request flow through AWS API Gateway
1. Client makes request
curl -X POST \
https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict \
-H 'x-api-key: 8fk3jsl9dkfj3k4j' \
-H 'Content-Type: application/json' \
-d '{"features": [1.2, 3.4, 5.6]}'
2. API Gateway validates API key
3. API Gateway checks usage plan quota
4. API Gateway routes to backend
5. Backend processes request
def lambda_handler(event, context):
features = json.loads(event['body'])['features']
prediction = model.predict(features)
return {
'statusCode': 200,
'body': json.dumps({'prediction': float(prediction)})
}
6. API Gateway logs to CloudWatch