API Design

EE 547 - Unit 6

Dr. Brandon Franzke

Fall 2025

Outline

API Foundations

HTTP Protocol

  • Request/response structure and status codes
  • Methods, idempotency, and safe operations
  • Headers and connection reuse

REST Principles

  • Resource-oriented design
  • Statelessness and scalability

API Contracts

  • OpenAPI specifications and schema validation
  • Versioning and breaking changes
  • Pagination strategies for large collections

Security and Implementation

Authentication

  • Password hashing with salt and work factors
  • JWT structure and stateless tokens
  • OAuth 2.0 authorization flows
  • Session management trade-offs

Flask Implementation

  • Request routing and data extraction
  • Production deployment with Gunicorn

Authorization

  • Resource-based permissions
  • Scope-based access control

Service Boundaries

Code Organization - From Functions to Services

Software engineering fundamental: Separating concerns through interfaces

Within a single application:

# User management module
def create_user(email, password):
    user_id = generate_id()
    hash_pwd = hash_password(password)
    store_user(user_id, email, hash_pwd)
    return user_id

# Booking module
def create_booking(user_id, flight_id):
    user = get_user(user_id)  # Function call
    if user.is_active:
        return store_booking(user_id, flight_id)

Module boundaries provide:

  • Separation of concerns: User logic isolated from booking logic
  • Independent modification: Change password hashing without touching bookings
  • Contract-based interface: get_user(user_id) → User defines expectations
  • Team organization: Different developers work on different modules

Single process limitation:

  • Shared memory space: Module bug can crash entire application
  • Shared resources: CPU-intensive user validation blocks booking requests
  • Deployment coupling: Update user module requires redeploying bookings
  • Technology lock-in: All modules must use same language/framework

Function calls couple modules in same process

Process Isolation - Separate Failure Domains

Moving from modules to separate processes

Same code, different execution model:

# User service (separate process)
# Listens on port 8001
@app.route('/users', methods=['POST'])
def create_user():
    email = request.json['email']
    password = request.json['password']
    user_id = generate_id()
    hash_pwd = hash_password(password)
    store_user(user_id, email, hash_pwd)
    return {'user_id': user_id}

# Booking service (separate process)
# Listens on port 8002
@app.route('/bookings', methods=['POST'])
def create_booking():
    user_id = request.json['user_id']
    flight_id = request.json['flight_id']

    # HTTP request instead of function call
    response = requests.get(f'http://localhost:8001/users/{user_id}')
    user = response.json()

    if user['is_active']:
        return store_booking(user_id, flight_id)
    return {'error': 'User inactive'}, 403

Why separate processes:

  • Failure isolation: User service crash doesn’t terminate booking service
  • Resource isolation: CPU-intensive user validation doesn’t block booking requests
  • Independent deployment: Update user service without restarting booking service
  • Technology flexibility: User service in Python, booking service in Go

Process boundaries isolate failures

API Contract - Defining Service Boundaries

API: Application Programming Interface - contract for communication

Function call contract:

def get_user(user_id: int) -> User:
    """
    Contract:
    - Input: user_id (integer)
    - Output: User object with fields: id, email, is_active
    - Raises: UserNotFoundError if user_id doesn't exist
    """
    pass

HTTP API contract for same operation:

Request: GET /users/123 on host user-service:8001

Success response: HTTP 200 OK

{
  "user_id": 123,
  "email": "alice@example.com",
  "is_active": true
}

Not found response: HTTP 404 Not Found

{
  "error": "User not found",
  "user_id": 123
}

API contract specifies:

  • Endpoint: GET /users/123 identifies resource and operation
  • Input format: User ID in URL path
  • Success response: JSON with user_id, email, is_active fields (status 200)
  • Error response: JSON with error message (status 404)

Why explicit contracts matter:

Different teams can work independently:

  • Booking team knows user API returns is_active field
  • User team can change implementation (database, caching) without breaking bookings
  • Contract violations detected immediately: 404 instead of silent failure

API documentation as contract:

GET /users/:user_id — Retrieve user by ID

Parameters:

  • user_id (integer, path, required): User identifier

Responses:

  • 200 User found: Returns user_id (integer), email (string), is_active (boolean)
  • 404 Not found: Returns error (string), user_id (integer)
  • 500 Server error: Internal error occurred

Contract enforcement:

  • Booking service expects specific JSON structure
  • User service must provide that structure
  • Change requires coordination: Update contract, then implementation
  • Compare to function call: Type system enforces contract at compile time

APIs make implicit function contracts explicit and enforceable
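
A consumer-side check makes that enforcement concrete. A sketch against the documented contract (host and fields follow the examples above):

import requests

def test_get_user_contract():
    response = requests.get('http://user-service:8001/users/123')
    assert response.status_code in (200, 404)
    body = response.json()
    if response.status_code == 200:
        # Success contract: required fields with expected types
        assert isinstance(body['user_id'], int)
        assert isinstance(body['email'], str)
        assert isinstance(body['is_active'], bool)
    else:
        # Error contract: error message plus the requested ID
        assert 'error' in body and 'user_id' in body

Run in CI against a staging deployment to catch contract drift before clients do.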

Independent Evolution - Versioning and Breaking Changes

Scenario: User service needs to add email verification

Version 1 response: GET /users/123

{
  "user_id": 123,
  "email": "alice@example.com",
  "is_active": true
}

Version 2 - Adding fields (backward compatible)

{
  "user_id": 123,
  "email": "alice@example.com",
  "is_active": true,
  "email_verified": true,           // New field
  "verification_date": "2025-01-15" // New field
}

Backward compatible change:

  • Booking service ignores unknown fields
  • Continues using is_active as before
  • No coordination required

Version 2 - Breaking change (not compatible)

{
  "user_id": 123,
  "email": "alice@example.com",
  "account_status": "active_verified"  // Replaced is_active
}

Problem: Booking service still reads is_active field

  • Field doesn’t exist in response
  • Booking service interprets as false or crashes
  • Breaking change requires coordinated deployment

Version management strategies:

URL-based versioning:

  • GET /v1/users/123 → Old response (includes is_active)

  • GET /v2/users/123 → New response (includes account_status)

  • Booking service continues using /v1/users

  • New services can use /v2/users

  • User service maintains both versions temporarily (sketch below)
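
In Flask terms, URL-based versioning is just two routes serving two response shapes. A sketch, with load_user standing in for whatever data access the service uses:

from flask import Flask

app = Flask(__name__)

@app.route('/v1/users/<int:user_id>')
def get_user_v1(user_id):
    user = load_user(user_id)  # Assumed data-access helper
    # v1 contract: clients read is_active
    return {'user_id': user_id, 'email': user['email'],
            'is_active': user['active']}

@app.route('/v2/users/<int:user_id>')
def get_user_v2(user_id):
    user = load_user(user_id)
    # v2 contract: is_active replaced by account_status
    return {'user_id': user_id, 'email': user['email'],
            'account_status': user['account_status']}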

Version distribution (airline system, 45 days after v2 launch):

  • v1 endpoint: 12% of requests (older services)
  • v2 endpoint: 88% of requests (updated services)

Cannot remove v1 until 100% migrated

Why versioning needed:

  • 20 services depend on user API
  • Coordinated update across 20 teams: weeks of planning
  • Independent updates with versioning: gradual rollout
  • Compare to function signature change: Compiler forces simultaneous update

APIs enable independent deployment through versioning

Multiple Clients - Contract Stability

API serves multiple independent consumers

Four clients calling GET /users/123:

  1. Booking service — Checks is_active before creating booking
  2. Email service — Sends email to user['email']
  3. Admin dashboard — Displays user profile
  4. Mobile app — Renders profile screen (external client via https://api.airline.com)

All four clients depend on same contract

Client code example:

response = requests.get('http://user-service:8001/users/123')
user = response.json()
if user['is_active']:
    create_booking(...)

Internal change in user service:

# Original: Users stored in PostgreSQL
def get_user(user_id):
    row = db.query("SELECT * FROM users WHERE id = ?", user_id)
    return {
        'user_id': row['id'],
        'email': row['email'],
        'is_active': row['active']
    }

# New: Users moved to Redis cache (performance improvement)
def get_user(user_id):
    cached = redis.get(f'user:{user_id}')
    if cached:
        return json.loads(cached)
    # Fallback to database...

Impact on clients: None

  • API contract unchanged: Still returns same JSON structure
  • Implementation details hidden behind API
  • Database → Redis migration invisible to consumers
  • No client code changes required

Contract violation example:

User service developer changes field name:

# Accidentally changed field name
return {
    'user_id': row['id'],
    'email_address': row['email'],  # Was 'email'
    'is_active': row['active']
}

Cascading failures:

  • Booking service: KeyError: 'email' when sending confirmation
  • Email service: Crashes attempting to read user email
  • Admin dashboard: Profile page displays blank email
  • Mobile app: 500 errors rendered to users

4 clients break simultaneously from single field rename

Why contracts matter with multiple clients:

  • Function rename in module: Compiler catches all call sites
  • API field rename: No compile-time check, runtime failures
  • More clients = higher cost of breaking changes
  • Explicit contract + automated testing prevents accidental breakage

APIs require stability when serving multiple independent clients

The HTTP Protocol

HTTP Request Structure - Client to Server

HTTP request anatomy:

GET /users/123 HTTP/1.1
Host: user-service.airline.com
Authorization: Bearer eyJhbGc...
Accept: application/json
User-Agent: booking-service/2.1.0

Request line components:

  • Method: GET - what operation to perform
  • Path: /users/123 - which resource to access
  • Protocol: HTTP/1.1 - version of HTTP

Request headers (metadata):

  • Host: Which server to route to (required in HTTP/1.1)
  • Authorization: Credentials for authentication
  • Accept: What response format client understands
  • User-Agent: Identifies client making request

Headers are key-value pairs: Header-Name: value

Empty line separates headers from body

Requests without body (GET, DELETE) end after headers

Measured request size:

  • Typical GET request: 250-400 bytes
  • Headers: 200-350 bytes
  • Request line: 50 bytes

Request sent as plain text over TCP
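
To make "plain text over TCP" concrete, a sketch that writes those bytes to a raw socket and reads back the response (assumes the example host is reachable on port 80):

import socket

request = (
    "GET /users/123 HTTP/1.1\r\n"
    "Host: user-service.airline.com\r\n"
    "Accept: application/json\r\n"
    "Connection: close\r\n"
    "\r\n"  # Blank line ends the headers
)

with socket.create_connection(("user-service.airline.com", 80)) as sock:
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):
        response += chunk  # Read until server closes the connection

print(response.decode("utf-8", errors="replace"))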

HTTP Response Structure - Server to Client

HTTP response anatomy:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 145
Cache-Control: max-age=300
Date: Mon, 15 Jan 2025 14:30:00 GMT

{
  "user_id": 123,
  "email": "alice@example.com",
  "is_active": true,
  "created_at": "2024-01-10T08:00:00Z"
}

Status line components:

  • Protocol: HTTP/1.1
  • Status code: 200 - numeric result indicator
  • Reason phrase: OK - human-readable description

Response headers:

  • Content-Type: Format of response body (JSON, HTML, etc)
  • Content-Length: Body size in bytes
  • Cache-Control: How long response can be cached
  • Date: When response was generated

Response body:

  • Actual data returned by server
  • Format specified by Content-Type header
  • In this case: JSON with user data

Empty line separates headers from body (same as request)

Response mirrors request structure

Status Codes Determine Client Behavior

Status code tells client what happened and what to do next

response = requests.get('http://user-service/users/123')

if response.status_code == 200:
    user = response.json()  # Success - process data
    
elif response.status_code == 404:
    return None  # User doesn't exist - normal case
    
elif response.status_code == 401:
    refresh_token()  # Get new auth token
    retry_request()   # Try again
    
elif response.status_code == 503:
    time.sleep(5)     # Service down
    retry_request()   # Retry with backoff
    
elif response.status_code >= 500:
    alert_ops_team()  # Server problem
    return fallback_response()

Different codes require different handling:

2xx: Process response
4xx: Fix request or handle business logic
5xx: Retry or use fallback

Common status codes in production:

200 OK — Request succeeded
Return data in response body

201 Created — Resource created
Location header has new resource URL

204 No Content — Success, no data
DELETE succeeded, nothing to return

400 Bad Request — Malformed request
Invalid JSON, missing required field

401 Unauthorized — No valid auth
Token expired or missing

403 Forbidden — Not allowed
Valid auth but wrong permissions

404 Not Found — Resource missing
Normal for checking existence

429 Too Many Requests — Rate limited
Check Retry-After header

500 Internal Server Error — Bug
Unhandled exception in server

503 Service Unavailable — Overloaded
Retry with exponential backoff

4xx vs 5xx: Client Problem vs Server Problem

4xx = Your request has a problem

POST /users
Content-Type: application/json

{"email": "not-an-email", "age": "twenty"}

Response: 400 Bad Request

{
  "errors": [
    {"field": "email", "message": "Invalid email format"},
    {"field": "age", "message": "Must be integer"}
  ]
}

Client must fix the request:

  • Validate input before sending
  • Check required fields
  • Use correct data types

5xx = Server has a problem

# Server code with bug
@app.route('/users/<id>')
def get_user(id):
    user = db.query(f"SELECT * FROM users WHERE id = {id}")
    return user.to_dict()  # Crashes if user is None

Response: 500 Internal Server Error

{
  "error": "Internal server error",
  "request_id": "7f3c6b2a"
}

Client should retry (server might recover):

  • Use exponential backoff
  • Have circuit breaker
  • Log for debugging

Retry strategies differ:

4xx errors: Don’t retry same request

  • Fix the problem first
  • 401: Get new token
  • 429: Wait for rate limit reset

5xx errors: Retry might work

  • Server might recover
  • Different server might work
  • Use exponential backoff

429 and 503: Handling Overload

429 Too Many Requests — You’re sending too fast

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1697299200
Retry-After: 60

{
  "error": "Rate limit exceeded",
  "limit": 100,
  "window": "per hour",
  "retry_after_seconds": 60
}

Client must slow down:

if response.status_code == 429:
    retry_after = response.headers.get('Retry-After', 60)
    time.sleep(int(retry_after))
    # Or queue request for later

503 Service Unavailable — Server overloaded

HTTP/1.1 503 Service Unavailable
Retry-After: 30

Server is temporarily unable to handle requests:

  • Too many connections
  • Database down
  • Deployment in progress

Different causes, different handling:

429 = Rate limiting (intentional)

  • Protects service from abuse
  • Per-client limits
  • Predictable reset times
  • Client should queue/batch requests

503 = Overload (unintentional)

  • Service can’t handle load
  • Affects all clients
  • Unknown recovery time
  • Client should back off

Exponential backoff pattern:

def retry_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        response = func()
        if response.status_code == 503:
            wait = 2 ** attempt  # 1, 2, 4, 8, 16
            time.sleep(wait)
        else:
            return response
    raise Exception("Max retries exceeded")

Circuit breaker pattern:

  • After N failures, stop trying
  • Let service recover
  • Periodically test if service is back (see the sketch below)
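
A minimal sketch of the circuit breaker idea (class name and thresholds are illustrative, not from a specific library):

import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_timeout=30):
        self.max_failures = max_failures    # Failures before opening
        self.reset_timeout = reset_timeout  # Seconds before probing again
        self.failures = 0
        self.opened_at = None               # None means circuit closed

    def call(self, func):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError('Circuit open - failing fast')
            self.opened_at = None  # Timeout elapsed: probe the service again

        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # Stop calling, let it recover
            raise
        self.failures = 0  # Any success closes the circuit
        return result

Usage: wrap each outbound call, e.g. breaker.call(lambda: requests.get(url, timeout=2)), and treat the fast failure like a 503.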

Methods Define What Happens to Resources

HTTP methods specify the operation type

GET — Read data

GET /users/123

Returns user 123’s data. No changes to server state.

POST — Create new

POST /users
{"email": "alice@example.com", "password": "..."}

Creates new user. Server assigns ID.

PUT — Replace entirely

PUT /users/123
{"email": "new@example.com", "is_active": false}

Replaces ALL fields of user 123.

PATCH — Update partially

PATCH /users/123
{"email": "new@example.com"}

Updates ONLY email, leaves other fields unchanged.

DELETE — Remove

DELETE /users/123

Removes user 123 from system.

Critical property: Idempotency

Idempotent = Same result from multiple identical calls

Method   Idempotent   Safe   Use Case
GET      Yes          Yes    Read data
POST     No           No     Create new
PUT      Yes          No     Replace all
PATCH    No           No     Update some
DELETE   Yes          No     Remove

Why idempotency matters:

Network fails after server processes but before client gets response.

Idempotent (PUT, DELETE):

  • Client can safely retry
  • No duplicate side effects

Not idempotent (POST):

  • Retry might create duplicate
  • Need idempotency keys

Safe = No server state changes
Only GET is safe (can cache, prefetch)

POST vs PUT: Creation Patterns

POST - Server assigns identifier

# Client doesn't know ID yet
POST /users
{
  "email": "alice@example.com",
  "name": "Alice"
}

# Server response
201 Created
Location: /users/456
{
  "id": 456,  # Server assigned
  "email": "alice@example.com",
  "name": "Alice",
  "created_at": "2024-01-15T10:30:00Z"
}

PUT - Client specifies identifier

# Client knows ID (e.g., using email as ID)
PUT /users/alice@example.com
{
  "name": "Alice",
  "role": "admin"
}

# Server response
200 OK  # Or 201 if newly created
{
  "id": "alice@example.com",
  "name": "Alice",
  "role": "admin"
}

POST is not idempotent:

# Call POST twice with same data
POST /users {"email": "bob@example.com"}
POST /users {"email": "bob@example.com"}

# Result: Two users created (IDs 457, 458)
# Or: 409 Conflict on second request

When to use each:

Use POST when:

  • Server generates IDs (auto-increment, UUID)
  • Creating dependent resources
  • Running actions/commands
  • Resource location unknown

Use PUT when:

  • Client controls IDs
  • Replacing entire resource
  • Upsert operations (create or update)
  • Resource location known

Real examples:

GitHub:

POST /repos/owner/repo/issues
# Creates issue, GitHub assigns number

PUT /repos/owner/repo/contents/README.md
# Creates/replaces file at exact path

AWS S3:

PUT /bucket/object-key
# Always PUT - client controls key
# Creates new or replaces existing

Idempotency in practice:

  • PUT same data multiple times = one resource
  • POST same data multiple times = multiple resources (or error)

PATCH vs PUT: Partial vs Full Updates

PUT replaces entire resource

# Current user state
{
  "id": 123,
  "email": "alice@example.com",
  "name": "Alice",
  "role": "user",
  "is_active": true
}

# PUT request (missing fields)
PUT /users/123
{
  "email": "alice@example.com",
  "name": "Alice Updated"
}

# Result - other fields lost/defaulted
{
  "id": 123,
  "email": "alice@example.com",
  "name": "Alice Updated",
  "role": null,      # Lost!
  "is_active": false  # Lost!
}

PATCH updates only specified fields

# PATCH request
PATCH /users/123
{
  "name": "Alice Updated"
}

# Result - other fields unchanged
{
  "id": 123,
  "email": "alice@example.com",  # Unchanged
  "name": "Alice Updated",        # Changed
  "role": "user",                 # Unchanged
  "is_active": true                # Unchanged
}

Common PATCH formats:

JSON Merge Patch (simple):

{
  "name": "New Name",
  "email": "new@example.com"
}

JSON Patch (RFC 6902):

[
  {"op": "replace", "path": "/name", "value": "New Name"},
  {"op": "add", "path": "/tags/0", "value": "premium"},
  {"op": "remove", "path": "/temp_field"}
]
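
JSON Merge Patch semantics (RFC 7386) are compact enough to sketch directly: null deletes a field, nested objects merge recursively, anything else replaces. A sketch, not a library implementation:

def json_merge_patch(target, patch):
    # RFC 7386: a non-object patch replaces the target outright
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the field
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result

user = {"name": "Alice", "role": "user", "email": "alice@example.com"}
json_merge_patch(user, {"name": "New Name", "role": None})
# {'name': 'New Name', 'email': 'alice@example.com'}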

When to use each:

PUT:

  • Form submissions (have all fields)
  • Config file updates
  • Immutable updates

PATCH:

  • Single field updates
  • Large resources
  • Partial forms
  • Mobile apps (bandwidth)

Common mistake: Using PUT for single field update loses data

# WRONG: PUT with one field
PUT /users/123
{"email": "new@example.com"}
# Lost all other fields!

Safe and Unsafe Methods: Retry Implications

Safe methods can be called without side effects

# Safe to retry, cache, prefetch
GET /users/123
GET /users/123  # Same result
GET /users/123  # Same result

# Browser/proxy can cache
Cache-Control: max-age=300

Unsafe methods change server state

# DELETE is idempotent but unsafe
DELETE /users/123  # Returns 204 No Content
DELETE /users/123  # Returns 404 Not Found
DELETE /users/123  # Returns 404 Not Found
# Final state same, but state did change

# POST is neither safe nor idempotent  
POST /orders       # Creates order 1
POST /orders       # Creates order 2 (duplicate!)
POST /orders       # Creates order 3 (duplicate!)

Network failure handling:

try:
    response = requests.post('/orders', data)
except requests.Timeout:
    # Did server process request before timeout?
    # Can't know - need idempotency key

Retry safety:

Always safe: GET
Safe if idempotent: PUT, DELETE
Dangerous: POST, PATCH

Need idempotency keys for POST/PATCH

Request with Body - POST Example

Creating new booking via POST:

POST /bookings HTTP/1.1
Host: booking-service.airline.com
Content-Type: application/json
Content-Length: 215
Authorization: Bearer eyJhbGc...

{
  "user_id": 123,
  "flight_id": 456,
  "seat": "12A",
  "payment": {
    "method": "credit_card",
    "amount": 450.00,
    "currency": "USD"
  },
  "notifications": {
    "email": true,
    "sms": false
  }
}

Additional headers for body:

  • Content-Type: Specifies body format (JSON, XML, form data)
  • Content-Length: Exact size in bytes (required by HTTP/1.1)

Server response:

HTTP/1.1 201 Created
Location: /bookings/789
Content-Type: application/json
Content-Length: 87

{
  "booking_id": 789,
  "status": "confirmed",
  "confirmation_code": "ABC123"
}

201 Created status indicates:

  • New resource successfully created
  • Location header provides URL to access new resource
  • Response body contains resource details

POST request includes data in body

Connection Lifecycle - TCP Under HTTP

HTTP runs over TCP connection:

1. TCP handshake (connection establishment):

Client            Server
  |                 |
  |--- SYN -------->| (50ms)
  |<-- SYN-ACK -----| (50ms)
  |--- ACK -------->| (50ms)
  |                 |
  [TCP established]
  • 3-way handshake establishes connection
  • Total latency: 150ms in this example (three 50ms one-way trips)
  • Required before any HTTP data sent

2. HTTP request/response over established connection:

  |                 |
  |- GET /users/123->| (50ms)
  |<- 200 OK + data -| (50ms)
  |                 |
  • Request sent over TCP connection
  • Response returned on same connection
  • Total: 100ms for request/response

3. Connection close:

  |                   |
  |---- FIN --------->|
  |<--- FIN-ACK ------|
  |                   |
  [Connection closed]

Total measured latency for single request:

  • TCP handshake: 150ms
  • HTTP request/response: 100ms
  • Total: 250ms

Geographic impact (measurements):

  • Same datacenter: 1-2ms per round trip
  • Cross-coast US: 60-80ms per round trip
  • Transpacific: 150-200ms per round trip

3-way handshake before HTTP request

Connection Reuse - HTTP Keep-Alive

Problem: Creating new TCP connection for each request is expensive

Without keep-alive (HTTP/1.0 default):

Request 1:
  TCP handshake: 150ms
  HTTP request/response: 100ms
  Close connection
  Total: 250ms

Request 2:
  TCP handshake: 150ms (again!)
  HTTP request/response: 100ms
  Close connection
  Total: 250ms

Request 3:
  TCP handshake: 150ms (again!)
  HTTP request/response: 100ms
  Close connection
  Total: 250ms

Total for 3 requests: 750ms

With keep-alive (HTTP/1.1 default):

Request 1:
  TCP handshake: 150ms
  HTTP request/response: 100ms
  Keep connection open
  Total: 250ms

Request 2:
  HTTP request/response: 100ms
  (reuse connection)
  Total: 100ms

Request 3:
  HTTP request/response: 100ms
  (reuse connection)
  Total: 100ms

Total for 3 requests: 450ms

40% latency reduction by reusing connection

Keep-alive headers:

Request includes: Connection: keep-alive

Response includes: Connection: keep-alive and Keep-Alive: timeout=5, max=1000

Keep-alive parameters:

  • timeout=5: Server keeps connection open for 5 seconds idle
  • max=1000: Maximum 1000 requests on this connection

Connection pooling in practice:

import requests

# Creates connection pool (default 10 connections)
session = requests.Session()

# All requests reuse connections from pool
for user_id in range(100):
    response = session.get(f'http://user-service:8001/users/{user_id}')
    # Connections automatically returned to pool

Measured improvement (100 requests):

  • Without keep-alive: 25 seconds (250ms × 100)
  • With keep-alive (10 connections in pool): 3.5 seconds
  • 7× improvement

Connection reuse critical for performance

Connection Pooling: Managing Concurrent Requests

Problem: Service needs to handle many concurrent requests

Single connection serves requests sequentially:

Connection 1: [Req1]->[Resp1]->[Req2]->[Resp2]->[Req3]->[Resp3]

Connection pool serves requests in parallel:

Connection 1: [Req1]->[Resp1]        [Req4]->[Resp4]
Connection 2:        [Req2]->[Resp2]        [Req5]->[Resp5]  
Connection 3:               [Req3]->[Resp3]

Connection pool implementation:

from urllib3 import PoolManager

# Create pool with size limits
pool = PoolManager(
    num_pools=10,      # Max 10 different hosts
    maxsize=20,        # Max 20 connections per host
    block=True         # Wait if pool exhausted
)

# Connections managed automatically
response = pool.request('GET', 'http://api/users/123')
# Connection returned to pool after response read

Pool sizing considerations:

  • Too small: Requests wait for available connection
  • Too large: Memory overhead, server connection limits
  • Typical: 10-50 connections per host

Pool exhaustion behavior:

# Pool size: 2, but 3 concurrent requests
pool = PoolManager(maxsize=2)

# Thread 1: Gets connection
# Thread 2: Gets connection  
# Thread 3: Blocks waiting for available connection
# Thread 1 completes: Connection returned to pool
# Thread 3: Gets recycled connection

Real scenarios requiring pools:

  • Web server → Database (10-20 connections)
  • API Gateway → Backend services (50-100 per service)
  • Microservice → Other microservices (10-30 per service)

Headers Control Service Behavior Beyond Content

HTTP headers determine how services process requests

Four critical functions in distributed systems:

1. Authentication/Authorization

Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

Service validates identity and permissions before processing

2. Content Negotiation

Content-Type: application/json; charset=utf-8
Accept: application/json

Ensures correct parsing and response format

3. Request Correlation

X-Request-ID: 7f3c6b2a-5d9e-4f8b-a1c3-9e8d7c6b5a4f

Traces requests across multiple services for debugging

4. Service Metadata

User-Agent: booking-service/2.1.0
X-API-Version: 2

Enables version-specific handling and deprecation

What happens without proper headers:

Missing Authorization → 401 Unauthorized
Wrong Content-Type → Data corruption
No X-Request-ID → Can’t trace failures
Invalid Accept → Client can’t parse response

Headers every request needs:

  • Authorization — Identity and permissions
  • Content-Type — How to parse body
  • Accept — What format you want back

Headers for debugging:

  • X-Request-ID — Correlation across services
  • User-Agent — Which client sent this

Headers in responses:

  • Status code — Did it work?
  • X-RateLimit-Remaining — Quota status
  • Cache-Control — Can this be cached?

Headers are contracts between services

Request Tracing: Following Failures Across Services

Problem: Request fails somewhere in chain of services

User reports “booking failed” - what actually happened?

# Three services, thousands of concurrent requests
[14:23:01.123] booking-service: Processing request
[14:23:01.234] user-service: Database query failed
[14:23:01.345] payment-service: Processing payment
[14:23:01.456] booking-service: Request failed

# Which events are related to the user's failure?

Solution: Thread request ID through all services

import uuid

import requests
from flask import g, request

# Generate ID at entry point
@app.before_request
def assign_request_id():
    g.request_id = request.headers.get('X-Request-ID', str(uuid.uuid4()))

# Forward to downstream services
headers = {
    'X-Request-ID': g.request_id,
    'Authorization': get_token()
}
response = requests.get(user_service_url, headers=headers)

# Include in every log message
logger.info(f"[{g.request_id}] Processing user {user_id}")

Finding the failure:

grep "7f3c6b2a" *.log | sort
[14:23:01.123] [7f3c6b2a] booking: Request received
[14:23:01.234] [7f3c6b2a] user: Connection pool exhausted
[14:23:01.456] [7f3c6b2a] booking: Returning 500

Now you know: user service connection pool was exhausted

Authorization: Identity vs Permissions

JWT in Authorization header identifies service and permissions

Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

Decoded JWT contains:

{
  "sub": "booking-service",        // Who is calling
  "scopes": [                      // What they can do
    "read:users",
    "write:bookings"
  ],
  "exp": 1697295600,               // When token expires
  "iat": 1697292000                // When token was issued
}

Server validates every request:

def validate_request(request):
    auth_header = request.headers.get('Authorization')
    if not auth_header or not auth_header.startswith('Bearer '):
        return 401  # No identity provided
    
    token = auth_header[7:]  # Remove 'Bearer ' prefix
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])
        if 'read:users' not in payload.get('scopes', []):
            return 403  # Identity valid, permission denied
        return None  # Success
    except jwt.ExpiredSignatureError:
        return 401  # Identity expired

401 vs 403 - Critical distinction:

401 Unauthorized — Identity problem

  • No Authorization header
  • Token expired
  • Invalid signature

Client should get new token and retry

403 Forbidden — Permission problem

  • Valid token, wrong scopes
  • Accessing other user’s resource

Client should not retry

Token expiration creates problems:

Long-running operation starts with valid token
Token expires during operation
Operation fails partway through

Common patterns:

  • Tokens expire after 1 hour
  • Refresh before starting long operations
  • Cannot revoke JWT before expiration
  • Compromised token stays valid until exp

This is why short expiration times matter

Content Negotiation: Preventing Silent Corruption

Content-Type tells server how to parse request body

POST /models/123/predict
Content-Type: application/json; charset=utf-8
Accept: application/json

{"features": [1.2, 3.4, 5.6], "threshold": 0.8}

Server uses Content-Type to route parsing:

from flask import request, jsonify

@app.route('/models/<model_id>/predict', methods=['POST'])
def predict(model_id):
    content_type = request.headers.get('Content-Type', '')
    
    if 'application/json' in content_type:
        data = request.get_json()  # JSON parser
    elif 'application/x-www-form-urlencoded' in content_type:
        data = request.form        # Form parser
    elif 'multipart/form-data' in content_type:
        data = request.files       # File parser
    else:
        return {'error': 'Unsupported Content-Type'}, 415
    
    # Check Accept header for response format
    accept = request.headers.get('Accept', 'application/json')
    if 'application/json' not in accept:
        return {'error': 'Cannot produce requested format'}, 406
    
    result = model.predict(data)
    return jsonify(result), 200

How wrong Content-Type corrupts data:

# Client sends JSON with wrong header
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
body = json.dumps({'key': 'value'})

# Server parses body as form data and sees garbage:
# request.form == {'{"key": "value"}': ''}

Content-Type controls parsing:

application/json → JSON parser
application/x-www-form-urlencoded → Form parser
multipart/form-data → File upload parser
application/octet-stream → Raw bytes

Why explicit headers matter:

  • Wrong Content-Type silently corrupts data
  • Missing Accept causes client parse failures
  • No charset breaks Unicode characters

Every request should include:

headers = {
    'Content-Type': 'application/json; charset=utf-8',
    'Accept': 'application/json'
}

Service Versioning: Supporting Multiple Clients

Version headers enable gradual migration

User-Agent: booking-service/2.1.0 (Python/3.11; Linux)
X-API-Version: 2

Server handles multiple versions simultaneously:

api_version = request.headers.get('X-API-Version', '1')

if api_version == '1':
    # Old clients expect this format
    return {'user': user_id, 'active': True}
    
elif api_version == '2':
    # New clients get additional fields
    return {'user_id': user_id, 'is_active': True, 
            'created_at': timestamp}

Signal deprecation to old clients:

if api_version == '1':
    response.headers['Sunset'] = 'Sat, 31 Dec 2024 23:59:59 GMT'
    response.headers['Deprecation'] = 'version="1"'
    # Client knows to migrate before sunset date

Track usage to know when safe to remove:

log_api_usage(version=api_version, client=user_agent)

Version migration reality:

Week 1: Release v2, most clients still on v1
Week 4: Send migration reminders
Week 8: Add deprecation headers
Week 12: Still have clients on v1

Cannot remove v1 until all clients migrate

Some clients never update:

  • Forgotten batch jobs
  • Third-party integrations
  • Mobile apps users don’t update

User-Agent reveals problem clients:

mobile-app/1.0 — High error rate
batch-processor/1.5 — Still on v1
web-app/2.3 — Successfully migrated

Without version headers:

  • Don’t know who’s using what
  • Can’t deprecate safely
  • Breaking changes break everyone

Version headers enable controlled evolution

REST Principles

REST - Architectural Style for APIs

REST: Representational State Transfer

Architectural style, not a protocol or standard

Coined by Roy Fielding (2000 dissertation) based on HTTP design principles

Core idea: Resources identified by URLs, manipulated via standard HTTP methods

What REST is NOT:

  • Not a specification with compliance tests
  • Not a protocol like HTTP or SOAP
  • Not limited to JSON (can use XML, HTML, etc)
  • Not the only way to design APIs

What REST provides:

  • Set of design principles for building APIs
  • Conventions for mapping operations to HTTP methods
  • Guidelines for URL structure
  • Constraints that enable scalability and simplicity

REST vs other approaches:

  • RPC-style: /createUser, /getUser, /deleteUser (verbs in URLs)
  • REST-style: POST /users, GET /users/123, DELETE /users/123 (resources + methods)

REST treats everything as a resource accessible via URL

REST uses resource URLs + HTTP methods

Resource-Oriented Design - Nouns Not Verbs

REST principle: URLs identify resources (things), methods specify operations

Resource hierarchy in airline API:

User resources:

  • /users — Collection of all users
  • /users/123 — Specific user
  • /users/123/bookings — User’s bookings (sub-collection)
  • /users/123/bookings/789 — Specific booking

Flight resources:

  • /flights — Collection of all flights
  • /flights/456 — Specific flight
  • /flights/456/seats — Available seats

Airport resources:

  • /airports — Collection of airports
  • /airports/LAX — Specific airport
  • /airports/LAX/flights — Flights from LAX

URL structure conventions:

  • Use plural nouns: /users not /user
  • Use hyphens for readability: /frequent-flyers not /frequentFlyers
  • Nest resources to show relationships: /users/123/bookings
  • Keep hierarchy shallow (2-3 levels maximum)

Operations via HTTP methods:

  • GET /users — Get all users
  • GET /users/123 — Get specific user
  • POST /users — Create new user (body: email, password)
  • PUT /users/123 — Update user (body: complete resource)
  • DELETE /users/123 — Delete user

Nested resources show relationships:

GET /users/123/bookings returns array of user’s bookings:

[
  {"booking_id": 789, "flight_id": 456, "seat": "12A", ...},
  {"booking_id": 790, "flight_id": 457, "seat": "14B", ...}
]

GET /users/123/bookings/789 returns specific booking via user path:

{
  "booking_id": 789,
  "user_id": 123,
  "flight_id": 456,
  "seat": "12A",
  "status": "confirmed"
}

GET /bookings/789 returns same booking via direct path:

{
  "booking_id": 789,
  "user_id": 123,
  "flight_id": 456,
  "seat": "12A",
  "status": "confirmed"
}

Design choice: Provide both paths when resource makes sense independently

  • /users/123/bookings — User-centric view (all bookings for user)
  • /bookings/789 — Booking-centric view (single booking)

Different access patterns for different use cases

GET and DELETE - Read and Remove

GET retrieves resource without modification

Request targets specific resource by ID:

GET /users/456 HTTP/1.1

Server returns resource representation:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "user_id": 456,
  "email": "carol@example.com",
  "name": "Carol Chen",
  "is_active": true,
  "created_at": "2025-01-15T14:30:00Z"
}

GET characteristics:

  • No request body (parameters in URL or query string)
  • 200 OK when resource exists
  • 404 Not Found when resource doesn’t exist
  • Idempotent: Repeated calls return same data
  • Safe: No side effects on server state

GET on collections returns multiple resources:

GET /users HTTP/1.1

HTTP/1.1 200 OK

{
  "users": [
    {"user_id": 456, "email": "carol@example.com", ...},
    {"user_id": 457, "email": "bob@example.com", ...}
  ],
  "count": 2
}

DELETE removes resource

Request targets specific resource:

DELETE /users/456 HTTP/1.1

Server removes resource, returns minimal response:

HTTP/1.1 204 No Content

DELETE characteristics:

  • No request body
  • 204 No Content on success (nothing to return)
  • 404 Not Found if resource already deleted
  • Idempotent: Multiple deletes produce same final state

Subsequent DELETE returns 404:

First delete:

DELETE /users/456 → 204 No Content (deleted)

Second delete:

DELETE /users/456 → 404 Not Found (already gone)

Final state identical: User 456 doesn’t exist

Both methods are idempotent:

  • GET: Same data returned each time
  • DELETE: Same final state (resource absent)

Idempotency enables safe retries on network failures

POST and PUT - Create and Replace

POST creates new resource

Request sent to collection URL:

POST /users HTTP/1.1
Content-Type: application/json

{
  "email": "carol@example.com",
  "password": "hashed_pwd",
  "name": "Carol Chen"
}

Server assigns ID and creates resource:

HTTP/1.1 201 Created
Location: /users/456
Content-Type: application/json

{
  "user_id": 456,
  "email": "carol@example.com",
  "name": "Carol Chen",
  "created_at": "2025-01-15T14:30:00Z"
}

POST characteristics:

  • POSTs to collection (/users not /users/456)
  • Server assigns resource ID
  • 201 Created status indicates success
  • Location header contains new resource URL
  • Not idempotent: Repeated POSTs create multiple resources

Why not idempotent:

POST /users {"email": "test@example.com"} → 201 Created, user_id=456

POST /users {"email": "test@example.com"} → 201 Created, user_id=789 (different resource!)

PUT replaces entire resource

Request sent to specific resource URL:

PUT /users/456 HTTP/1.1
Content-Type: application/json

{
  "email": "carol.new@example.com",
  "name": "Carol Chen",
  "is_active": false
}

Server replaces resource completely:

HTTP/1.1 200 OK

{
  "user_id": 456,
  "email": "carol.new@example.com",
  "name": "Carol Chen",
  "is_active": false,
  "updated_at": "2025-01-20T10:00:00Z"
}

PUT characteristics:

  • PUTs to specific resource (/users/456)
  • Client specifies resource ID
  • Request body contains complete resource
  • 200 OK with updated resource
  • Idempotent: Multiple identical PUTs result in same state

PUT replaces entirely:

Missing fields in request are removed:

PUT /users/456 {"email": "new@example.com"}

Result: name field removed (the entire resource is replaced, not just the email updated)

Use PATCH for partial updates instead

Query Parameters - Filtering Collections

Query parameters modify which resources are returned

Example: GET /flights?departure_airport=LAX

Path /flights identifies collection, departure_airport=LAX filters results

Query parameter syntax:

  • Appended after ? in URL
  • Key-value pairs: key=value
  • Multiple parameters joined with &
  • URL-encoded: Space → %20, special characters escaped

Filtering examples:

Single filter: GET /flights?departure_airport=LAX → Returns only flights departing from LAX

Multiple filters: GET /flights?departure_airport=LAX&arrival_airport=JFK&date=2025-02-15 → Returns LAX→JFK flights on specific date

Three-way filter: GET /flights?departure_airport=LAX&status=scheduled&aircraft_type=737 → Returns scheduled 737 flights from LAX

All filters are AND conditions - flight must match all criteria

Parameter validation returns 400 Bad Request:

Invalid value:

GET /flights?date=Feb-15-2025

HTTP/1.1 400 Bad Request

{
  "error": "Invalid date format",
  "parameter": "date",
  "expected_format": "YYYY-MM-DD"
}

Server validates parameters before database query
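
A handler-level sketch of that validation in Flask (fetch_flights is an assumed query helper):

from datetime import datetime

from flask import Flask, request

app = Flask(__name__)

@app.route('/flights')
def list_flights():
    date_str = request.args.get('date')
    if date_str is not None:
        try:
            datetime.strptime(date_str, '%Y-%m-%d')  # Enforce YYYY-MM-DD
        except ValueError:
            return {'error': 'Invalid date format',
                    'parameter': 'date',
                    'expected_format': 'YYYY-MM-DD'}, 400
    return {'flights': fetch_flights(date=date_str)}  # Assumed helper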

Sorting with parameters:

  • GET /flights?sort=departure_time — Ascending order (default)
  • GET /flights?sort=-departure_time — Descending order (minus prefix)
  • GET /flights?sort=departure_airport,departure_time — Multiple fields (comma-separated)

Last example sorts LAX flights before JFK, then by time within each airport

Combining filters and sorting:

GET /flights?departure_airport=LAX&status=scheduled&sort=-departure_time

Returns scheduled LAX flights, most recent first

Query parameters keep URL structure clean while enabling flexible filtering

Pagination - Handling Large Collections

Problem: Collection with 2,500 flights too large for single response

Without pagination: GET /flights → Returns 2,500 flights, 4MB response, 8 second load time

With pagination: GET /flights?limit=50&offset=0 → Returns 50 flights, 80KB response, 150ms load time

Offset-based pagination:

limit controls page size, offset controls starting position

  • First page (flights 0-49): GET /flights?limit=50&offset=0
  • Second page (flights 50-99): GET /flights?limit=50&offset=50
  • Third page (flights 100-149): GET /flights?limit=50&offset=100

Formula: offset = page_number × limit

Pagination metadata in response:

{
  "flights": [...50 flight objects...],
  "pagination": {
    "limit": 50,
    "offset": 0,
    "total": 2500,
    "next": "/flights?limit=50&offset=50",
    "prev": null
  }
}

Response includes links to next/previous pages
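
A sketch of an offset-based handler that produces this metadata (count_flights and fetch_flights are assumed data-access helpers):

from flask import Flask, request

app = Flask(__name__)

@app.route('/flights')
def list_flights():
    limit = min(request.args.get('limit', 50, type=int), 200)  # Cap page size
    offset = max(request.args.get('offset', 0, type=int), 0)

    total = count_flights()                              # Assumed helper
    flights = fetch_flights(limit=limit, offset=offset)  # Assumed helper

    def page(off):
        return f'/flights?limit={limit}&offset={off}'

    return {
        'flights': flights,
        'pagination': {
            'limit': limit,
            'offset': offset,
            'total': total,
            'next': page(offset + limit) if offset + limit < total else None,
            'prev': page(max(offset - limit, 0)) if offset > 0 else None,
        }
    }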

Alternative pagination strategies:

Cursor-based (for frequently updated data):

GET /flights?limit=50&after=flight_xyz

Next page: GET /flights?limit=50&after=flight_abc

Cursor identifies position in result set (not numeric offset)

Advantages over offset:

  • Handles new insertions correctly
  • Prevents duplicate/skipped results when data changes
  • More efficient for large offsets (no skip operation)

Disadvantage: Cannot jump to arbitrary page

Page-based (simpler API):

GET /flights?page=1&per_page=50 and GET /flights?page=2&per_page=50

Server calculates offset internally: offset = (page - 1) × per_page

Pagination with filters:

GET /flights?departure_airport=LAX&limit=50&offset=0 → First 50 LAX flights

GET /flights?departure_airport=LAX&limit=50&offset=50 → Next 50 LAX flights

Filters applied before pagination

Measured performance (2,500 flight collection):

  • Full response: 8s, 4MB
  • Paginated (50/page): 150ms, 80KB per page
  • 53× faster initial load

Idempotency - Safe Retry Behavior

Idempotent operation: Multiple identical requests have same effect as single request

GET - Idempotent and safe:

# Call once
response1 = requests.get('http://api/users/123')
user1 = response1.json()  # {"user_id": 123, "email": "alice@..."}

# Call again
response2 = requests.get('http://api/users/123')
user2 = response2.json()  # {"user_id": 123, "email": "alice@..."}

# Same result, no side effects
assert user1 == user2

PUT - Idempotent but not safe:

# Call once
requests.put('http://api/users/123',
             json={"email": "alice.new@example.com", "is_active": True})
# Result: email changed to alice.new@example.com

# Call again with same data
requests.put('http://api/users/123',
             json={"email": "alice.new@example.com", "is_active": True})
# Result: email still alice.new@example.com (no additional change)

# Multiple calls → same final state

DELETE - Idempotent:

# Call once
response1 = requests.delete('http://api/users/123')
# Response: 204 No Content, user deleted

# Call again
response2 = requests.delete('http://api/users/123')
# Response: 404 Not Found, user already deleted

# System in same state (user doesn't exist)

POST - Not idempotent:

# Call once
response1 = requests.post('http://api/users',
                          json={"email": "bob@example.com"})
# Response: 201 Created, user_id=456

# Call again with same data
response2 = requests.post('http://api/users',
                          json={"email": "bob@example.com"})
# Response: 201 Created, user_id=789 (different user!)

# Two users created - NOT idempotent

Idempotency matters for retries:

Network timeout scenario:

try:
    response = requests.post('http://api/bookings',
                            json={...},
                            timeout=5)
except requests.Timeout:
    # Did booking succeed or fail? Unknown!
    # Retry risks duplicate booking
    pass

Idempotency key pattern:

# Client generates unique request ID
idempotency_key = str(uuid.uuid4())

response = requests.post('http://api/bookings',
                        json={...},
                        headers={'Idempotency-Key': idempotency_key})

# If timeout, retry with same key
# Server sees duplicate key, returns original response
# Safe to retry POST operations

Server implementation:

if idempotency_key in cache:
    return cache[idempotency_key]  # Return cached response
else:
    result = create_booking(...)
    cache[idempotency_key] = result
    return result

Idempotency enables safe retry logic

Statelessness - No Server-Side Session

REST constraint: Each request contains all information needed to process it

Stateful approach (violates REST):

# Login creates server-side session
POST /login
Body: {"email": "alice@example.com", "password": "..."}

Response:
HTTP/1.1 200 OK
Set-Cookie: session_id=abc123

# Server stores:
sessions['abc123'] = {
  'user_id': 123,
  'email': 'alice@example.com',
  'logged_in_at': '2025-01-15T10:00:00Z'
}

# Subsequent requests reference session
GET /bookings
Cookie: session_id=abc123

# Server looks up session['abc123'] to get user_id

Problems with server-side sessions:

  • Server must store session for every active user
  • 10K active users = 10K session objects in memory
  • Load balancer must route all requests from user to same server
  • Server restart loses all sessions
  • Horizontal scaling requires session replication

Stateless approach (REST-compliant):

# Login returns token (JWT)
POST /login
Body: {"email": "alice@example.com", "password": "..."}

Response:
HTTP/1.1 200 OK

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
}

# Token contains: {user_id: 123, email: "alice@...", exp: ...}
# Signed by server, cannot be forged

Stateless request:

# Every request includes complete authentication
GET /bookings
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...

# Server decodes token to get user_id
# No session lookup needed

JWT (JSON Web Token) structure:

Header:
{
  "alg": "HS256",
  "typ": "JWT"
}

Payload:
{
  "user_id": 123,
  "email": "alice@example.com",
  "exp": 1705324800,  # Expiration timestamp
  "iat": 1705321200   # Issued at timestamp
}

Signature:
HMACSHA256(
  base64(header) + "." + base64(payload),
  server_secret_key
)

Final token:
base64(header).base64(payload).signature
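
With the PyJWT library, issuing and checking such a token takes a few lines (a sketch; SECRET_KEY stands in for server-side configuration):

import time
import jwt  # PyJWT

SECRET_KEY = 'server-secret'  # Assumed config, never sent to clients

# Issue at login
now = int(time.time())
token = jwt.encode(
    {'user_id': 123, 'email': 'alice@example.com',
     'iat': now, 'exp': now + 900},  # Expires in 15 minutes
    SECRET_KEY, algorithm='HS256')

# Validate on every request (exp is checked automatically)
try:
    payload = jwt.decode(token, SECRET_KEY, algorithms=['HS256'])
except jwt.ExpiredSignatureError:
    payload = None  # Return 401: client must obtain a new token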

Benefits of stateless design:

  • Server doesn’t store session data (no memory overhead)
  • Any server can handle any request (no sticky sessions)
  • Horizontal scaling trivial (add servers)
  • Server restart doesn’t invalidate tokens
  • Token validation: local decode and signature check, no session-store lookup on every request

Token expiration:

  • Short-lived: 15 minutes (security)
  • Refresh token: 30 days (obtain new access token)
  • Client responsibility to manage token lifecycle

Statelessness enables unlimited horizontal scaling

REST with Python (Flask)

Framework Choice: Flask

Python web frameworks for APIs:

  • Flask: Minimal, explicit routing
  • FastAPI: Modern, type-safe
  • Django REST Framework: Full-featured

EE 547 uses Flask

Minimal abstractions make core concepts visible. Patterns transfer to FastAPI and Django REST.

Framework-agnostic concepts covered:

  • Request/response flow
  • URL routing and parameters
  • Request data extraction
  • Production deployment

Web Framework Architecture

Framework sits between HTTP server and handler code

Client
  ↓ HTTP Request
HTTP Server (gunicorn)
  ↓ WSGI
Flask Framework
  ↓ Calls
Handler Function
  ↓ Returns
Flask Framework
  ↓ WSGI
HTTP Server
  ↓ HTTP Response
Client

What Flask does:

  1. Match URL to function
  2. Parse incoming data
  3. Call handler function
  4. Build HTTP response

Handler implementation:

  • Functions that process requests
  • Business logic
  • Return data
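
For comparison, a minimal raw WSGI application, the interface Flask implements on our behalf. Gunicorn could serve this object directly (a sketch):

def application(environ, start_response):
    # environ: dict describing the request (REQUEST_METHOD, PATH_INFO, headers)
    # start_response: callback that sets the status line and headers
    body = b'{"status": "healthy"}'
    start_response('200 OK', [
        ('Content-Type', 'application/json'),
        ('Content-Length', str(len(body))),
    ])
    return [body]  # Iterable of byte chunks forming the response body

Flask adds routing, parsing, and JSON conversion on top of this plumbing.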

Simple Route Example

Connecting a URL to a function

from flask import Flask

app = Flask(__name__)

@app.route('/health')
def health_check():
    return {'status': 'healthy'}

What happens:

  1. @app.route('/health') registers the route
  2. Client sends: GET /health
  3. Flask sees /health matches registered route
  4. Flask calls health_check() function
  5. Function returns dict
  6. Flask converts to JSON response

Response:

HTTP/1.1 200 OK
Content-Type: application/json

{"status": "healthy"}

Flask automatically:

  • Sets status code to 200
  • Sets Content-Type header
  • Converts dict to JSON

HTTP Methods in Routes

Restricting which HTTP methods a route accepts

@app.route('/models', methods=['GET'])
def list_models():
    return {'models': [...]}

@app.route('/models', methods=['POST'])
def create_model():
    return {'id': 123}, 201

Same URL, different methods:

  • GET /models → calls list_models()
  • POST /models → calls create_model()
  • PUT /models → 405 Method Not Allowed

Why separate by method:

  • GET: Read data (list models)
  • POST: Create data (new model)
  • Different operations, different functions
  • Clear separation of concerns

Default is GET only:

@app.route('/health')  # Only accepts GET
def health():
    return {'status': 'ok'}

URL Parameters

Capturing values from the URL

@app.route('/models/<model_id>')
def get_model(model_id):
    return {'id': model_id}

URL: GET /models/42

Result: model_id = "42" (string)

Type conversion:

@app.route('/models/<int:model_id>')
def get_model(model_id):
    return {'id': model_id}

URL: GET /models/42

Result: model_id = 42 (integer)

URL: GET /models/abc

Result: 404 Not Found (can’t convert to int)

Multiple parameters:

@app.route('/models/<int:model_id>/predictions/<pred_id>')
def get_prediction(model_id, pred_id):
    return {'model': model_id, 'prediction': pred_id}

URL: GET /models/42/predictions/xyz

Result: model_id = 42, pred_id = "xyz"

Accessing Request Data: JSON Body

Reading JSON from request body

from flask import request

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    # data is dict: {'features': [1, 2, 3]}

    features = data['features']
    result = model.predict(features)

    return {'prediction': float(result)}

Client sends:

POST /predict
Content-Type: application/json

{"features": [1.2, 3.4, 5.6]}

Flask automatically:

  • Checks Content-Type header
  • Parses JSON string
  • Creates Python dict
  • Makes available as request.json

Safe access with get():

data = request.json
threshold = data.get('threshold', 0.5)  # Default if missing

Accessing Request Data: Query Parameters

Reading parameters from URL query string

@app.route('/models')
def list_models():
    # GET /models?limit=10&status=trained

    limit = request.args.get('limit', 100, type=int)
    # limit = 10 (converted to int)

    status = request.args.get('status')
    # status = "trained"

    models = fetch_models(limit=limit, status=status)
    return {'models': models}

Query string after ? in URL:

  • Key-value pairs: key=value
  • Multiple params: & separator
  • /models?limit=10&status=trained

request.args.get() parameters:

  • First arg: parameter name
  • Second arg: default value if missing
  • type=int: convert to integer

Without default:

status = request.args.get('status')  # None if missing

With default:

limit = request.args.get('limit', 100, type=int)  # 100 if missing

Accessing Request Data: Headers

Reading HTTP headers

@app.route('/predict', methods=['POST'])
def predict():
    # Authorization header
    auth = request.headers.get('Authorization')
    # "Bearer eyJhbGci..."

    # Custom headers
    request_id = request.headers.get('X-Request-ID')

    # Content type
    content_type = request.headers.get('Content-Type')

    # Validate token
    if not auth:
        return {'error': 'Missing authorization'}, 401

    if not validate_token(auth):
        return {'error': 'Invalid token'}, 401

    # Process request
    return {'prediction': 0.87}

Common headers:

  • Authorization: Auth tokens
  • Content-Type: Body format
  • X-Request-ID: Request tracking
  • User-Agent: Client information

Headers case-insensitive:

request.headers.get('Content-Type')
request.headers.get('content-type')  # Same

Building Responses: Simple Return

Return dict → Flask converts to JSON

@app.route('/predict', methods=['POST'])
def predict():
    result = model.predict(request.json['features'])
    return {'prediction': float(result)}

Response Flask generates:

HTTP/1.1 200 OK
Content-Type: application/json

{"prediction": 0.87}

Flask automatically:

  • Sets status code to 200 (success)
  • Sets Content-Type to application/json
  • Converts Python dict to JSON string
  • Returns properly formatted HTTP response

This is the most common pattern:

  • Simple and clean
  • Works for most GET/PUT requests
  • Default 200 status appropriate for success

Building Responses: Custom Status Code

Return tuple: (data, status_code)

@app.route('/models', methods=['POST'])
def create_model():
    model_id = save_model(request.json)
    return {'id': model_id}, 201

Response:

HTTP/1.1 201 Created
Content-Type: application/json

{"id": 42}

When to use different status codes:

201 Created - Resource successfully created (POST)

return {'id': new_id}, 201

204 No Content - Success but no data to return (DELETE)

return '', 204

404 Not Found - Resource doesn’t exist

return {'error': 'Model not found'}, 404

422 Unprocessable Entity - Validation failed

return {'error': 'Invalid input'}, 422

Building Responses: Adding Headers

Return tuple: (data, status, headers)

@app.route('/models', methods=['POST'])
def create_model():
    model_id = save_model(request.json)

    return {'id': model_id}, 201, {
        'Location': f'/models/{model_id}',
        'X-Request-ID': request.headers.get('X-Request-ID')
    }

Response:

HTTP/1.1 201 Created
Content-Type: application/json
Location: /models/42
X-Request-ID: abc-123

{"id": 42}

Common response headers:

Location - URL of newly created resource

'Location': f'/models/{model_id}'

X-Request-ID - Echo back for tracking

'X-Request-ID': request.headers.get('X-Request-ID')

Cache-Control - Control caching

'Cache-Control': 'max-age=300'

Production: Why Not flask run

Development server not for production

flask run

Problems:

  • Single process, single thread
  • Handles one request at a time
  • No crash recovery
  • Debug mode exposes code
  • Poor performance under load

Example:

@app.route('/predict')
def predict():
    time.sleep(2)  # Prediction takes 2 seconds
    return {'result': 0.87}

With flask run:

  • Request 1: 0-2 seconds
  • Request 2: 2-4 seconds (waits)
  • Request 3: 4-6 seconds (waits)
  • All sequential, no concurrency

Production needs:

  • Multiple worker processes
  • Concurrent request handling
  • Automatic crash recovery
  • Process management

Single process means:

  • One request blocks others
  • No parallelism
  • Poor resource usage
  • Unacceptable for production

Production: Gunicorn with Workers

Gunicorn - Production WSGI server

pip install gunicorn
gunicorn --workers 4 --bind 0.0.0.0:5000 app:app

What this does:

  • Starts 4 separate worker processes
  • Each worker handles one request at a time
  • 4 concurrent requests possible
  • Load balanced across workers

Worker calculation:

workers = (CPU cores × 2) + 1

2-core machine → 5 workers
4-core machine → 9 workers

Same 2-second prediction with 4 workers:

  • Requests 1-4: All start at 0s, finish at 2s
  • Request 5: Starts at 2s when worker frees

4× improvement for concurrent requests

Configuration file:

# gunicorn.conf.py
workers = 4
bind = "0.0.0.0:5000"
timeout = 30

Run with:

gunicorn -c gunicorn.conf.py app:app

Multiple workers = concurrent processing

Each worker is independent process

Static Files: Production Strategies

Problem: Flask serves files synchronously - blocks workers

GET /static/model_weights.pkl  # 500MB file

What happens:

  • Flask worker reads 500MB file
  • Sends to client over network
  • Worker blocked 30+ seconds
  • 4 concurrent downloads = all workers blocked

Solution 1: Nginx serves static files

Nginx handles /static/* directly
Flask never sees these requests
Workers free for API calls

Solution 2: S3 redirect pattern

@app.route('/models/<model_id>/download')
def download_model(model_id):
    # Generate temporary S3 URL (expires in 1 hour)
    s3_url = generate_presigned_url(
        bucket='models',
        key=f'{model_id}.pkl',
        expires_in=3600
    )
    return redirect(s3_url)

Flow:

  1. Client requests file from Flask
  2. Flask returns 302 redirect to S3
  3. Client downloads directly from S3
  4. Flask worker free in <1ms

Use S3 redirect for: Large files (>10MB), model weights, datasets, user uploads
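
The generate_presigned_url helper above is left undefined; a minimal sketch using boto3 (bucket names and credential setup assumed) could be:

import boto3

s3 = boto3.client('s3')

def generate_presigned_url(bucket, key, expires_in=3600):
    # Signed GET URL; anyone holding it can download until expiry
    return s3.generate_presigned_url(
        'get_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires_in
    )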

API Specification

OpenAPI Specification

OpenAPI defines API structure in machine-readable format

Specification written in YAML or JSON, describes:

  • Available endpoints and operations
  • Request parameters and body schemas
  • Response formats and status codes
  • Authentication requirements
  • Data types and constraints

Example specification for user endpoint:

openapi: 3.0.0
info:
  title: User Service API
  version: 2.1.0
paths:
  /users/{userId}:
    get:
      parameters:
        - name: userId
          in: path
          required: true
          schema:
            type: integer
            minimum: 1
      responses:
        '200':
          description: User found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'
        '404':
          description: User not found

components:
  schemas:
    User:
      type: object
      required: [user_id, email, is_active]
      properties:
        user_id: {type: integer}
        email: {type: string, format: email}
        is_active: {type: boolean}
        engagement_score: {type: number, minimum: 0, maximum: 100}

Specification enforces contract between API provider and consumers

Specification serves multiple purposes:

1. Documentation source

  • Swagger UI generates interactive docs
  • Always synchronized with implementation
  • Developers explore API without writing code

2. Validation layer

  • Request validation against schema
  • Response validation before sending
  • Type checking and constraint enforcement

3. Code generation

  • Server stubs with routing
  • Client SDKs in multiple languages
  • Type-safe API calls

4. Contract testing

  • Verify implementation matches spec
  • Detect breaking changes
  • Test compliance automatically

Specification-first development:

Write spec → Generate code → Implement handlers

Ensures API design considered before implementation details

Alternative: Code-first

Write code → Generate spec from annotations

Easier to start, harder to maintain consistency

Schema Validation

OpenAPI schemas define data structures with constraints

ML prediction endpoint schema:

paths:
  /models/{modelId}/predict:
    post:
      parameters:
        - name: modelId
          in: path
          schema: {type: string, pattern: '^[a-z0-9-]+$'}
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [features, model_version]
              properties:
                features:
                  type: array
                  items: {type: number}
                  minItems: 10
                  maxItems: 10
                model_version:
                  type: string
                  enum: [v1.0, v1.1, v2.0]
                threshold:
                  type: number
                  minimum: 0.0
                  maximum: 1.0
                  default: 0.5
      responses:
        '200':
          content:
            application/json:
              schema:
                type: object
                required: [prediction, confidence]
                properties:
                  prediction: {type: number}
                  confidence: {type: number, minimum: 0, maximum: 1}

Schema constraints validated automatically:

  • Type checking: string vs number vs array
  • Array length: exactly 10 features required
  • Value ranges: threshold between 0 and 1
  • Enum validation: model_version must match list
  • Required fields: features and model_version mandatory

Invalid requests rejected before processing:

Missing required field:

POST /models/classifier-v2/predict
{"features": [1.2, 3.4, 5.6, 7.8, 9.0, 1.1, 2.2, 3.3, 4.4, 5.5]}

Response: 400 Bad Request
{
  "error": "Validation failed",
  "details": [{
    "field": "model_version",
    "message": "Required property missing"
  }]
}

Wrong array length:

POST /models/classifier-v2/predict
{"features": [1, 2, 3], "model_version": "v2.0"}

Response: 400 Bad Request
{
  "details": [{
    "field": "features",
    "message": "Array must contain 10 items, found 3"
  }]
}

Invalid enum value:

{"features": [...], "model_version": "v3.0"}

Response: 400 Bad Request
{
  "details": [{
    "field": "model_version",
    "message": "Value must be one of: v1.0, v1.1, v2.0"
  }]
}

Validation prevents:

  • Type errors in application code
  • Database constraint violations
  • Model inference crashes
  • Invalid computation results

Request rejected at API boundary: 1-2ms

Request failing during processing: 50-500ms wasted
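
What the validation layer does can be sketched with the jsonschema library; the schema below is a hand translation of the YAML requestBody above:

from jsonschema import ValidationError, validate

prediction_schema = {
    "type": "object",
    "required": ["features", "model_version"],
    "properties": {
        "features": {"type": "array", "items": {"type": "number"},
                     "minItems": 10, "maxItems": 10},
        "model_version": {"type": "string", "enum": ["v1.0", "v1.1", "v2.0"]},
        "threshold": {"type": "number", "minimum": 0.0, "maximum": 1.0}
    }
}

try:
    validate(instance={"features": [1, 2, 3], "model_version": "v2.0"},
             schema=prediction_schema)
except ValidationError as e:
    print(e.message)  # "[1, 2, 3] is too short"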

Specification-Driven Design

Single OpenAPI specification generates multiple artifacts

1. Interactive documentation (Swagger UI)

Browsable interface with:

  • List of all endpoints grouped by resource
  • Request/response examples
  • Try-it-out functionality for live testing
  • Schema definitions with types and constraints
  • Authentication requirements

Developers test endpoints without writing client code

2. Server stubs

Generated code includes:

  • Route definitions matching specification
  • Request parsing and validation
  • Response serialization
  • Type hints (in typed languages)
  • Handler function signatures

# Generated from OpenAPI spec
@app.route('/models/<model_id>/predict', methods=['POST'])
def predict_model(model_id: str):
    # Request already validated against schema
    body = request.json  # Type: PredictionRequest

    # Implement business logic here
    result = run_prediction(model_id, body['features'])

    # Response validated before sending
    return {'prediction': result, 'confidence': 0.87}

3. Client SDKs

Type-safe client libraries:

# Generated Python client
from api_client import UserServiceClient

client = UserServiceClient(base_url='https://api.example.com')

# Method signatures from spec
user = client.get_user(user_id=123)  # Type: User
print(user.email)  # IDE autocomplete knows fields

# Type checker catches errors
client.get_user(user_id="abc")  # Error: expected int

4. Request validation middleware

Automatically generated validators:

# Validates before handler executes
from functools import wraps

def validate_request(spec):
    def decorator(f):
        @wraps(f)  # Preserve handler name so Flask routing works
        def wrapper(*args, **kwargs):
            # Check request matches spec
            errors = validate_against_schema(
                request,
                spec['paths'][request.path]
            )
            if errors:
                return {'error': errors}, 400
            return f(*args, **kwargs)
        return wrapper
    return decorator

5. Mock servers

Generate mock API from specification:

  • Returns example responses
  • Validates request format
  • Enables frontend development before backend complete

Code generation tools:

  • OpenAPI Generator: 40+ language targets
  • Swagger Codegen: Server and client generation
  • Prism: Mock server from specification
  • Redoc: Alternative documentation renderer

Specification as single source of truth:

Change spec → Regenerate all artifacts

Documentation, validation, and client code stay synchronized

Manual maintenance alternative:

  • Write documentation separately (becomes outdated)
  • Write validation logic per endpoint (inconsistent)
  • Manually create client libraries (error-prone)
  • Update all three when API changes (forgotten)

Machine-readable specification prevents divergence

API Versioning

APIs evolve but clients update slowly

Version placement options:

URL path versioning (most common):

GET /v1/users/123
GET /v2/users/123

Advantages:

  • Version immediately visible in URL
  • Easy to route in load balancer
  • Clear in logs and monitoring

Disadvantages:

  • URL changes with version
  • The "same" resource has different URLs across versions

Header versioning:

GET /users/123
Accept: application/vnd.api.v1+json

GET /users/123
Accept: application/vnd.api.v2+json

Advantages:

  • URLs remain stable
  • Content negotiation pattern

Disadvantages:

  • Version not visible in URL
  • Harder to test in browser
  • Requires header inspection

Custom header:

GET /users/123
API-Version: 1

GET /users/123
API-Version: 2

Similar trade-offs to Accept header

Query parameter (not recommended):

GET /users/123?version=1
GET /users/123?version=2

Disadvantages:

  • Mixes version with filtering parameters
  • Caching issues (query params affect cache key)

Version granularity:

Major versions (breaking changes):

  • v1 → v2: Field removed or renamed
  • v2 → v3: Response structure changed
  • Requires separate implementation

Minor versions (additions):

  • v2.0 → v2.1: New optional field added
  • v2.1 → v2.2: New endpoint added
  • Backward compatible within major version

Semantic versioning pattern:

MAJOR.MINOR.PATCH

  • MAJOR: Breaking changes (v1 → v2)
  • MINOR: New features, backward compatible (v2.0 → v2.1)
  • PATCH: Bug fixes (v2.1.0 → v2.1.1)

When to increment major version:

  • Removing endpoint
  • Removing required field from response
  • Adding required field to request
  • Changing field type
  • Renaming field
  • Changing authentication method

Backward compatible additions:

  • New optional request field
  • New field in response (clients ignore unknown)
  • New endpoint
  • New optional query parameter
  • New HTTP status code for new error case
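
These stay compatible because well-behaved clients read only the fields they know about; a sketch of a tolerant v1-era client (URL hypothetical):

import requests

response = requests.get('https://api.example.com/users/123').json()

user_id = response['user_id']   # Present in v1 and v2
email = response['email']       # Present in v1 and v2
# v2's extra 'created_at' key is simply never read

# Defensive variant: treat optional fields as absent by default
created_at = response.get('created_at')  # None against a v1 server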

Parallel version support:

Both versions active simultaneously:

@app.route('/v1/users/<id>')
def get_user_v1(id):
    user = fetch_user(id)
    return {'user': id, 'active': user.is_active}

@app.route('/v2/users/<id>')
def get_user_v2(id):
    user = fetch_user(id)
    return {
        'user_id': id,  # Renamed
        'is_active': user.is_active,
        'created_at': user.created_at  # New field
    }

Maintains compatibility while evolving API

Breaking vs Compatible Changes

Breaking change: Modification that causes existing clients to fail

Common breaking changes:

Field removal:

// v1 response
{"user_id": 123, "email": "alice@example.com", "phone": "+1-555-0100"}

// v2 response
{"user_id": 123, "email": "alice@example.com"}
// phone field removed

Client code accessing response['phone'] raises KeyError

Field rename:

// v1: {"created": "2024-01-15"}
// v2: {"created_at": "2024-01-15"}

Client code accessing response['created'] raises KeyError

Type change:

// v1: {"count": "42"}    (string)
// v2: {"count": 42}       (number)

Client expecting string, performs string operations on number → TypeError

New required field:

// v1 request
POST /bookings
{"flight_id": 456, "user_id": 123}

// v2 request (requires seat_class)
POST /bookings
{"flight_id": 456, "user_id": 123, "seat_class": "economy"}

Old clients missing seat_class → 400 Bad Request

Status code change:

// v1: Returns 200 OK when user not found (empty result)
// v2: Returns 404 Not Found when user not found

Client checking status == 200 for success misses 404 case

Non-breaking changes (backward compatible):

Adding optional field to response:

// v1 response
{"user_id": 123, "email": "alice@example.com"}

// v2 response
{"user_id": 123, "email": "alice@example.com",
 "created_at": "2024-01-15"}

Old clients ignore unknown created_at field

Adding optional request parameter:

// v1: GET /flights?departure=LAX
// v2: GET /flights?departure=LAX&max_price=500

Old clients don’t send max_price, server uses default behavior

Adding new endpoint:

// v1: GET /users, POST /users
// v2: GET /users, POST /users, GET /users/search

Old clients unaware of /users/search, continue using existing endpoints

Adding new HTTP method to existing endpoint:

// v1: GET /users/123
// v2: GET /users/123, PATCH /users/123

Old clients only use GET, PATCH addition doesn’t affect them

Deprecation headers indicate future removal:

HTTP/1.1 200 OK
Deprecation: true
Sunset: Wed, 31 Dec 2025 23:59:59 GMT
Link: </v2/users/123>; rel="successor-version"

Clients warned field or endpoint will be removed

Contract testing prevents breaking changes:

def test_v1_user_response_format():
    """Verify v1 response format unchanged"""
    response = api_client.get_user_v1(123)

    assert 'user_id' in response
    assert 'email' in response
    assert isinstance(response['user_id'], int)
    assert isinstance(response['email'], str)

Test fails if response structure changes, preventing accidental breaking changes

Version Migration Timeline

Migrating clients from v1 to v2 takes months

Typical timeline:

Week 0: v2 deployed, v1 maintained

Both versions handle requests:

  • v1: Existing clients continue working
  • v2: New clients adopt new features
  • Server runs both implementations

Week 4: Monitor adoption

SELECT version, COUNT(*) as requests
FROM api_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY version;

-- v1: 234,567 requests (65%)
-- v2: 126,433 requests (35%)

Week 8: Begin deprecation warnings

Add headers to v1 responses:

HTTP/1.1 200 OK
Deprecation: true
Sunset: Mon, 15 Sep 2025 23:59:59 GMT
Link: </docs/v2-migration>; rel="deprecation-policy"

Week 12: Active migration outreach

Contact clients still on v1:

  • Email with migration guide
  • Breaking change documentation
  • Code examples for common patterns
  • Offer support for migration issues

Week 16: Check adoption progress

-- v1: 89,234 requests (25%)
-- v2: 267,766 requests (75%)

Still 25% on v1, cannot remove yet

Week 20: Gradual enforcement

Make v1 read-only:

  • GET requests: Continue working
  • POST/PUT/DELETE: Return 410 Gone with migration instructions

Week 24: Final adoption check

-- v1: 12,453 requests (3.5%)
-- v2: 344,547 requests (96.5%)

Identify remaining v1 clients:

SELECT client_id, COUNT(*) as requests
FROM api_logs
WHERE version = 'v1'
  AND timestamp > NOW() - INTERVAL '7 days'
GROUP BY client_id
ORDER BY requests DESC;

-- batch-job-1: 8,234 requests (automated, no owner)
-- mobile-app: 2,109 requests (old app version)
-- partner-api: 1,876 requests (quarterly release cycle)
-- unknown: 234 requests (API key lost)

Week 26-28: Final client migration

Contact remaining clients directly

Week 30: v1 shutdown

Return 410 Gone for all v1 requests:

HTTP/1.1 410 Gone

{
  "error": "API v1 has been retired",
  "shutdown_date": "2025-09-15",
  "migration_guide": "/docs/v1-to-v2",
  "support": "api-support@example.com"
}
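
One catch-all route can retire the whole version; a sketch (route pattern assumed, not from the original deployment):

@app.route('/v1/<path:subpath>',
           methods=['GET', 'POST', 'PUT', 'DELETE', 'PATCH'])
def v1_retired(subpath):
    # Same body for every retired v1 endpoint
    return {
        'error': 'API v1 has been retired',
        'shutdown_date': '2025-09-15',
        'migration_guide': '/docs/v1-to-v2',
        'support': 'api-support@example.com'
    }, 410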

Cost of parallel versions:

  • Duplicate code maintenance
  • Testing both implementations
  • Security patches for both
  • Support team handles both
  • Monitoring two versions

Estimated 1.8× development cost during overlap period

Error Response Structure

Structured errors provide actionable information

Basic error response:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Request validation failed"
  }
}

Detailed validation errors:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Request validation failed",
    "details": [
      {
        "field": "features[3]",
        "value": "NaN",
        "constraint": "type",
        "message": "Must be a number"
      },
      {
        "field": "threshold",
        "value": 1.5,
        "constraint": "maximum",
        "message": "Must be at most 1.0"
      }
    ]
  }
}

Rate limit error with retry information:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "API rate limit exceeded",
    "limit": 1000,
    "remaining": 0,
    "reset_at": "2025-01-15T15:00:00Z",
    "retry_after": 600
  }
}

Resource not found with suggestions:

{
  "error": {
    "code": "RESOURCE_NOT_FOUND",
    "message": "Model 'classifier-xyz' not found",
    "resource_type": "model",
    "resource_id": "classifier-xyz",
    "suggestions": [
      "Use GET /models to list available models",
      "Check model_id spelling"
    ]
  }
}

Error response components:

1. Machine-readable code

Enables programmatic handling:

if response.status_code == 400:
    error = response.json()['error']

    if error['code'] == 'VALIDATION_ERROR':
        # Fix validation issues
        for detail in error['details']:
            log.warning(f"Field {detail['field']}: {detail['message']}")

    elif error['code'] == 'RATE_LIMIT_EXCEEDED':
        # Wait and retry
        time.sleep(error['retry_after'])

2. Human-readable message

For developer debugging and logs

3. Context-specific details

Field-level errors for validation failures

4. Actionable information

Rate limits include reset time and retry delay

5. Request correlation

{
  "error": {...},
  "request_id": "7f3c6b2a-5d9e-4f8b",
  "timestamp": "2025-01-15T14:35:22Z"
}

Include in support tickets for log correlation

6. Documentation links

{
  "error": {...},
  "documentation": "https://api.example.com/docs/errors/validation"
}

Error code categories:

  • VALIDATION_ERROR: Client sent invalid data
  • AUTHENTICATION_ERROR: Token missing or invalid
  • AUTHORIZATION_ERROR: Valid token, insufficient permissions
  • RATE_LIMIT_EXCEEDED: Too many requests
  • RESOURCE_NOT_FOUND: Requested resource doesn’t exist
  • CONFLICT: Operation conflicts with current state
  • SERVER_ERROR: Internal server failure

Consistent error structure across all endpoints
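
Consistency is easiest to enforce in one place; a sketch using Flask error handlers (envelope matches the examples above, helper name hypothetical):

import uuid
from datetime import datetime, timezone

def error_response(code, message, status, details=None):
    # Uniform envelope shared by every endpoint
    body = {
        'error': {'code': code, 'message': message},
        'request_id': str(uuid.uuid4()),
        'timestamp': datetime.now(timezone.utc).isoformat()
    }
    if details:
        body['error']['details'] = details
    return body, status

@app.errorhandler(404)
def not_found(e):
    return error_response('RESOURCE_NOT_FOUND', 'Resource does not exist', 404)

@app.errorhandler(500)
def server_error(e):
    return error_response('SERVER_ERROR', 'Internal server failure', 500)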

Pagination: Offset-Based

Large collections require pagination

Collection with 2,500 users:

Without pagination: GET /users

  • Returns all 2,500 users
  • Response size: 3.8 MB
  • Load time: 6-8 seconds
  • Client memory: Entire collection

With pagination: GET /users?limit=50&offset=0

  • Returns 50 users
  • Response size: 76 KB (50× smaller)
  • Load time: 120ms (50× faster)
  • Client memory: Current page only

Offset-based pagination parameters:

limit: Number of items per page (page size)
offset: Number of items to skip (starting position)

Fetching pages:

Page 1 (users 1-50):

GET /users?limit=50&offset=0

Page 2 (users 51-100):

GET /users?limit=50&offset=50

Page 3 (users 101-150):

GET /users?limit=50&offset=100

Formula: offset = (page_number - 1) × limit

Pagination metadata in response:

{
  "users": [
    {"user_id": 1, "email": "alice@example.com", ...},
    {"user_id": 2, "email": "bob@example.com", ...},
    ...
  ],
  "pagination": {
    "limit": 50,
    "offset": 0,
    "total": 2500,
    "has_more": true,
    "next": "/users?limit=50&offset=50",
    "previous": null
  }
}

Response includes links to next/previous pages
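
A minimal Flask sketch producing this response shape (db.query stand-in as in earlier snippets, page size capped defensively):

@app.route('/users')
def list_users():
    limit = min(int(request.args.get('limit', 50)), 100)
    offset = max(int(request.args.get('offset', 0)), 0)

    total = db.query("SELECT COUNT(*) FROM users")
    users = db.query("SELECT * FROM users ORDER BY user_id LIMIT ? OFFSET ?",
                     limit, offset)

    next_offset = offset + limit
    return {
        'users': users,
        'pagination': {
            'limit': limit,
            'offset': offset,
            'total': total,
            'has_more': next_offset < total,
            'next': (f'/users?limit={limit}&offset={next_offset}'
                     if next_offset < total else None),
            'previous': (f'/users?limit={limit}&offset={max(offset - limit, 0)}'
                         if offset > 0 else None)
        }
    }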

Offset pagination with filters:

GET /users?status=active&limit=50&offset=0

Filter applied before pagination:

  1. Query users where status='active' (1,200 matching)
  2. Skip first 0 users
  3. Return next 50 users

{
  "users": [...50 active users...],
  "pagination": {
    "limit": 50,
    "offset": 0,
    "total": 1200,  // Total matching filter
    "next": "/users?status=active&limit=50&offset=50"
  }
}

Offset pagination advantages:

  • Simple to implement
  • Easy to understand
  • Can jump to arbitrary page
  • Total count available

Offset pagination limitations:

1. Performance degrades with large offsets

Database query: SELECT * FROM users LIMIT 50 OFFSET 10000

Must scan 10,000 rows before returning 50

  • Page 1 (offset=0): 15ms
  • Page 100 (offset=5000): 340ms
  • Page 1000 (offset=50000): 4200ms

2. Inconsistent results during modifications

Client requests page 1 (users 1-50)
User 25 gets deleted
Client requests page 2 (offset=50)

Receives users 51-100 (previously users 52-101)
User 51 never seen by client

3. Duplicate results with insertions

Client requests page 1 (users 1-50)
New user inserted at position 10
Client requests page 2 (offset=50)

Receives users 51-100 (previously users 50-99)
User 50 appears on both pages

Cursor-based pagination solves these issues

Pagination: Cursor-Based

Cursor encodes position in result set

Instead of numeric offset, use opaque cursor token

Initial request:

GET /users?limit=50

Response with cursor:

{
  "users": [
    {"user_id": 1, ...},
    {"user_id": 2, ...},
    ...
    {"user_id": 50, ...}
  ],
  "pagination": {
    "limit": 50,
    "next_cursor": "eyJ1c2VyX2lkIjo1MH0=",
    "has_more": true
  }
}

Next page request:

GET /users?limit=50&cursor=eyJ1c2VyX2lkIjo1MH0=

Cursor is base64-encoded JSON: {"user_id": 50}

Database query using cursor:

-- Without cursor (first page)
SELECT * FROM users ORDER BY user_id LIMIT 50;

-- With cursor (subsequent pages)
SELECT * FROM users
WHERE user_id > 50
ORDER BY user_id
LIMIT 50;

No OFFSET clause - uses indexed WHERE condition

Cursor for different sort orders:

Sort by created_at descending:

{
  "cursor": "eyJjcmVhdGVkX2F0IjoiMjAyNS0wMS0xNVQxMDozMDowMFoiLCJ1c2VyX2lkIjo1MH0="
}

Decoded: {"created_at": "2025-01-15T10:30:00Z", "user_id": 50}

Include user_id for tie-breaking when timestamps equal
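
Encoding and decoding such cursors takes a few lines; a sketch over the JSON payload shown above:

import base64
import json

def encode_cursor(position):
    # {"user_id": 50} -> "eyJ1c2VyX2lkIjogNTB9"
    return base64.urlsafe_b64encode(json.dumps(position).encode()).decode()

def decode_cursor(cursor):
    # Restore any '=' padding stripped during transport
    padded = cursor + '=' * (-len(cursor) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

cursor = encode_cursor({'user_id': 50})
decode_cursor(cursor)  # {'user_id': 50}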

Cursor pagination advantages:

1. Consistent performance

Direct index lookup, no scanning:

  • Page 1: 15ms
  • Page 100: 15ms
  • Page 1000: 15ms (constant time)

2. Stable results during modifications

Client requests page 1 with cursor
User 25 gets deleted
Client requests page 2 using cursor

Cursor points to user_id > 50, deletion of user 25 doesn’t affect next page

3. No duplicate results from insertions

Cursor maintains position relative to sorted order, new insertions don’t cause duplicates

Cursor pagination limitations:

Cannot jump to arbitrary page

No “go to page 50” - must traverse sequentially

Cannot display total page count

Computing total requires full count query (expensive)

Cursor must be opaque to client

// Bad: Exposing internal structure
GET /users?after_id=50

// Good: Opaque cursor
GET /users?cursor=eyJ1c2VyX2lkIjo1MH0=

Allows server to change cursor format without breaking clients

When to use each approach:

Offset pagination:

  • Need page numbers (UI with page selector)
  • Need total count
  • Data rarely changes
  • Small to medium collections

Cursor pagination:

  • Large collections (millions of rows)
  • Data frequently updated
  • Mobile apps (efficient, consistent)
  • Infinite scroll UX

Many APIs support both: limit/offset for random access, limit/cursor for efficient traversal

Authentication

Authentication Breach: LinkedIn 2012

June 2012: 6.5 million LinkedIn password hashes stolen

What LinkedIn did:

-- Stored password hashes (not plaintext) ✓
SELECT user_id, email, password_hash FROM users;

-- But used SHA-1 without salt ✗
password_hash = SHA1(password)

What attackers did:

  1. Built password dictionary (10 million common passwords)
  2. Computed hashes once:
common_hashes = {
    SHA1("password123"): "password123",
    SHA1("123456"): "123456",
    # ... 10 million entries
}
  3. Searched stolen database:
for hash in stolen_hashes:
    if hash in common_hashes:
        compromised.append(common_hashes[hash])

Result: 90% of passwords cracked within 72 hours

Why it failed:

  • SHA-1 designed for speed: GPU computes 10 billion hashes/second
  • No salt: Same password → same hash
  • One computation → thousands of accounts compromised

Same password “123456” compromised 753,000 accounts simultaneously

LinkedIn’s failure shows why hashing alone isn’t enough.

Identity in Distributed Systems

LinkedIn’s breach shows authentication failures cascade across systems

ML API requires authentication to prevent unauthorized access:

DELETE /models/production/v2
# Who sent this? Can they delete production models?

Every API request needs to answer two questions:

  1. Who is making this request? (Authentication)
  2. Can they perform this action? (Authorization)

In a single process, identity is implicit:

def delete_file(filepath):
    # Running as OS user 'alice'
    # OS checks if alice can delete filepath
    os.remove(filepath)

In distributed systems, identity must be explicit:

def handle_delete_request(request):
    # Who sent this HTTP request?
    user = authenticate(request)  # Extract identity
    
    # Can they delete this file?
    if not authorize(user, 'delete', filepath):
        return 403
    
    delete_file(filepath)

HTTP is stateless - no memory between requests:

  • No persistent connection to maintain identity
  • Each request independent
  • Must prove identity every time

Three approaches to maintaining identity across requests:

  1. Include credentials every request (HTTP Basic Auth)
  2. Create server-side session (Cookie-based)
  3. Issue cryptographic proof (Token-based)

Each approach makes different trade-offs between security, scalability, and complexity.

Identity must be explicitly established in every HTTP request.

Password Authentication: Converting Secrets to Identity

Authentication transforms a secret into verified identity

Step 1: User provides credentials

POST /login
{"email": "alice@example.com", "password": "secret123"}

Step 2: Server verifies against stored credentials

def authenticate(email, password):
    user = db.query("SELECT * FROM users WHERE email = ?", email)
    if verify_password(password, user.password_hash):
        return user.id  # Identity established
    return None

Step 3: Server issues proof of authentication

# Option A: Server-side session
session_id = generate_random_id()
sessions[session_id] = user_id
return {"session_id": session_id}

# Option B: Cryptographic token
token = jwt.encode({"user_id": user_id, "exp": time() + 3600})
return {"token": token}

Password storage determines breach impact:

Never store plaintext passwords:

-- CATASTROPHIC: Database breach exposes all passwords
SELECT * FROM users WHERE password = 'secret123'

Store cryptographic hashes instead:

-- Safe: Cannot reverse hash to get password
SELECT * FROM users WHERE email = 'alice@example.com'
-- Then verify: hash(provided_password) == stored_hash

Authentication converts credentials to identity proof.

Hash Functions: Time as a Defense Mechanism

LinkedIn used SHA-1 hashing - why wasn’t that enough?

First, understand why plaintext is catastrophic:

Database breach with plaintext passwords:

SELECT email, password FROM users LIMIT 3;
-- alice@example.com | secret123
-- bob@example.com   | secret123  
-- carol@example.com | password1

All accounts immediately compromised.

Hash functions provide one-way transformation:

hash("secret123") → "5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5"

Cannot reverse: hash → original password (computationally infeasible)

Asymmetry favors attackers:

Legitimate use: Verify one password for one user

# Single hash computation: < 1ms
if hash(provided_password) == stored_hash:
    authenticate_user()

Attack: Try millions of passwords against all users

# Attacker with stolen hash database
common_passwords = ["password", "123456", "secret123", ...]
for password in common_passwords:  # 10 million
    test_hash = hash(password)
    for stored_hash in database:  # 100,000 users
        if test_hash == stored_hash:
            compromised.append(...)

Solution: Make hashing deliberately slow

  • SHA-1 (LinkedIn’s mistake): Designed for speed → 10 billion/second on GPU
  • bcrypt: Designed for passwords → 10/second on GPU
  • Time difference: 1 billion× slower

This is why LinkedIn’s passwords fell in 72 hours - SHA-1 allowed rapid dictionary attacks.

Slow hashing flips the asymmetry to favor defenders: a legitimate login pays for one hash, while a dictionary attack pays billions of times over.

Time cost makes brute force attacks impractical.
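
A sketch of the bcrypt API (the Python bcrypt package; the work factor is covered below):

import bcrypt

# Hashing: a random salt is generated and embedded in the output
hashed = bcrypt.hashpw(b"secret123", bcrypt.gensalt(rounds=12))

# Verification: checkpw recovers salt and work factor from the hash
bcrypt.checkpw(b"secret123", hashed)   # True
bcrypt.checkpw(b"wrong", hashed)       # False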

Salt: Preventing Parallel Attacks

LinkedIn’s second mistake: No salt

Even with slow hashing, common passwords create identical hashes:

Without salt, all users with “password123” have same hash:

hash("password123") → "ef92b778bafe771e89245b89ecb..."
# Database search finds 1,847 users with this hash
# All compromised with single hash computation

Salt: Random value unique to each user

def create_user(email, password):
    salt = generate_random_bytes(16)  # Unique per user
    password_hash = hash(salt + password)
    db.store(email, salt, password_hash)

Now identical passwords produce different hashes:

# User 1
salt1 = "a1b2c3d4..."
hash("a1b2c3d4..." + "password123") → "7f3c6b2a..."

# User 2  
salt2 = "e5f6g7h8..."
hash("e5f6g7h8..." + "password123") → "92a8b7c6..."

# Different hashes despite same password

Impact on attack strategy:

Without salt: One computation compromises all instances

target_hash = "ef92b778..."
if computed_hash == target_hash:
    # Found password for ALL users with this hash

With salt: Must attack each user individually

for user in users:
    for password in dictionary:
        if hash(user.salt + password) == user.hash:
            # Found password for ONE user only

Salt is not secret - it is stored alongside the hash. It prevents mass precomputation attacks, not targeted attacks on a single user.

With salt, LinkedIn’s 753,000 “123456” users would each need individual attacks

Salt forces individual attacks per user.

Work Factors: Adaptive Security

Combining defenses: Slow hashing + Salt + Adaptive work factor

bcrypt’s configurable work factor scales with hardware improvements:

# Work factor determines iteration count: 2^factor
bcrypt.gensalt(10)  # 2^10 = 1,024 iterations (2010)
bcrypt.gensalt(12)  # 2^12 = 4,096 iterations (2020)  
bcrypt.gensalt(14)  # 2^14 = 16,384 iterations (2030)

Each increment doubles computation time:

Factor   Iterations   Time/Hash   Passwords/Day
10       1,024        50ms        1.7M
11       2,048        100ms       864K
12       4,096        200ms       432K
13       8,192        400ms       216K
14       16,384       800ms       108K

Balancing security and usability:

import time

import bcrypt

def choose_work_factor():
    # Target: 250ms computation time
    test_password = b"benchmark"

    for factor in range(10, 15):
        start = time.time()
        bcrypt.hashpw(test_password, bcrypt.gensalt(factor))
        duration = time.time() - start

        if duration > 0.250:  # 250ms target
            return factor

    return 14  # Maximum reasonable factor

Moore’s Law compensation:

  • Computing power doubles every 2 years
  • Increase work factor by 1 every 2 years
  • Maintains constant security margin

Security parameter improves over time without code changes

Work factor increases maintain security despite hardware improvements.

Sessions vs Tokens: State Management Trade-offs

Server sessions: Centralized state

# Login creates session in shared store
session_id = generate_uuid()
redis.set(f"session:{session_id}", json.dumps({
    "user_id": 123,
    "created": timestamp,
    "permissions": ["read", "write"]
}))
response.set_cookie("session_id", session_id)

# Every request requires lookup
def handle_request(request):
    session_id = request.cookies.get("session_id")
    session = redis.get(f"session:{session_id}")  # Network call
    if not session:
        return 401

Tokens: Distributed state

# Login creates self-contained token
payload = {
    "user_id": 123,
    "exp": timestamp + 3600,
    "permissions": ["read", "write"]
}
token = jwt.encode(payload, SECRET_KEY)
return {"token": token}

# Every request validates locally
def handle_request(request):
    token = request.headers["Authorization"].split(" ")[1]
    payload = jwt.decode(token, SECRET_KEY)  # CPU only
    # No network call required

Trade-offs in practice:

Aspect              Sessions                Tokens
Revocation          Immediate               At expiration
Scaling             Requires shared store   Linear
Network calls       Every request           None
State size          Server: O(users)        Server: O(1)
Client complexity   Simple cookie           Header management

Sessions require coordination; tokens are independent.

Authorization: From Identity to Permissions

Authentication establishes identity; authorization determines capabilities

def process_request(request):
    # Step 1: Who are you? (Authentication)
    user_id = validate_token(request.headers['Authorization'])
    if not user_id:
        return 401  # Unauthorized - don't know who you are
    
    # Step 2: What can you do? (Authorization)
    resource = request.path  # e.g., /models/123
    action = request.method   # e.g., DELETE
    
    if not has_permission(user_id, resource, action):
        return 403  # Forbidden - know who you are, can't do this
    
    # Step 3: Execute
    return perform_action(resource, action)

Three authorization models:

1. Role-Based (RBAC): Users have roles, roles have permissions

user.roles = ["developer", "viewer"]
role_permissions = {
    "developer": ["read", "write", "deploy"],
    "viewer": ["read"],
    "admin": ["read", "write", "deploy", "delete"]
}
# Can user deploy? Check if any role has permission

2. Attribute-Based (ABAC): Decisions based on attributes

can_access = (
    user.department == resource.department and
    user.clearance_level >= resource.sensitivity and
    current_time in user.work_hours
)

3. Resource-Based: Users own resources

if resource.owner_id == user_id:
    return FULL_ACCESS
elif user_id in resource.shared_with:
    return READ_ONLY
else:
    return NO_ACCESS

Authorization determines what authenticated users can do.

Token Expiration and Revocation Trade-offs

Tokens can’t be recalled after issuing:

Once issued, JWT remains valid until expiration:

token_payload = {
    "user_id": 123,
    "exp": time() + 3600,  # Valid for 1 hour
    "scopes": ["read:data", "write:data", "delete:data"]
}

Employee terminated at 2:00 PM:

  • Token issued: 1:30 PM
  • Token expires: 2:30 PM
  • Problem: 30 minutes of unauthorized access

Three approaches to bounded revocation:

1. Short-lived access tokens (15 minutes)

access_token = create_token(expires_in=15*60)
refresh_token = create_token(expires_in=30*24*60*60)

# After termination, refresh fails
def refresh():
    if user_terminated(refresh_token.user_id):
        return 401  # No new access token
    return create_access_token()

2. Blacklist critical tokens

# Maintain revoked token list (small subset)
# Redis SET "revoked_tokens" holds JWT IDs of terminated users
# (revoke with: redis.sadd("revoked_tokens", token.jti))

def validate_token(token):
    if redis.sismember("revoked_tokens", token.jti):  # Quick check
        return None
    return decode_token(token)

3. Version-based invalidation

# User has token_version in database
user.token_version = 2  # Increment on revocation

# Token includes version
token.version = 1

# Validation checks version
if token.version < user.token_version:
    return 401  # Token outdated

Trade-off: Security (short expiry) vs Performance (fewer refreshes)

Shorter tokens increase security but require more refreshes.

Stateless Scaling: The Operational Advantage

Session-based scaling requires coordination

Adding servers with sessions:

# Server 1 has session for User A
# Server 2 has session for User B
# Load balancer must remember routing (sticky sessions)
# OR share sessions via Redis (single point of failure)

Measured impact with 1000 requests/second:

  • Sticky sessions: Imbalanced load (Server 1: 89%, Server 2: 11%)
  • Redis sessions: 2ms added latency per request
  • Redis failure: All users logged out

Token-based scaling is trivial

Adding servers with tokens:

# Any server can validate any token
# No coordination required
# No shared state

Measured impact with 1000 requests/second:

  • Round-robin load balancing: Even distribution (25% each for 4 servers)
  • Token validation: 0.1ms CPU time
  • Server failure: Requests rerouted, no user impact

Deployment advantages:

Operation         Sessions                   Tokens
Add server        Update session store       Add server
Remove server     Migrate sessions           Remove server
Deploy update     Coordinate session drain   Rolling update
Region failover   Replicate sessions         No change

Cost at scale (10K concurrent users):

  • Sessions: Redis cluster ($200/month) + Complexity
  • Tokens: No infrastructure + 0.1% CPU overhead

Stateless architecture enables linear scaling without coordination overhead.

Tokens enable horizontal scaling without coordination.

JWT and OAuth 2.0

JWT: Self-Contained Identity Tokens

JSON Web Tokens encode identity without server state

JWT structure: Three Base64-encoded parts separated by dots

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.
eyJ1c2VyX2lkIjoxMjMsImVtYWlsIjoiYWxpY2VAZXhhbXBsZS5jb20iLCJleHAiOjE3MDUzMjQ4MDB9.
SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

Part 1: Header (Algorithm and type)

{
  "alg": "HS256",  // HMAC SHA-256
  "typ": "JWT"     // Token type
}

Part 2: Payload (Claims about user)

{
  "user_id": 123,
  "email": "alice@example.com",
  "exp": 1705324800,  // Expires: Unix timestamp
  "iat": 1705321200,  // Issued at
  "scopes": ["read", "write"]
}

Part 3: Signature (Prevents tampering)

HMACSHA256(
  base64(header) + "." + base64(payload),
  server_secret_key
)

Critical properties:

  • Self-contained: All data in token, no database lookup
  • Tamper-proof: Invalid signature = rejected token
  • Stateless: Server only needs secret key
  • Not encrypted: Payload readable by anyone (Base64 decode)

JWT structure enables stateless authentication.

JWT Validation: Cryptographic Trust

Signature prevents token forgery

Server creates token with secret:

import time

import jwt

secret_key = "server-secret-abc123"  # Only server knows

payload = {
    "user_id": 123,
    "email": "alice@example.com",
    "exp": time.time() + 3600
}

token = jwt.encode(payload, secret_key, algorithm="HS256")
# Result: eyJhbGciOiJIUzI1NiIs...

Client cannot modify token:

# Attacker tries to change user_id
import base64
import json

header_b64, payload_b64, signature = token.split('.')
padded = payload_b64 + '=' * (-len(payload_b64) % 4)
payload = json.loads(base64.urlsafe_b64decode(padded))
payload['user_id'] = 999  # Change to admin
fake_payload = base64.urlsafe_b64encode(json.dumps(payload).encode()).rstrip(b'=')

# But cannot generate valid signature without secret
fake_token = header_b64 + '.' + fake_payload.decode() + '.' + 'random_signature'
# Server will reject: Invalid signature

Server validates with same secret:

def validate_token(token):
    try:
        payload = jwt.decode(token, secret_key, algorithms=["HS256"])
        # Signature valid, token not expired
        return payload
    except jwt.InvalidSignatureError:
        return None  # Tampered token
    except jwt.ExpiredSignatureError:
        return None  # Token too old

Symmetric (HS256) vs Asymmetric (RS256):

  • HS256: Same secret for signing and verifying (simple, fast)
  • RS256: Private key signs, public key verifies (allows third-party validation)

Only server with secret can create valid tokens.

JWT Claims: Standard Fields and Custom Data

Standard claims provide common functionality

Registered claims (predefined meanings):

{
  "iss": "https://auth.company.com",  // Issuer
  "sub": "user:123",                   // Subject (user)
  "aud": "https://api.company.com",    // Audience (recipient)
  "exp": 1705324800,                   // Expiration time
  "nbf": 1705321200,                   // Not before
  "iat": 1705321200,                   // Issued at
  "jti": "a1b2c3d4"                    // JWT ID (unique)
}

Time-based validation:

current_time = 1705323000  # Unix timestamp

# Token not yet valid (nbf = not before)
if current_time < token['nbf']:
    return "Token not yet valid"

# Token expired
if current_time > token['exp']:
    return "Token expired"

# Valid time window: nbf <= current_time <= exp

Custom claims for application data:

{
  // Standard claims
  "exp": 1705324800,
  "iat": 1705321200,
  
  // Custom application claims
  "user_id": 123,
  "email": "alice@example.com",
  "roles": ["developer", "reviewer"],
  "department": "engineering",
  "permissions": {
    "models": ["read", "write"],
    "data": ["read"]
  }
}

Token size considerations:

  • Each claim adds bytes to every request
  • HTTP header limit: ~8KB
  • Typical JWT: 200-500 bytes
  • With permissions: 500-2000 bytes

Claims carry both metadata and application data.
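
Libraries check the registered time and audience claims automatically; a sketch with PyJWT (token and secret_key as in the earlier examples):

import jwt

payload = jwt.decode(
    token,
    secret_key,
    algorithms=["HS256"],
    audience="https://api.company.com",  # Must match 'aud' claim
    issuer="https://auth.company.com"    # Must match 'iss' claim
)
# Raises ExpiredSignatureError, ImmatureSignatureError,
# InvalidAudienceError, or InvalidIssuerError on failure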

Refresh Tokens: Balancing Security and Usability

Short access tokens + long refresh tokens minimize risk

Dual token pattern:

def login(email, password):
    if authenticate(email, password):
        # Short-lived for API calls
        access_token = create_jwt(
            user_id=123,
            expires_in=15*60  # 15 minutes
        )
        
        # Long-lived for obtaining new access tokens
        refresh_token = create_jwt(
            user_id=123,
            token_type="refresh",
            expires_in=30*24*60*60  # 30 days
        )
        
        # Store refresh token for revocation
        db.store_refresh_token(refresh_token)
        
        return {
            "access_token": access_token,
            "refresh_token": refresh_token,
            "expires_in": 900
        }

Token refresh flow:

def refresh_access_token(refresh_token):
    # Validate refresh token
    payload = jwt.decode(refresh_token, secret_key)
    
    # Check if revoked (requires DB check)
    if is_revoked(refresh_token):
        return 401  # Revoked
    
    # Issue new access token
    new_access = create_jwt(
        user_id=payload['user_id'],
        expires_in=15*60
    )
    
    return {"access_token": new_access}

Security boundaries:

  • Compromise window: Maximum 15 minutes (access token lifetime)
  • Refresh check: Database lookup only on refresh (every 15 min)
  • Immediate revocation: Possible via refresh token blacklist
  • Performance: 1 DB check per 15 minutes vs every request

Refresh tokens enable short access tokens without constant re-authentication.

OAuth 2.0: Delegated Authorization

OAuth allows third-party access without sharing passwords

OAuth solves password sharing with third parties:

# WITHOUT OAuth (dangerous):
# GitHub analyzer needs your Google Drive files
analyzer.login(
    google_email="alice@gmail.com",
    google_password="secret123"  # Giving password to third party!
)

OAuth authorization flow:

Step 1: User authorizes at provider

Browser → https://accounts.google.com/oauth/authorize?
    client_id=github-analyzer&
    redirect_uri=https://analyzer.com/callback&
    scope=drive.readonly&
    response_type=code

Step 2: Provider redirects with authorization code

Browser ← https://analyzer.com/callback?code=abc123

Step 3: Exchange code for token (backend)

# Server-to-server, not visible to browser
response = requests.post('https://oauth2.googleapis.com/token', {
    'code': 'abc123',
    'client_id': 'github-analyzer',
    'client_secret': 'secret-key-xyz',  # Proves identity
    'grant_type': 'authorization_code'
})

tokens = response.json()
# {
#   "access_token": "ya29.a0ARrdaM...",
#   "token_type": "Bearer",
#   "expires_in": 3600,
#   "scope": "drive.readonly"
# }

Key principles:

  • User never gives password to third party
  • Provider controls exactly what access is granted
  • Access can be revoked without changing password

OAuth enables access without sharing credentials.

OAuth Scopes: Granular Permissions

Scopes limit what applications can access

Requesting specific permissions:

# Application requests only what it needs
auth_url = "https://github.com/login/oauth/authorize?" + urlencode({
    "client_id": "ml-trainer-app",
    "scope": "repo:read user:email",  # Specific permissions
    "redirect_uri": "https://mlapp.com/callback"
})

User sees requested permissions:

ML Trainer App wants to access your GitHub account:

✓ Read access to repositories
  - View code, issues, pull requests
  - View repository metadata
  
✓ Read user email addresses
  - View primary email
  - View verified status

✗ Will NOT be able to:
  - Write to repositories
  - Delete anything
  - Access billing information
  
[Authorize] [Deny]

Token contains granted scopes:

{
  "access_token": "gho_16C7e42F292c6912E7710c838347Ae178B4a",
  "token_type": "bearer",
  "scope": "repo:read user:email",  // What was actually granted
  "expires_in": 28800
}

Common scope patterns:

Provider   Scope            Permission
GitHub     repo             Full repository access
GitHub     repo:status      Only commit status
Google     drive.readonly   Read files only
Google     drive.file       Only files created by app
Slack      chat:write       Post messages
Slack      users:read       View user information

Principle of least privilege: Request minimum necessary scope

Scopes provide granular access control.
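
On the API side, scope checks are typically a decorator around handlers; a sketch (parse_bearer_token and fetch_repo are hypothetical helpers):

from functools import wraps

from flask import request

def require_scope(scope):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            token = parse_bearer_token(request)   # Hypothetical helper
            granted = token['scope'].split()      # e.g. "repo:read user:email"
            if scope not in granted:
                return {'error': {
                    'code': 'AUTHORIZATION_ERROR',
                    'message': f'Scope {scope} required'
                }}, 403
            return f(*args, **kwargs)
        return wrapper
    return decorator

@app.route('/repos/<repo_id>')
@require_scope('repo:read')
def get_repo(repo_id):
    return fetch_repo(repo_id)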

OAuth Grant Types: Different Flows for Different Needs

OAuth defines multiple flows for different scenarios

1. Authorization Code (web apps with backend)

# Most secure: Code exchanged server-to-server
# Frontend never sees client_secret
flow = "user → provider → code → backend → token"

2. Client Credentials (service-to-service)

# No user involved, service authenticates directly
response = requests.post('https://oauth2.provider.com/token', {
    'grant_type': 'client_credentials',
    'client_id': 'batch-processor',
    'client_secret': 'secret-xyz',
    'scope': 'data.process'
})
# Used for: Cron jobs, backend services, APIs calling APIs

3. Implicit Flow (deprecated, was for SPAs)

// Token returned directly in URL fragment
// Insecure: Token visible in browser history
// Replaced by: Authorization Code + PKCE

4. Password Grant (deprecated, legacy systems)

# User gives password to application directly
# Defeats purpose of OAuth
# Only use: Migrating legacy systems

Modern standard: Authorization Code + PKCE

# PKCE (Proof Key for Code Exchange) adds security
code_verifier = generate_random_string(128)
code_challenge = base64url(sha256(code_verifier))  # S256 method

# Include challenge in authorization request
# Include verifier in token exchange
# Prevents code interception attacks

Grant type selection:

  • User-facing web app → Authorization Code
  • Backend service → Client Credentials
  • Mobile/SPA → Authorization Code + PKCE
  • Never → Password or Implicit

Different flows optimize for security and usability.
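
Generating the PKCE pair needs only the standard library; a sketch of RFC 7636's S256 method:

import base64
import hashlib
import secrets

# Random verifier kept by the client
code_verifier = secrets.token_urlsafe(64)

# challenge = BASE64URL(SHA-256(verifier)), '=' padding stripped
digest = hashlib.sha256(code_verifier.encode('ascii')).digest()
code_challenge = base64.urlsafe_b64encode(digest).rstrip(b'=').decode('ascii')

# code_challenge goes in the authorization request;
# code_verifier goes in the later token exchange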

Security Considerations: Token Storage and Transmission

Where and how to store tokens determines security

Browser storage options:

// localStorage - Persistent but vulnerable to XSS
localStorage.setItem('token', jwt_token);
// ⚠️ Any JavaScript can read: <script>alert(localStorage.token)</script>

// sessionStorage - Per-tab, still XSS vulnerable  
sessionStorage.setItem('token', jwt_token);

// httpOnly cookie - Not accessible to JavaScript
// ✓ XSS protected, ✗ CSRF vulnerable
Set-Cookie: token=jwt_token; HttpOnly; Secure; SameSite=Strict

// Memory only - Most secure but lost on refresh
const token = jwt_token;  // JavaScript variable

Mobile app storage:

# iOS Keychain / Android Keystore (encrypted)
keychain.set('access_token', token, accessible=WHEN_UNLOCKED)

# SharedPreferences / UserDefaults (not encrypted)
# ⚠️ Accessible if device rooted/jailbroken

Token transmission:

# Always use Authorization header
headers = {'Authorization': f'Bearer {token}'}

# Never in URL parameters (logged, cached, shared)
# ✗ GET /api/data?token=jwt_token  # Appears in logs!

# Never in request body for GET (non-standard)
# ✗ GET /api/data {"token": "jwt_token"}

Security checklist:

  • ✓ HTTPS only (never HTTP)
  • ✓ Short expiration times
  • ✓ Refresh token rotation
  • ✓ Validate on every request
  • ✓ Log anomalies (geographic changes)
  • ✗ Don’t log token values
  • ✗ Don’t store in git

Choose storage based on threat model.

Authorization Patterns

Authorization Models: From Simple to Sophisticated

Evolution of authorization complexity

Level 1: Binary access (all or nothing)

if authenticated:
    return FULL_ACCESS
else:
    return NO_ACCESS
# Problem: Every authenticated user can do everything

Level 2: Resource ownership

def can_access(user_id, resource):
    if resource.owner_id == user_id:
        return True
    return False
# Problem: No sharing, no admin access

Level 3: Role-based (RBAC)

user_roles = ["developer"]
role_permissions = {
    "viewer": ["read"],
    "developer": ["read", "write"],
    "admin": ["read", "write", "delete"]
}
# Problem: Roles are coarse-grained

Level 4: Attribute-based (ABAC)

def can_access(user, resource, action, context):
    return (
        user.department == resource.department and
        action in user.permissions and
        resource.sensitivity <= user.clearance_level and
        context.time in user.work_hours and
        context.location in user.allowed_locations
    )
# Fine-grained but complex

Real systems use hybrid approaches:

  • Ownership for user-created resources
  • Roles for system-wide permissions
  • Attributes for special cases

More complex models provide finer control.

Resource Ownership: The Foundation Pattern

Users control resources they create

Database schema enforces ownership:

CREATE TABLE models (
    id INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL,
    name VARCHAR(255),
    created_at TIMESTAMP,
    is_public BOOLEAN DEFAULT FALSE,
    FOREIGN KEY (owner_id) REFERENCES users(id)
);

CREATE TABLE model_shares (
    model_id INTEGER,
    user_id INTEGER,
    permission VARCHAR(20), -- 'read', 'write'
    PRIMARY KEY (model_id, user_id)
);

Authorization logic:

def get_permission(user_id, model_id):
    model = db.query("SELECT * FROM models WHERE id = ?", model_id)
    
    # Owner has full control
    if model.owner_id == user_id:
        return ["read", "write", "delete", "share"]
    
    # Check explicit shares
    share = db.query("""
        SELECT permission FROM model_shares 
        WHERE model_id = ? AND user_id = ?
    """, model_id, user_id)
    
    if share:
        return share.permission.split(",")
    
    # Public resources allow read
    if model.is_public:
        return ["read"]
    
    return []  # No access

Common patterns:

  • Private by default: New resources only accessible to owner
  • Explicit sharing: Owner grants specific permissions to specific users
  • Public option: Owner can make resource world-readable
  • Transfer ownership: Special operation with audit trail

Ownership provides natural access boundaries.

Role-Based Access Control (RBAC)

Users have roles, roles have permissions

Three-level hierarchy:

# 1. Users are assigned roles
user_roles = {
    123: ["developer", "reviewer"],
    456: ["viewer"],
    789: ["admin", "developer"]
}

# 2. Roles define permissions
role_permissions = {
    "viewer": {
        "models": ["read"],
        "data": ["read"]
    },
    "developer": {
        "models": ["read", "write", "execute"],
        "data": ["read", "write"],
        "compute": ["submit"]
    },
    "reviewer": {
        "models": ["read", "approve"],
        "audit": ["read"]
    },
    "admin": {
        "models": ["read", "write", "delete"],
        "data": ["read", "write", "delete"],
        "compute": ["submit", "cancel"],
        "users": ["read", "write"]
    }
}

# 3. Check if any role grants permission
def has_permission(user_id, resource_type, action):
    user_role_list = user_roles.get(user_id, [])
    
    for role in user_role_list:
        permissions = role_permissions.get(role, {})
        allowed_actions = permissions.get(resource_type, [])
        if action in allowed_actions:
            return True
    
    return False

RBAC advantages:

  • Simple to understand and audit
  • Easy to onboard users (assign role)
  • Consistent permissions across resources
  • Well-supported by frameworks

RBAC limitations:

  • Roles proliferate over time
  • Exceptions require new roles
  • No context awareness

RBAC separates users from permissions via roles.

Attribute-Based Access Control (ABAC)

Access decisions based on multiple attributes

Attributes from multiple sources:

# User attributes
user = {
    "id": 123,
    "department": "ml_research",
    "clearance_level": 3,
    "location": "us-west",
    "employee_type": "full_time",
    "projects": ["alpha", "beta"]
}

# Resource attributes
resource = {
    "id": 456,
    "type": "model",
    "classification": "confidential",
    "department": "ml_research",
    "project": "alpha",
    "created_date": "2024-01-15",
    "tags": ["experimental", "gpu_required"]
}

# Environment attributes
context = {
    "time": "14:30",
    "day": "weekday",
    "ip_address": "10.0.1.5",
    "network": "corporate",
    "request_type": "api"
}

# Action being requested
action = "write"

Policy evaluation:

def evaluate_access(user, resource, action, context):
    # Rule 1: Department match required
    if user["department"] != resource["department"]:
        return False
    
    # Rule 2: Clearance level check
    clearance_required = {
        "public": 1,
        "internal": 2,
        "confidential": 3,
        "secret": 4
    }
    if user["clearance_level"] < clearance_required[resource["classification"]]:
        return False
    
    # Rule 3: Time-based access
    if resource["classification"] == "secret":
        hour = int(context["time"].split(":")[0])
        if not (9 <= hour <= 17) or context["day"] == "weekend":
            return False
    
    # Rule 4: Project membership for write
    if action == "write":
        if resource["project"] not in user["projects"]:
            return False
    
    return True

ABAC evaluates multiple attributes for decisions.

Hierarchical Resources and Inheritance

Permissions flow down resource hierarchies

Resource hierarchy:

Organization
├── Projects
│   ├── Models
│   │   ├── Versions
│   │   └── Deployments
│   └── Datasets
│       ├── Training
│       └── Validation
└── Teams
    └── Members

Permission inheritance:

class ResourceHierarchy:
    def __init__(self):
        self.permissions = {}  # resource_id -> {user_id: permissions}
        self.parents = {}       # resource_id -> parent_id
    
    def get_effective_permissions(self, user_id, resource_id):
        """Get permissions including inherited from parents"""
        permissions = set()
        
        # Walk up the hierarchy
        current = resource_id
        while current:
            # Get direct permissions at this level
            if current in self.permissions:
                user_perms = self.permissions[current].get(user_id, [])
                permissions.update(user_perms)
            
            # Move to parent
            current = self.parents.get(current)
        
        return list(permissions)
    
    def check_permission(self, user_id, resource_id, action):
        perms = self.get_effective_permissions(user_id, resource_id)
        
        # Check for explicit deny (overrides allow)
        if f"deny:{action}" in perms:
            return False
        
        # Check for allow
        return action in perms or "*" in perms

Example scenario:

# User has 'read' on project
# Automatically has 'read' on all models in project
# Unless explicitly denied on specific model

# Project level: user_123 -> ["read", "write"]
# Model level: user_123 -> ["deny:write"]  # Override
# Result: Can read but not write this specific model

Permissions cascade down hierarchy unless overridden.

Delegation and Impersonation

Allowing users to act on behalf of others

Delegation: Temporary permission transfer

class DelegationManager:
    def create_delegation(self, from_user, to_user, resource, 
                         permissions, expires_at):
        """User explicitly grants subset of their permissions"""
        
        # Verify delegator has permissions to delegate
        delegator_perms = get_permissions(from_user, resource)
        if not all(p in delegator_perms for p in permissions):
            raise Error("Cannot delegate permissions you don't have")
        
        delegation = {
            "id": generate_id(),
            "from_user": from_user,
            "to_user": to_user,
            "resource": resource,
            "permissions": permissions,
            "expires_at": expires_at,
            "created_at": now()
        }
        
        db.store_delegation(delegation)
        audit_log("DELEGATION_CREATED", delegation)
        
    def check_permission(self, user_id, resource, action):
        # Check direct permissions
        if has_direct_permission(user_id, resource, action):
            return True
        
        # Check delegated permissions
        delegations = db.query("""
            SELECT * FROM delegations
            WHERE to_user = ? AND resource = ?
            AND expires_at > NOW()
        """, user_id, resource)
        
        for delegation in delegations:
            if action in delegation.permissions:
                audit_log("DELEGATED_ACCESS", {
                    "user": user_id,
                    "delegator": delegation.from_user,
                    "action": action
                })
                return True
        
        return False

Service impersonation:

# Service account acts as user for background tasks
def run_as_user(user_id, task):
    """Service executes task with user's permissions"""
    
    # Verify service account is authorized
    if not is_service_account(current_identity):
        raise Error("Only service accounts can impersonate")
    
    # Create impersonation context
    with impersonate(user_id) as context:
        # All authorization checks use user_id's permissions
        # But audit logs show both identities
        result = execute_task(task, context)
    
    return result
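
The impersonate context manager is assumed above; a minimal sketch using contextvars might look like this (audit_log and current_identity as in the surrounding code):

import contextvars
from contextlib import contextmanager

# Effective identity consulted by authorization checks
_effective_user = contextvars.ContextVar("effective_user", default=None)

@contextmanager
def impersonate(user_id):
    """Temporarily switch the effective identity, auditing both parties."""
    token = _effective_user.set(user_id)
    audit_log("IMPERSONATION_START", {
        "service": current_identity,  # service identity, as in the slide
        "acting_as": user_id
    })
    try:
        yield user_id  # the context passed to execute_task
    finally:
        _effective_user.reset(token)
        audit_log("IMPERSONATION_END", {"acting_as": user_id})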

Delegation and impersonation enable flexible access.

Policy as Code: Declarative Authorization

Express authorization rules as policies, not procedures

Traditional procedural approach:

def can_access(user, resource, action):
    if user.role == "admin":
        return True
    if resource.owner == user.id:
        return True
    if user.department == resource.department:
        if action == "read":
            return True
        if action == "write" and user.seniority > 2:
            return True
    # Complex nested logic continues...
    return False

Policy as code (declarative):

# policies.yaml
policies:
  - id: admin-full-access
    description: "Admins can do anything"
    effect: allow
    subjects: ["role:admin"]
    actions: ["*"]
    resources: ["*"]
    
  - id: owner-full-access
    description: "Owners control their resources"
    effect: allow
    subjects: ["user:*"]
    actions: ["*"]
    resources: ["*"]
    condition: "resource.owner == subject.id"
    
  - id: department-read-access
    description: "Same department can read"
    effect: allow
    subjects: ["user:*"]
    actions: ["read"]
    resources: ["*"]
    condition: "resource.department == subject.department"
    
  - id: senior-write-access
    description: "Senior staff can write in department"
    effect: allow
    subjects: ["user:*"]
    actions: ["write"]
    resources: ["*"]
    condition: |
      resource.department == subject.department AND
      subject.seniority > 2

Policy evaluation engine:

class PolicyEngine:
    def evaluate(self, subject, action, resource, context):
        applicable_policies = self.find_matching_policies(
            subject, action, resource
        )
        
        # Explicit deny overrides allow
        for policy in applicable_policies:
            if policy.effect == "deny":
                return False, f"Denied by {policy.id}"
        
        # Any allow grants access
        for policy in applicable_policies:
            if policy.effect == "allow":
                return True, f"Allowed by {policy.id}"
        
        # Default deny
        return False, "No matching allow policy"

Declarative policies separate logic from implementation.

Cloud Provider Authorization Models

AWS IAM: Policy-based with principals and resources

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456:user/alice"},
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::ml-models/*",
    "Condition": {
      "IpAddress": {"aws:SourceIp": "10.0.0.0/8"},
      "DateGreaterThan": {"aws:CurrentTime": "2024-01-01"}
    }
  }]
}

GCP IAM: Role bindings at resource level

# Binding roles to identities on resources
bindings:
  - role: roles/storage.objectViewer
    members:
      - user:alice@example.com
      - serviceAccount:ml-trainer@project.iam.gserviceaccount.com
    resource: projects/my-project/buckets/ml-models
  
  - role: roles/ml.modelUser
    members:
      - group:ml-team@example.com
    condition:
      expression: request.time > timestamp("2024-01-01")

Azure RBAC: Scope-based assignments

# Role assignment at different scopes
assignment = {
    "roleDefinitionId": "/subscriptions/sub123/providers/Microsoft.Authorization/roleDefinitions/contributor",
    "principalId": "user-guid-123",
    "scope": "/subscriptions/sub123/resourceGroups/ml-resources"
}
# Permissions inherit down: Subscription → Resource Group → Resource

Common patterns across providers:

  • Principle of least privilege - Start with no access
  • Deny overrides allow - Explicit deny wins
  • Inheritance down hierarchy - Higher level grants flow down
  • Conditions/constraints - Time, IP, attributes

Cloud providers use variations of policy-based access.

REST Alternatives: GraphQL

The Multiple Client Problem

Different clients need different data from same resources

REST endpoint returns fixed structure:

GET /api/users/123

{
  "user_id": 123,
  "email": "alice@example.com",
  "name": "Alice Chen",
  "profile_image": "base64...[2MB]",
  "preferences": {...50 fields...},
  "activity_history": [...200 entries...],
  "connected_devices": [...},
  "subscription": {...},
  "recommendations": [...}
}

Each client uses different subset:

Mobile app needs:

  • name, profile_image (thumbnail)
  • Downloads 3MB, uses 50KB

Admin dashboard needs:

  • email, subscription, activity_history
  • Downloads 3MB, uses 500KB

Analytics service needs:

  • user_id, preferences.language
  • Downloads 3MB, uses 1KB

REST over-fetches:

  • 95% of transferred data unused
  • Mobile battery drain
  • Network bandwidth cost
  • Server CPU for serialization

REST solutions are inadequate:

  • Sparse fieldsets: /users/123?fields=name,email (non-standard)
  • Multiple endpoints: /users/123/mobile, /users/123/admin (proliferation)
  • API versioning: v1, v2, v3… (maintenance burden)

Different clients require different subsets of data.

GraphQL: Query Language for APIs

GraphQL lets clients specify exactly what data they need

Instead of multiple REST calls:

# REST: 3 requests, 3 round trips
user = GET('/users/123')
posts = GET('/users/123/posts?limit=5')
for post in posts:
    comments = GET(f'/posts/{post.id}/comments?limit=2')

Single GraphQL query:

query GetUserWithPosts {
  user(id: 123) {
    name
    email
    posts(limit: 5) {
      title
      createdAt
      comments(limit: 2) {
        text
        author {
          name
        }
      }
    }
  }
}

Response matches query structure exactly:

{
  "data": {
    "user": {
      "name": "Alice Chen",
      "email": "alice@example.com",
      "posts": [
        {
          "title": "GraphQL Benefits",
          "createdAt": "2024-01-15",
          "comments": [
            {
              "text": "Great post!",
              "author": {"name": "Bob"}
            }
          ]
        }
      ]
    }
  }
}

Key differences from REST:

  • Single endpoint: POST /graphql for everything
  • Client controls response shape
  • Nested data in one request
  • No versioning needed (fields are added/deprecated)

GraphQL fetches related data in single request.

GraphQL Type System

Everything in GraphQL has a type

Schema definition:

type User {
  id: ID!                    # ! means non-null
  name: String!
  email: String!
  posts: [Post!]!            # Array of Posts (never null)
  friendCount: Int
  accountType: AccountType!  # Enum type
}

type Post {
  id: ID!
  title: String!
  content: String
  author: User!              # Relationship to User
  comments: [Comment!]!
  likes: Int!
}

enum AccountType {
  FREE
  PREMIUM
  ENTERPRISE
}

type Query {
  user(id: ID!): User        # Can return null if not found
  users(limit: Int = 10): [User!]!
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  deleteUser(id: ID!): Boolean!
}

Type system provides:

  1. Contract enforcement: Server must return correct types
  2. Query validation: Invalid queries rejected before execution
  3. Auto-documentation: Tools can introspect schema
  4. Code generation: Type-safe clients in any language

Query validation example:

# INVALID: 'invalid_field' doesn't exist
{ user(id: 123) { invalid_field } }
# Error: Field 'invalid_field' not found on type 'User'
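
A toy version of this validation, assuming the schema is flattened to a dict mapping type names to field names (real servers validate the parsed query AST the same way):

# Toy schema: type name -> set of valid field names
SCHEMA = {
    "User": {"id", "name", "email", "posts", "friendCount", "accountType"},
    "Post": {"id", "title", "content", "author", "comments", "likes"},
}

def validate_fields(type_name, requested_fields):
    """Reject any requested field not defined on the type."""
    invalid = set(requested_fields) - SCHEMA[type_name]
    if invalid:
        raise ValueError(
            f"Field '{invalid.pop()}' not found on type '{type_name}'")

validate_fields("User", ["name", "email"])  # OK
validate_fields("User", ["invalid_field"])  # raises ValueError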

Type system provides safety and tooling.

Query vs Mutation: Read vs Write Separation

GraphQL separates reads from writes explicitly

Query: Read operations (no side effects)

query GetUserData {
  user(id: 123) {
    name
    email
    posts {
      title
      publishedAt
    }
  }
}
  • Can be cached
  • Executed in parallel
  • Safe to retry
  • No state changes

Mutation: Write operations (changes state)

mutation CreatePost {
  createPost(input: {
    title: "GraphQL Benefits"
    content: "..."
    authorId: 123
  }) {
    id           # Return created post
    title
    publishedAt
    author {
      name
    }
  }
}
  • Never cached
  • Executed serially (in order)
  • Not safe to retry
  • Returns modified data

Serial execution prevents race conditions:

mutation TransferFunds {
  # These execute in order, not parallel
  withdraw(account: "A", amount: 100) { balance }
  deposit(account: "B", amount: 100) { balance }
}

Convention: Mutations return the modified object so client can update its cache without refetching.

Queries parallelize; mutations serialize.

The N+1 Query Problem

GraphQL’s flexibility creates performance challenges

Query requests users and their posts:

query GetUsersWithPosts {
  users(limit: 100) {
    name
    posts {
      title
      content
    }
  }
}

Naive resolver implementation:

def resolve_users(limit):
    # 1 query
    return db.query("SELECT * FROM users LIMIT ?", limit)

def resolve_posts(user):
    # Called for EACH user (N queries)
    return db.query("SELECT * FROM posts WHERE user_id = ?", user.id)

# Total: 1 + 100 = 101 database queries!

Problem scales with nesting:

users(100) → posts → comments → author
# 1 + 100 + 500 + 1500 = 2101 queries

Solution: DataLoader pattern (batching)

# Define the batch function first, then wrap it in a loader
def batch_load_posts(user_ids):
    # Single query for all users
    posts = db.query(
        "SELECT * FROM posts WHERE user_id IN (?)",
        user_ids
    )
    # Group by user_id and return in order
    return group_by_user(posts)

# Collects all user IDs, makes single query
post_loader = DataLoader(batch_load_posts)

# Now: 1 + 1 = 2 queries total

Measured impact:

  • Without DataLoader: 2.3 seconds (101 queries)
  • With DataLoader: 45ms (2 queries)
  • 51× faster
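
A minimal, synchronous sketch of the DataLoader idea (real implementations batch automatically within an event-loop tick; here the executor calls flush explicitly, and all names are illustrative):

class SimpleDataLoader:
    """Collect keys, then resolve them all with one batch call."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # takes [keys], returns [values] in order
        self.queue = []           # keys awaiting the next batch load
        self.cache = {}

    def load(self, key):
        if key not in self.cache:
            self.queue.append(key)
        return lambda: self.cache[key]  # deferred result

    def flush(self):
        if self.queue:
            keys = list(dict.fromkeys(self.queue))  # dedupe, keep order
            for key, value in zip(keys, self.batch_fn(keys)):
                self.cache[key] = value
            self.queue = []

# Resolvers call load(); the executor calls flush() once per query level
loader = SimpleDataLoader(batch_load_posts)
pending = [loader.load(user.id) for user in users]
loader.flush()                        # one query for all users
posts_per_user = [get() for get in pending]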

DataLoader batches queries to prevent N+1 problem.

Query Complexity and DOS Attacks

GraphQL’s flexibility enables malicious queries

Innocent-looking query with exponential cost:

query MaliciousQuery {
  users(limit: 100) {
    posts {
      comments {
        author {
          posts {
            comments {
              author {
                posts {
                  title
                }
              }
            }
          }
        }
      }
    }
  }
}

Query analysis:

  • Level 1: 100 users
  • Level 2: 100 × 10 posts = 1,000
  • Level 3: 1,000 × 20 comments = 20,000
  • Level 4: 20,000 × 1 author = 20,000
  • Level 5: 20,000 × 10 posts = 200,000
  • Level 6: 200,000 × 20 comments = 4,000,000
  • Total operations: 4,241,100

Single query can overwhelm server.

Protection mechanisms:

  1. Query depth limiting:
if query_depth(query) > 5:
    return Error("Query too deep")
  2. Complexity scoring:
# Assign cost to each field
complexity = users(100) * 10 + posts * 5 + comments * 2
if complexity > 1000:
    return Error("Query too complex")
  3. Timeout protection:
with timeout(5):  # 5 second maximum
    execute_query()
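
The query_depth helper above is assumed; over a parsed AST where each selection node carries a selections list, it might be sketched as:

def query_depth(node, depth=0):
    """Recursively find the deepest selection in a parsed query."""
    selections = getattr(node, 'selections', None)
    if not selections:
        return depth
    return max(query_depth(child, depth + 1) for child in selections)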

Nested queries can create exponential load.

Caching Challenges with GraphQL

REST caching is straightforward

GET /api/users/123
Cache-Control: max-age=300

GET /api/posts/456  
Cache-Control: max-age=60
  • URL identifies resource
  • HTTP caching works (CDN, browser)
  • Cache invalidation by URL

GraphQL breaks traditional caching

All queries go to single endpoint:

POST /graphql
{"query": "{ user(id: 123) { name } }"}

POST /graphql  
{"query": "{ user(id: 123) { name email } }"}

Same user, different queries, same URL.

Why POST breaks caching:

  • POST requests typically not cached
  • Request body contains query
  • CDNs can’t cache POST
  • Browser won’t cache POST

GraphQL caching strategies:

  1. Application-level caching:
@cache_result(key=query_hash, ttl=300)
def execute_query(query):
    return graphql.execute(query)
  2. Field-level caching:
@field_resolver('User.posts')
@cache(ttl=60)
def resolve_posts(user):
    return load_posts(user.id)
  3. Client-side normalized cache:
  • Apollo Client stores by ID
  • Updates all queries using that object
  • Complex but powerful
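
For application-level caching (pattern 1 above), a simplified cache_result sketch keyed on a hash of the query text (the decorator signature is simplified relative to the snippet above; graphql.execute is the assumed executor):

import hashlib
import time

_cache = {}  # query hash -> (expires_at, result)

def cache_result(ttl=300):
    def decorator(fn):
        def wrapper(query):
            key = hashlib.sha256(query.encode()).hexdigest()
            hit = _cache.get(key)
            if hit and hit[0] > time.time():
                return hit[1]          # cache hit, skip execution
            result = fn(query)
            _cache[key] = (time.time() + ttl, result)
            return result
        return wrapper
    return decorator

@cache_result(ttl=300)
def execute_query(query):
    return graphql.execute(query)  # executor as in the snippet above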

POST requests and dynamic queries prevent HTTP caching.

GraphQL Architectural Trade-offs

GraphQL changes fundamental assumptions about APIs

Unified query interface:

# Single endpoint handles all queries
POST /graphql

query GetDashboardData {
  user(id: 123) {
    name
    recentPosts(limit: 3) {
      title
      comments(limit: 1) {
        text
      }
    }
  }
}

Contrast with REST equivalent:

GET /users/123          # User data
GET /users/123/posts    # User's posts  
GET /posts/456/comments # Comments for each post
GET /posts/789/comments
GET /posts/012/comments

Performance characteristics:

GraphQL advantages:

  • Fewer network round trips
  • Precise data fetching (no over-fetching)
  • Strong typing prevents runtime errors

GraphQL costs:

  • Query parsing overhead
  • Complex caching (can’t use HTTP cache)
  • Potential for expensive queries
  • N+1 query problems without careful design

Error handling differences:

REST: HTTP status codes indicate error types

GET /users/999 → 404 Not Found
GET /users/123 → 200 OK with user data

GraphQL: Always returns 200 with error details

{
  "data": {"user": null},
  "errors": [
    {"message": "User not found", "path": ["user"]}
  ]
}

Each approach optimizes different aspects of API interaction.

Complexity Management: When Simple Becomes Hard

GraphQL’s flexibility creates new complexity

Simple REST endpoint:

@app.route('/users/<int:user_id>')
def get_user(user_id):
    user = User.query.get_or_404(user_id)
    return jsonify(user.to_dict())

Equivalent GraphQL implementation:

# Schema definition
type_defs = """
    type User {
        id: ID!
        name: String!
        posts: [Post!]!
    }
    
    type Query {
        user(id: ID!): User
    }
"""

# Resolver with N+1 protection
def resolve_user(obj, info, id):
    return User.query.get(id)

def resolve_posts(user, info):
    # Without DataLoader: N+1 problem
    # With DataLoader: Complex batching logic
    return post_loader.load(user.id)

# Query complexity analysis
def analyze_query_complexity(query_ast):
    complexity = 0
    for field in query_ast.selections:
        complexity += calculate_field_cost(field)
        if complexity > MAX_QUERY_COST:
            raise GraphQLError("Query too complex")
    return complexity

Operational complexity increases:

Monitoring REST:

  • HTTP status codes indicate health
  • URL patterns for different resources
  • Standard load balancer routing

Monitoring GraphQL:

  • All queries return 200 (errors in body)
  • Query analysis needed for performance
  • Single endpoint for all traffic
  • Custom metrics for query complexity

Flexibility introduces operational complexity.

Integration Patterns: Timeouts and Retries

The Distributed Timeout Problem

Network calls introduce unpredictable delays

Single process function call:

def calculate_score(data):
    result = complex_computation(data)  # 50ms, predictable
    return result

Distributed service call:

def calculate_score(data):
    response = requests.post('http://ml-service/predict', 
                           json=data)  # ??? ms, unpredictable
    return response.json()

Sources of unpredictability:

  • Network latency: 1-100ms base cost
  • Server load: Queue time varies
  • Geographic distance: Speed of light limits
  • Network congestion: Shared bandwidth
  • Service cold starts: 1-10 second delays

Timeouts cascade through service chains:

Service A calls Service B calls Service C:

# Service A: 30 second timeout
response_b = requests.get(url_b, timeout=30)

# Service B: 30 second timeout  
response_c = requests.get(url_c, timeout=30)

# Service C: Takes 25 seconds to respond

What happens:

  1. C takes 25 seconds (within its timeout)
  2. B waits 25 seconds, then processes (29 total)
  3. A waits 29 seconds, gets response just in time
  4. Any additional delay breaks everything

Timeout strategies must coordinate across service boundaries

Hierarchical timeouts:

# Service A: Generous timeout for user-facing request
timeout_a = 10.0  # 10 seconds

# Service B: Leaves buffer for processing
timeout_b = 8.0   # 8 seconds  

# Service C: Tightest timeout for backend
timeout_c = 6.0   # 6 seconds

Each layer reserves time for its own processing.
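
One way to enforce such a hierarchy without hard-coding each layer is to pass the remaining budget downstream; a hedged sketch using a deadline header (the header name and buffer value are illustrative):

import time
import requests

PROCESSING_BUFFER = 2.0  # seconds reserved for this layer's own work

def call_downstream(url, data, deadline):
    """Forward only the remaining time budget, minus a local buffer."""
    remaining = deadline - time.time() - PROCESSING_BUFFER
    if remaining <= 0:
        raise TimeoutError("No time budget left for downstream call")
    return requests.post(
        url, json=data,
        headers={'X-Deadline': str(deadline)},  # illustrative header
        timeout=(3, remaining)                  # (connect, read)
    )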

Coordinated timeouts prevent cascade failures.

Connection vs Request Timeouts

Different phases of network communication have different failure modes

Connection timeout: Establishing TCP connection

import socket
import requests

# Connection timeout: How long to wait for TCP handshake
requests.get('http://api.service.com/data', 
            timeout=(3, 30))  # (connect, read)
#                  ↑
#               3 seconds to establish connection

Connection establishment steps:

  1. DNS resolution: 10-100ms
  2. TCP handshake: 1-3 round trips
  3. TLS handshake: 2-3 round trips (HTTPS)

Typical connection timeout: 3-10 seconds

Read timeout: Waiting for response

# Read timeout: How long to wait for response after connection
requests.get('http://api.service.com/data',
            timeout=(3, 30))  # (connect, read)
#                     ↑
#                  30 seconds for complete response

Why separate timeouts matter:

Connection timeout failures indicate:

  • Service completely down
  • Network infrastructure problems
  • DNS resolution issues
  • Firewall blocking connections

Read timeout failures indicate:

  • Service overwhelmed (high queue time)
  • Long-running operation
  • Partial network failure
  • Service processing issues

Retry strategy depends on timeout type:

def call_service(url, data, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.post(url, json=data, 
                                   timeout=(3, 30))
            return response.json()
        except requests.ConnectTimeout:
            # Connection failed - service likely down
            # Retry immediately (fail fast)
            continue
        except requests.ReadTimeout:
            # Request sent but no response
            # Longer backoff (service may be overloaded)
            time.sleep(2 ** attempt)
            continue
    raise ServiceUnavailableError()

Different timeout types indicate different failure modes.

Retry Strategies: When and How

Not all failures should trigger retries

Immediate retry (no backoff):

def immediate_retry(func, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return func()
        except ConnectionError:
            # Network connectivity issue - retry immediately
            if attempt == max_attempts - 1:
                raise
            continue
    
# Use for: Connection failures, DNS timeouts

Exponential backoff with jitter:

import random
import time

def exponential_backoff_retry(func, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return func()
        except (ReadTimeout, ServerError) as e:
            if attempt == max_attempts - 1:
                raise
            
            # Base delay: 2^attempt seconds
            delay = 2 ** attempt
            
            # Add jitter to prevent thundering herd
            jitter = random.uniform(0, 0.1 * delay)
            total_delay = delay + jitter
            
            time.sleep(total_delay)
            continue

# Backoff between attempts: 1s, 2s, 4s, 8s (with jitter); final attempt raises

Fixed interval retry:

def fixed_interval_retry(func, interval=5, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return func()
        except ServiceUnavailableError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(interval)  # Always wait 5 seconds
    
# Use for: Known service restart windows

When NOT to retry:

def should_retry(exception, response=None):
    # Never retry these conditions
    if isinstance(exception, AuthenticationError):
        return False  # 401 - bad credentials
    
    if isinstance(exception, AuthorizationError):
        return False  # 403 - insufficient permissions
        
    if response and response.status_code == 400:
        return False  # Bad request - won't improve
    
    if response and response.status_code == 404:
        return False  # Not found - resource doesn't exist
    
    # Retry these conditions
    if isinstance(exception, (ConnectionError, ReadTimeout)):
        return True   # Transient network issues
    
    if response and response.status_code in [500, 502, 503, 504]:
        return True   # Server errors - may recover
    
    return False

Choose retry strategy based on failure type and system constraints.

Idempotency: Safe Retry Foundation

Retries are only safe when operations are idempotent

Problem: Non-idempotent operations

# Dangerous to retry - could double-charge customer
def charge_credit_card(customer_id, amount):
    response = requests.post('https://payments.api/charge', {
        'customer_id': customer_id,
        'amount': amount,
        'currency': 'USD'
    })
    # Network timeout after sending request
    # Did the charge succeed? Unknown - timeout occurred before response
    return response.json()

# Retry could result in:
charge_credit_card(123, 50.00)  # $50 charged
# Timeout, retry...
charge_credit_card(123, 50.00)  # Another $50 charged!

Solution: Idempotency keys

import uuid

def charge_credit_card_safe(customer_id, amount, idempotency_key=None):
    if not idempotency_key:
        idempotency_key = str(uuid.uuid4())
    
    response = requests.post('https://payments.api/charge', {
        'customer_id': customer_id,
        'amount': amount,
        'currency': 'USD',
        'idempotency_key': idempotency_key  # Unique per logical operation
    })
    return response.json()

# Server implementation tracks processed keys:
def process_payment(request):
    key = request.get('idempotency_key')
    
    # Check if already processed
    existing = db.query("SELECT * FROM payments WHERE idempotency_key = ?", key)
    if existing:
        return existing.response  # Return same result as before
    
    # Process payment
    result = charge_card(request)
    
    # Store result with key
    db.execute("INSERT INTO payments (idempotency_key, response) VALUES (?, ?)",
               key, result)
    return result

Idempotency patterns:

  1. Natural idempotency (safe by design):
# GET requests
user = get_user(123)  # Always safe to repeat

# PUT requests (full replacement)
update_user(123, {"name": "Alice", "email": "alice@example.com"})
  2. Conditional updates (compare-and-swap):
def update_counter(counter_id, expected_value, new_value):
    result = db.execute("""
        UPDATE counters
        SET value = ?
        WHERE id = ? AND value = ?
    """, new_value, counter_id, expected_value)

    if result.rowcount == 0:
        raise ConflictError("Counter was modified")
    return new_value
  3. Idempotency key tracking:
# Client generates unique key per logical operation
operation_key = f"user-{user_id}-update-{timestamp}"
update_user_profile(user_id, data, idempotency_key=operation_key)
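
The key must be generated once per logical operation and reused on every retry; a sketch combining charge_credit_card_safe above with a retry loop (ServiceUnavailableError as used elsewhere in this section):

import time
import uuid
import requests

def charge_with_retries(customer_id, amount, max_attempts=3):
    # One key for the whole logical operation, reused on every attempt
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return charge_credit_card_safe(customer_id, amount,
                                           idempotency_key=key)
        except requests.ReadTimeout:
            # Charge may or may not have happened; the same key makes
            # the retry safe - a duplicate returns the stored result
            time.sleep(2 ** attempt)
    raise ServiceUnavailableError()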

Idempotency keys enable safe retries of financial operations.

Circuit Breaker Pattern

Stop calling failing services to prevent resource exhaustion

Problem: Cascading failures

# Service A keeps trying to call failing Service B
def get_user_recommendations(user_id):
    for attempt in range(5):  # Keep retrying
        try:
            # Service B is down - this will always fail
            response = requests.get(f'http://ml-service/recommend/{user_id}',
                                  timeout=30)
            return response.json()
        except Exception:
            time.sleep(2)  # Wasting time and resources
            continue
    raise ServiceUnavailableError()

# Results in:
# - 5 × 30 second timeouts = 2.5 minutes per user
# - Thread pool exhaustion
# - Memory leak from pending requests
# - Service A becomes unavailable too

Circuit breaker solution:

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"      # Normal operation
    OPEN = "open"          # Failing fast
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED
    
    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time < self.recovery_timeout:
                raise CircuitBreakerOpenError("Service unavailable")
            else:
                self.state = CircuitState.HALF_OPEN
        
        try:
            result = func(*args, **kwargs)
            self._on_success()
            return result
        except Exception as e:
            self._on_failure()
            raise
    
    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
    
    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage with circuit breaker
ml_service_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=30)

def get_user_recommendations_safe(user_id):
    try:
        return ml_service_breaker.call(
            lambda: requests.get(f'http://ml-service/recommend/{user_id}',
                               timeout=5).json()
        )
    except CircuitBreakerOpenError:
        # Return cached recommendations or default
        return get_default_recommendations(user_id)

Circuit breaker states:

  1. Closed (normal): All requests pass through
  2. Open (failing): All requests fail immediately
  3. Half-open (testing): Single test request allowed

Metrics for circuit breaker tuning:

  • Failure threshold: 3-10 failures (depends on service criticality)
  • Recovery timeout: 30-300 seconds (based on typical recovery time)
  • Success threshold: 1-3 successes to close circuit

Circuit breaker prevents cascade failures by failing fast.

Timeout and Retry Integration Strategy

Combining timeouts, retries, and circuit breakers

Layered resilience strategy:

import asyncio
import random
import time
from typing import Optional, Callable, Any

import requests

# Assumes CircuitBreaker from the previous slide, plus helpers defined
# elsewhere: _make_request, generate_request_id, logger
class ResilientServiceClient:
    def __init__(self, base_url: str):
        self.base_url = base_url
        self.circuit_breaker = CircuitBreaker(
            failure_threshold=3,
            recovery_timeout=30
        )
        self.session = requests.Session()
        # Connection pooling for efficiency
        self.session.mount('http://', requests.adapters.HTTPAdapter(
            pool_connections=10, pool_maxsize=20
        ))
    
    async def call_service(self, 
                          endpoint: str, 
                          data: Optional[dict] = None,
                          max_retries: int = 3) -> dict:
        """
        Resilient service call with integrated patterns:
        - Hierarchical timeouts
        - Exponential backoff retries  
        - Circuit breaker protection
        - Request tracing
        """
        request_id = generate_request_id()
        
        for attempt in range(max_retries + 1):
            try:
                # Circuit breaker check
                if self.circuit_breaker.state == CircuitState.OPEN:
                    raise CircuitBreakerOpenError(
                        f"Circuit breaker open for {self.base_url}"
                    )
                
                # Calculate timeout (shorter on retries)
                connect_timeout = 3.0
                read_timeout = max(10.0 - (attempt * 2), 5.0)
                
                start_time = time.time()
                response = await self._make_request(
                    endpoint, data, request_id,
                    timeout=(connect_timeout, read_timeout)
                )
                
                # Success - reset circuit breaker
                self.circuit_breaker._on_success()
                
                # Log success metrics
                duration = time.time() - start_time
                self._log_request(request_id, endpoint, attempt, duration, "success")
                
                return response.json()
                
            except requests.exceptions.ConnectTimeout:
                # Connection timeout - retry immediately
                self._log_request(request_id, endpoint, attempt, None, "connect_timeout")
                if attempt < max_retries:
                    continue
                raise ServiceUnavailableError("Connection timeout")
                
            except requests.exceptions.ReadTimeout:
                # Read timeout - exponential backoff
                self.circuit_breaker._on_failure()
                self._log_request(request_id, endpoint, attempt, None, "read_timeout")
                
                if attempt < max_retries:
                    backoff_time = (2 ** attempt) + random.uniform(0, 1)
                    await asyncio.sleep(backoff_time)
                    continue
                raise ServiceUnavailableError("Read timeout")
                
            except requests.exceptions.HTTPError as e:
                if e.response.status_code >= 500:
                    # Server error - retry with backoff
                    self.circuit_breaker._on_failure()
                    if attempt < max_retries:
                        backoff_time = (2 ** attempt) + random.uniform(0, 1)
                        await asyncio.sleep(backoff_time)
                        continue
                else:
                    # Client error - don't retry
                    raise
                    
        raise ServiceUnavailableError(f"Max retries exceeded for {endpoint}")
    
    def _log_request(self, request_id: str, endpoint: str, 
                    attempt: int, duration: Optional[float], status: str):
        """Structured logging for debugging and monitoring"""
        logger.info({
            "request_id": request_id,
            "service": self.base_url,
            "endpoint": endpoint,
            "attempt": attempt,
            "duration_ms": duration * 1000 if duration else None,
            "status": status,
            "circuit_breaker_state": self.circuit_breaker.state.value
        })

Real-world timeout hierarchy example:

# API Gateway → Auth Service → Database
API_GATEWAY_TIMEOUT = 30.0    # User-facing request
AUTH_SERVICE_TIMEOUT = 25.0   # Leaves 5s buffer
DATABASE_TIMEOUT = 20.0       # Leaves 5s buffer for processing

# Each layer reserves processing time

Integrated patterns provide comprehensive resilience.

The Long Operation Problem and Async Solutions

Some operations take too long for synchronous HTTP

Typical HTTP request/response works for fast operations:

# Fast operation: 50ms
response = requests.get('https://api.service.com/users/123')
user = response.json()  # Works fine

Long-running operations break this model:

# Video transcoding: 5 minutes
response = requests.post('https://api.service.com/transcode',
                        json={'video_url': 'input.mp4'},
                        timeout=300)  # Wait 5 minutes?

# Problems:
# - Client connection held open entire time
# - Network interruption loses everything
# - No progress visibility
# - Client can't do anything else

Core problem: Need to decouple submission from completion

Three solutions exist, each with different trade-offs:

  1. Polling: Client repeatedly checks “are you done yet?”
  2. Webhooks: Server calls client when done
  3. WebSockets: Persistent bidirectional connection

All three share the same pattern: Submit job → get job_id → retrieve result later. They differ in how the result is retrieved.

Pattern comparison at a glance:

Polling - Simple but wasteful:

job_id = submit_job()
while not done:
    status = check_status(job_id)  # Repeated HTTP requests
    time.sleep(5)  # Most return "not done yet"

Webhooks - Efficient but complex setup:

job_id = submit_job(callback_url='https://my-app.com/done')
# Server POSTs result to callback_url when complete
# No wasted requests, but client needs public endpoint

WebSockets - Real-time but resource-intensive:

ws.connect()  # Single persistent connection
ws.send(start_job)
# Server pushes updates as they happen
# Immediate updates, but holds connection open

All three patterns decouple submission from completion.

Polling and Webhooks: Two Retrieval Strategies

Polling: Client-driven status checks

Submit once, check repeatedly:

# Submit → get job_id immediately
job_id = submit_job({'operation': 'transcode', 'input': 'video.mp4'})

# Poll until complete
while True:
    status = check_status(job_id)
    if status['complete']:
        return status['result']
    time.sleep(5)  # Wait and try again

Server tracks job state:

jobs["abc-123"] = {
    "status": "processing",  # pending → processing → completed/failed
    "progress": 45,
    "result": None
}
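
On the server side, a minimal Flask sketch of the submit/status pair (run_job is an assumed background worker; in production the job table would live in a database, not a dict):

import threading
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = {}  # job_id -> {"status", "progress", "result"}

@app.post('/jobs')
def submit_job():
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "progress": 0, "result": None}
    # run_job is an assumed worker that updates jobs[job_id] as it goes
    threading.Thread(target=run_job, args=(job_id, request.json)).start()
    return jsonify({"job_id": job_id}), 202

@app.get('/jobs/<job_id>')
def check_status(job_id):
    job = jobs.get(job_id)
    if job is None:
        return jsonify({"error": "unknown job"}), 404
    return jsonify(job)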

Webhooks: Server-driven notifications

Submit with callback URL:

# Client submits with callback URL
job_id = submit_job({
    'operation': 'transcode',
    'input': 'video.mp4',
    'callback_url': 'https://my-app.com/webhooks/transcode'
})

# Client provides endpoint - server calls this when done
@app.post('/webhooks/transcode')
def handle_complete(request):
    data = request.json()  # {job_id, status, result}
    update_database(data['job_id'], data['result'])

Server notifies client:

# When job completes, POST to client's callback_url
requests.post(callback_url, json={'job_id': job_id, 'result': result})

Trade-offs comparison:

Aspect                Polling                                      Webhooks
Efficiency            Wasteful (most checks return “not ready”)    Efficient (one notification)
Latency               poll_interval/2 average                      Immediate
Client requirements   Simple HTTP client                           Public endpoint required
Firewall-friendly     Yes (outbound only)                          No (needs inbound)
Reliability           Client controls retry                        Server must retry failed deliveries

When to use:

  • Polling: Mobile apps, browsers, firewall-restricted clients
  • Webhooks: Server-to-server, CI/CD pipelines, payment processors

Polling is simple but wasteful; webhooks are efficient but require public endpoints.

WebSockets: Continuous vs Discrete Updates

Polling and webhooks handle discrete operations

Submit job → wait → get result. One submission, one result.

WebSockets handle continuous streams

# Connection stays open, updates flow continuously
ws.connect("wss://api.service.com/live")
ws.send({"subscribe": "job_updates"})

while True:
    update = ws.recv()  # Server pushes whenever state changes
    # Progress: 25%, 50%, 75%, 100%

The connection itself is the communication channel, not individual HTTP requests.

Video transcoding (5 minutes)

Discrete: submit → wait → result

  • Polling: 30 checks, 29 return “not ready”
  • Webhook: 2 requests (submit + notification)
  • WebSocket: Unnecessary overhead

Live dashboard (updates every second)

Continuous: constant stream of values

  • Polling: 3600 requests/hour per client
  • Webhook: Doesn’t fit (not discrete events)
  • WebSocket: Push updates as they occur

Mobile app vs Backend service

Mobile can’t receive webhooks (no public endpoint):

  • Must use polling or WebSocket
  • Polling simpler for discrete operations

Backend can expose endpoints:

  • Webhooks for discrete events (payments)
  • WebSocket for continuous streams

Combining approaches for reliability:

# Webhook with polling fallback
job_id = submit_job(callback='https://my-app.com/webhook')
result = wait_for_webhook(timeout=300) or poll_until_done(job_id)

Webhook efficiency when network is reliable, polling safety when it isn’t.

Production Operational Concerns

CORS - Browser Same-Origin Security

Problem: API works in Postman, fails in browser

// JavaScript in browser at http://localhost:3000
fetch('http://localhost:5000/predict', {
    method: 'POST',
    body: JSON.stringify({features: [1, 2, 3]})
})
// Error: CORS policy: No 'Access-Control-Allow-Origin' header

Same-origin policy - Browser security restriction:

  • Requests allowed to same protocol + domain + port
  • Requests blocked to different origins

Examples:

http://localhost:3000 → http://localhost:5000     Blocked - Different ports

https://app.example.com → https://api.example.com Blocked - Different subdomains

https://app.example.com → https://app.example.com Allowed - Same origin

Not an API problem - browser enforces this

  • Postman bypasses CORS (not a browser)
  • curl bypasses CORS (not a browser)
  • Browser JavaScript cannot bypass CORS

CORS - Server Response Grants Permission

Browser sends preflight OPTIONS request before actual request

OPTIONS /predict HTTP/1.1
Host: localhost:5000
Origin: http://localhost:3000
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Content-Type

Server must respond with permission headers:

HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://localhost:3000
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Max-Age: 3600

Then browser sends actual request:

POST /predict HTTP/1.1
Host: localhost:5000
Origin: http://localhost:3000
Content-Type: application/json

{"features": [1, 2, 3]}

Flask implementation:

from flask_cors import CORS

app = Flask(__name__)
CORS(app, origins=['http://localhost:3000'])

# Or manual headers
@app.after_request
def add_cors_headers(response):
    response.headers['Access-Control-Allow-Origin'] = 'http://localhost:3000'
    response.headers['Access-Control-Allow-Headers'] = 'Content-Type'
    return response

Credentials require explicit permission:

# If sending cookies or Authorization header
Access-Control-Allow-Credentials: true

# Cannot use wildcard with credentials
Access-Control-Allow-Origin: http://localhost:3000

Preflight cached for Access-Control-Max-Age seconds

Correlation IDs for Request Tracing

Tracing requests across multiple services requires unique identifiers

Three services generating thousands of log entries:

# Gateway logs (10,000 entries)
[14:23:01.123] Processing request
[14:23:01.134] Processing request
[14:23:01.145] Processing request

# User Service logs (5,000 entries)
[14:23:01.234] Database query
[14:23:01.245] Database query
[14:23:01.256] Database query failed

# Payment Service logs (8,000 entries)
[14:23:01.345] Processing payment
[14:23:01.356] Processing payment

Without correlation: Cannot identify which entries belong to the same request

With correlation ID: Thread a unique identifier through all services

# Generate at API entry point
@app.before_request
def assign_request_id():
    request_id = request.headers.get('X-Request-ID', str(uuid.uuid4()))
    g.request_id = request_id

# Forward to downstream services
headers = {
    'X-Request-ID': g.request_id,
    'Authorization': get_token()
}
response = requests.post(user_service_url, headers=headers)

# Include in every log message
logger.info(f"[{g.request_id}] User {user_id} query failed")

Debugging with correlation ID:

grep "550e8400" *.log | sort

Finds all log entries for a single user request across all services

Logging Best Practices - Structured Format

Structured logging: JSON format, not text strings

# Bad: Text logs hard to parse
logger.info(f"User {user_id} made prediction, took {duration}ms")

# Good: Structured JSON logs
logger.info(json.dumps({
    "timestamp": "2024-01-15T10:30:45Z",
    "level": "INFO",
    "request_id": "550e8400-e29b-41d4-a716-446655440000",
    "user_id": 123,
    "endpoint": "POST /predict",
    "duration_ms": 247,
    "status_code": 200
}))

Why JSON:

  • Easy to parse programmatically
  • Query with tools like jq
  • Aggregate metrics from logs
  • Filter by any field

What to log:

  • request_id - Correlation across services
  • user_id - Which user affected
  • endpoint - What operation
  • duration_ms - How long it took
  • status_code - Success or failure
  • error_message - What went wrong (if failed)

What NOT to log:

  • Passwords or API tokens
  • Credit card numbers
  • Request bodies with PII (personally identifiable information)
  • Full JWTs (contains user data)

Query structured logs:

# Find all failed requests
jq 'select(.status_code >= 500)' logs.json

# Find slow requests
jq 'select(.duration_ms > 1000)' logs.json

# Aggregate by endpoint
jq '.endpoint' logs.json | sort | uniq -c

API Gateway - Central Control Point

API Gateway sits between clients and backend services

Why gateway: Implement cross-cutting concerns once, not in every service

Six core functions:

1. Authentication/Authorization

  • Validate API keys or JWT tokens
  • Check permissions before routing
  • Single point for auth logic

2. Rate Limiting

  • Prevent abuse (100 requests/minute per client)
  • Protect backend services from overload
  • Return 429 when limit exceeded

3. Request Routing

  • Route to service versions (v1 vs v2)
  • A/B testing and canary deployments
  • Load balance across instances

4. Response Caching

  • Cache GET responses to reduce backend load
  • Configurable TTL per endpoint
  • Invalidation on mutations

5. Monitoring/Analytics

  • Track request counts, latencies, error rates
  • Per-client usage metrics
  • Identify problem clients

6. CORS Headers

  • Add Access-Control-Allow-Origin centrally
  • Don’t implement in every backend service

Without gateway:

  • Each service implements auth
  • Each service adds CORS
  • Each service does rate limiting
  • Each service monitors itself

With gateway:

  • Auth once at edge
  • CORS once at edge
  • Rate limiting once at edge
  • Centralized monitoring

AWS API Gateway Configuration

AWS API Gateway - Managed service, no servers to run

Endpoint structure:

https://{api-id}.execute-api.{region}.amazonaws.com/{stage}/{resource}

https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict
         ↑                      ↑              ↑      ↑
      API ID                 Region          Stage  Resource

Configuration components:

Resources - URL paths

  • /users
  • /predict
  • /models/{id}

Methods - HTTP operations per resource

  • GET /users
  • POST /predict
  • DELETE /models/{id}

Integration - Backend target

  • Lambda function (serverless)
  • HTTP endpoint (existing API)
  • AWS service (DynamoDB, S3, etc.)

Stages - Environment separation

  • prod - Production traffic
  • staging - Pre-production testing
  • dev - Development environment

Each stage has independent configuration

Usage plans - Rate limits per API key:

# Create usage plan
{
  "name": "Basic Plan",
  "throttle": {
    "rateLimit": 100,      # requests/second
    "burstLimit": 200      # burst capacity
  },
  "quota": {
    "limit": 10000,        # requests
    "period": "DAY"        # per day
  }
}
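
Creating an equivalent plan from the AWS CLI might look like the following (shorthand syntax; verify the flags against your CLI version):

aws apigateway create-usage-plan \
  --name "Basic Plan" \
  --throttle burstLimit=200,rateLimit=100 \
  --quota limit=10000,period=DAY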

Pricing:

  • $3.50 per million requests
  • $0.09 per GB data transfer
  • Caching: $0.02/hour per GB cache

API Gateway Request Lifecycle

Complete request flow through AWS API Gateway

1. Client makes request

curl -X POST \
  https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict \
  -H 'x-api-key: 8fk3jsl9dkfj3k4j' \
  -H 'Content-Type: application/json' \
  -d '{"features": [1.2, 3.4, 5.6]}'

2. API Gateway validates API key

  • Checks if key exists and is valid
  • Returns 403 if invalid

3. API Gateway checks usage plan quota

  • Checks rate limit (100 req/sec)
  • Checks daily quota (10,000 req/day)
  • Returns 429 if exceeded

4. API Gateway routes to backend

  • Invokes Lambda function, or
  • Forwards to HTTP endpoint, or
  • Calls AWS service directly

5. Backend processes request

def lambda_handler(event, context):
    features = json.loads(event['body'])['features']
    prediction = model.predict(features)
    return {
        'statusCode': 200,
        'body': json.dumps({'prediction': float(prediction)})
    }

6. API Gateway logs to CloudWatch

  • Request ID, latency, status code
  • Enables debugging and monitoring

7. API Gateway adds CORS headers

Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS

8. Response returned to client

{
  "prediction": 0.87
}

CloudWatch metrics available:

  • Request count
  • Latency (p50, p90, p99)
  • Error rate
  • Cache hit rate