Containers and Orchestration

EE 547 - Unit 2

Dr. Brandon Franzke

Fall 2025

Virtualization Fundamentals

The Resource Utilization Problem

Cost of Underutilization

The Business Case

At datacenter scale, a 10% improvement in utilization translates to billions of dollars in savings

The Isolation Requirement

Virtualization: Core Concept

Key Components

Resource Multiplexing and Isolation

CPU Virtualization: Trap and Emulate

Memory Virtualization Challenge

Shadow Page Tables vs EPT

I/O Virtualization Approaches

Hardware Virtualization Support

Intel VT-x and AMD-V

The Game Changer (2005-2006)

Before hardware support:

  • Binary translation required
  • Complex hypervisor code
  • Performance penalties

After hardware support:

  • Native virtualization
  • Simplified hypervisors
  • Near-native performance

Key Features

VMX Operations (Intel terminology)

  • VMXON: Enable virtualization
  • VMLAUNCH: Start VM
  • VMRESUME: Continue VM
  • VMEXIT: Return to hypervisor

VMCS (Intel) / VMCB (AMD) - Virtual Machine Control Structure/Block

  • Stores complete VM state
  • Controls VM behavior
  • Hardware-managed context switching
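
A quick way to check whether a host exposes these extensions and whether KVM is active (a minimal sketch; output varies by machine):

# CPU flags: vmx = Intel VT-x, svm = AMD-V
grep -E -o 'vmx|svm' /proc/cpuinfo | sort -u

# Is the KVM kernel module loaded?
lsmod | grep kvm

# Device node used by hypervisors such as QEMU/KVM
ls -l /dev/kvm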

Type 1 vs Type 2 Hypervisors

Resource Scheduling in Hypervisors

CPU Scheduling Algorithms

Credit Scheduler (Xen)

Each VM allocated credits based on weight
VM consumes credits when running
Depleted credits → lower priority
Credits refresh periodically

CFS (KVM - Completely Fair Scheduler)

Equal CPU time by default
Configurable weights/shares
Nice values for priority
Real-time scheduling available
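
A rough illustration of how these weights are set in practice (a sketch, assuming a Xen host with the xl tool and a KVM/libvirt host with virsh; the domain name dom1 is a placeholder):

# Xen credit scheduler: double dom1's weight, cap it at 50% of one CPU
xl sched-credit -d dom1 -w 512 -c 50

# KVM/libvirt: adjust CFS shares for a guest
virsh schedinfo dom1 --set cpu_shares=2048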

Memory Management Techniques

Ballooning: Cooperative memory reclamation

Page Sharing: Deduplicate identical pages

Compression: Compress inactive pages

Swapping: Last resort - disk backing
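
These mechanisms can be observed and driven from the host (a sketch using libvirt and the kernel's KSM interface; the domain name is a placeholder):

# Ballooning: ask the guest's balloon driver to shrink it to 8 GiB
virsh setmem dom1 8G --live

# Page sharing: KSM statistics on the host
cat /sys/kernel/mm/ksm/pages_sharing /sys/kernel/mm/ksm/pages_shared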

VM Lifecycle States

Virtual Hardware Configuration

What Constitutes a VM?

Virtual CPU Configuration

<vcpu placement='static'>4</vcpu>
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='4' threads='1'/>
</cpu>

Virtual Memory

<memory unit='GiB'>16</memory>
<currentMemory unit='GiB'>16</currentMemory>

Virtual Storage

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <source file='/var/lib/vms/ubuntu.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>

Virtual Network

<interface type='bridge'>
  <mac address='52:54:00:6b:3c:58'/>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>
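
These fragments live in a libvirt domain XML file; a typical workflow for using them looks like this (a sketch, assuming the definition is saved as ubuntu.xml):

virsh define ubuntu.xml        # register the VM with libvirt
virsh start ubuntu             # boot it
virsh dumpxml ubuntu | less    # inspect the full virtual hardware description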

Boot Process in a Virtual Machine

Step-by-Step Boot Sequence

  1. VM Creation

    Hypervisor allocates:
    - Memory regions
    - Virtual CPU structures
    - Device emulation threads
  2. Firmware Initialization

    Virtual BIOS/UEFI:
    - Memory detection (fake)
    - Device enumeration (virtual)
    - Boot device selection
  3. Bootloader

    GRUB/Windows Boot Manager:
    - Reads virtual disk
    - Loads kernel into memory
    - Sets up initial ramdisk
  4. Operating System

    Kernel initialization:
    - Detects "hardware" (all virtual)
    - Loads drivers
    - Starts init process

Guest Perspective vs Reality

Inside the VM

$ lscpu
Architecture:          x86_64
CPU(s):                4
Model name:            Intel Xeon E5-2680 v4
CPU MHz:               2400.000
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K

$ free -h
              total        used        free
Mem:           16Gi       2.1Gi        12Gi

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       100G   45G   55G  45% /

On the Host

$ ps aux | grep qemu
qemu-system-x86_64 -enable-kvm -m 16384 -smp 4 \
  -drive file=vm.qcow2 -netdev tap,id=net0

$ top
PID    USER  PR  VIRT    RES    SHR  %CPU  %MEM  COMMAND
15234  qemu  20  17.2g   16.1g  4.2g  385   12.5  qemu-system-x86

The VM is just another process!

Example: Creating a VM

Using QEMU/KVM

# 1. Create disk image
qemu-img create -f qcow2 ubuntu-vm.qcow2 50G

# 2. Start VM with Ubuntu installer
#    -enable-kvm  : use hardware acceleration
#    -cpu host    : pass through host CPU features
#    -m 8192      : 8 GB RAM
#    -smp cores=4 : 4 CPU cores
#    -boot d      : boot from the CD image
qemu-system-x86_64 \
  -enable-kvm \
  -cpu host \
  -m 8192 \
  -smp cores=4 \
  -drive file=ubuntu-vm.qcow2,if=virtio \
  -cdrom ubuntu-22.04.iso \
  -boot d \
  -vga qxl \
  -spice port=5900,addr=127.0.0.1

# 3. Connect to console
remote-viewer spice://localhost:5900

# 4. After installation, normal boot
qemu-system-x86_64 \
  -enable-kvm \
  -cpu host \
  -m 8192 \
  -smp cores=4 \
  -drive file=ubuntu-vm.qcow2,if=virtio \
  -netdev user,id=net0,hostfwd=tcp::2222-:22 \
  -device virtio-net,netdev=net0

Performance Overhead Analysis

CPU and Memory Performance

CPU Performance Factors

VM Exits - The Main Culprit

  • Each privileged instruction causes exit
  • Exit cost: 1000-4000 cycles
  • Mitigation: Hardware assists reduce exits

Cache Effects

  • VM switches flush TLB
  • Cache pollution from hypervisor
  • NUMA complications
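
VM exits can be counted directly on the host (a sketch; assumes perf with KVM tracepoints and the guest's qemu PID, written here as a placeholder):

# Record exit reasons for 10 seconds, then summarize them
perf kvm stat record -p <qemu_pid> sleep 10
perf kvm stat report --event=vmexit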

Memory Performance

Translation Overhead

  • Two-level page walks
  • TLB pressure increased
  • Larger page tables

Measured Impact

STREAM Benchmark (Memory Bandwidth):
Native:        95 GB/s
VM with EPT:   88 GB/s  (7% loss)
VM without:    68 GB/s  (28% loss)

I/O Performance Penalties

GPU Virtualization for ML

Summary: When to Use VMs

Ideal Use Cases

Strong Isolation Required

  • Multi-tenant environments
  • Security-sensitive workloads
  • Compliance requirements

Different Operating Systems

  • Windows and Linux on same host
  • Legacy application support

Resource Guarantees

  • Dedicated CPU/memory allocation
  • Predictable performance

Limitations for ML/Data Science

Performance Overhead

  • 5-20% depending on workload
  • GPU virtualization challenges

Resource Efficiency

  • GB of overhead per instance
  • Slow startup times

Next: How containers solve these problems…

Container Architecture & Internals

The Problem Containers Solve

What Is a Kernel?

The kernel is the core of the operating system that:

  • Manages hardware: Only the kernel can directly touch CPU, RAM, disk, network cards
  • Enforces isolation: Processes can’t see each other’s memory
  • Provides abstraction: Files, sockets, processes are kernel concepts
  • Schedules work: Decides which process runs on which CPU core
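
You can watch a process asking the kernel to do this work on its behalf (a sketch; strace prints every system call a process makes):

# Even "reading a file" is a sequence of requests to the kernel
strace -e trace=openat,read,write,close cat /etc/hostname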

Kernel Privilege Levels

Why can’t applications access hardware directly?

  • CPU enforces privilege levels in hardware
  • Ring 3 code physically cannot execute privileged instructions
  • Attempting privileged operations causes CPU trap to kernel
  • Kernel decides whether to allow or deny the operation

Containers vs VMs:

  • Containers: Share host kernel (same Ring 0)
  • VMs: Each has own kernel (separate Ring 0)

Why Not Just Use VMs?

What Is a Container Really?

Demonstration: Container = Process

See it yourself

# Terminal 1: Start a container
$ docker run -d --name demo nginx
a5f3c8b9d2e1

# Terminal 2: Find it as a process
$ ps aux | grep nginx
root     15234  0.0  0.1  141836  2308 ?  Ss  10:42  nginx: master
www-data 15235  0.0  0.1  142268  3544 ?  S   10:42  nginx: worker

# It's just a process!
$ pstree -p 15234
nginx(15234)───nginx(15235)

# Check its namespaces
$ ls -la /proc/15234/ns/
lrwxrwxrwx 1 root root ipc:[4026532439]    # Isolated IPC
lrwxrwxrwx 1 root root mnt:[4026532437]    # Isolated mounts
lrwxrwxrwx 1 root root net:[4026532442]    # Isolated network
lrwxrwxrwx 1 root root pid:[4026532440]    # Isolated PIDs
lrwxrwxrwx 1 root root uts:[4026532438]    # Isolated hostname

# Kill it like any process (SIGKILL gives exit code 137)
$ kill -9 15234
$ docker ps -a
CONTAINER ID   STATUS      
a5f3c8b9d2e1   Exited (137)

Shared Kernel Architecture

Linux Namespaces: The Isolation Mechanism

Namespace Types in Detail

Six Types of Isolation

Namespace   Isolates                          Year   Example
---------   -------------------------------   ----   -----------------------------
Mount       Filesystem mount points           2002   Container sees / as its image
UTS         Hostname and domain name          2006   Container has own hostname
IPC         Inter-process communication       2006   Separate shared memory
PID         Process IDs                       2008   Container PID 1 is init
Network     Network devices, stacks, ports    2009   Own IP address, ports
User        User and group IDs                2013   Root in container ≠ host root

Creating a Namespace

// Create a child process in new PID, network, and mount namespaces
pid_t pid = clone(child_fn, stack_top,
                  CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | SIGCHLD, arg);

// Or move the calling process itself (new namespaces apply to its children)
unshare(CLONE_NEWPID);

PID Namespace in Action

Mount Namespace: Filesystem Isolation

Network Namespace

Control Groups (cgroups)

Putting It Together: Namespace + cgroups

# Create a simple container manually

# 1. Create namespaces
unshare --pid --mount --net --uts --ipc --fork bash

# 2. Set hostname (UTS namespace)
hostname my-container

# 3. Mount proc (PID namespace)
mount -t proc proc /proc

# 4. Configure network (Network namespace)
#    (assumes a veth pair was created and its peer moved into this
#     namespace as eth0)
ip link set lo up
ip addr add 172.17.0.2/16 dev eth0

# 5. Set resource limits (cgroup v1 paths; on cgroup v2 write to memory.max)
echo $$ > /sys/fs/cgroup/memory/docker/container1/cgroup.procs
echo 4294967296 > /sys/fs/cgroup/memory/docker/container1/memory.limit_in_bytes   # 4 GiB

# You've created a container!

This is exactly what Docker does

  • Plus image management
  • Plus network setup
  • Plus storage layers
  • Plus convenience

Container Images: Structure

Layer Efficiency

Union Filesystems

How Layers Become One Filesystem

OverlayFS - The modern standard

Layer 1: Ubuntu base     [Read-only]
         /bin/bash
         /lib/libc.so
         
Layer 2: Python install  [Read-only]
         /usr/bin/python3
         /usr/lib/python3.8/
         
Layer 3: App code       [Read-only]
         /app/main.py
         /app/config.yaml
         
Container Layer:        [Read-write]
         (empty initially)
         
Union View:            [What container sees]
         /bin/bash         (from layer 1)
         /usr/bin/python3  (from layer 2)
         /app/main.py      (from layer 3)

Copy-on-Write (CoW)

When a container modifies a file:

  1. File copied from read-only layer
  2. Copy placed in read-write layer
  3. Original remains unchanged
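
You can reproduce this by hand with a throwaway OverlayFS mount (a minimal sketch; directory names are arbitrary and root privileges are assumed):

mkdir -p lower upper work merged
echo "original" > lower/file.txt

# lowerdir is read-only; upperdir receives copies on write
sudo mount -t overlay overlay \
  -o lowerdir=lower,upperdir=upper,workdir=work merged

echo "changed" > merged/file.txt   # triggers copy-up
cat lower/file.txt                 # still "original"
ls upper/                          # file.txt now lives in the writable layer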

Build Cache Demonstration

From Image to Running Container

The Journey

  1. Pull/Build Image

    • Download layers (if needed)
    • Store in /var/lib/docker
  2. Create Container

    docker create ubuntu:20.04
    • Allocate container ID
    • Create filesystem layers
    • Prepare config.json
  3. Start Container

    docker start <container-id>
    • Create namespaces
    • Setup cgroups
    • Mount layers
    • Configure network
    • Execute entrypoint
  4. Container Running

    • Process isolated
    • Resources limited
    • Network connected

Container Runtime Architecture

What Happens During docker run

Complete Sequence

$ docker run -d -p 8080:80 --name web nginx
  1. Client → Daemon

    • Parse command
    • Send API request
  2. Image Check

    • Look for nginx:latest locally
    • Pull if missing
  3. Create Container

    • Generate container ID
    • Create filesystem snapshot
    • Write config.json
  4. Setup Namespaces

    • Create PID namespace
    • Create network namespace
    • Create mount namespace
  5. Configure cgroups

    • Set memory limits
    • Set CPU shares
  6. Network Setup

    • Create veth pair
    • Connect to bridge
    • Configure iptables
  7. Start Process

    • Execute nginx
    • As PID 1 in container
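
The network plumbing created in steps 4-6 is visible from the host afterward (a sketch; interface names and rule details vary):

# The 8080 -> 80 mapping shows up as a DNAT rule
sudo iptables -t nat -L DOCKER -n

# The container's veth peer is attached to the docker0 bridge
ip link show master docker0

# Docker's own view of the bridge network
docker network inspect bridge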

Container Security Model

Why Containers Aren’t VMs

Fundamental Differences

Shared Kernel

  • All containers run on same kernel
  • Kernel bug = all containers vulnerable
  • No kernel customization per container

Process-Level Isolation

  • Containers are processes
  • Can be killed like any process
  • Share system resources

Security Boundaries

  • VMs: Hardware-enforced
  • Containers: Kernel-enforced
  • Escape easier in containers

When This Matters

Use VMs when:

  • Running untrusted code
  • Need different kernels
  • Require complete isolation
  • Compliance requirements

Use Containers when:

  • Deploying trusted applications
  • Need fast startup
  • Want efficient resource use
  • Building microservices

Docker: From Theory to Practice

Images vs Containers: The Critical Distinction

Think of it like this:

  • Image = Python class definition (blueprint)
  • Container = Instance of that class (running object)
  • Images are immutable; containers add a writable layer
  • docker images lists images; docker ps lists containers

Where Do Images Come From?

Official vs User Images:

  • python → Official Python image
  • username/myapp → User’s image
  • gcr.io/project/image → Google Container Registry
  • Always specify tags; avoid latest in production
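
Moving an image between these locations is just tagging and pushing (a sketch; the registry host and repository names are placeholders):

docker tag myapp:v1 gcr.io/my-project/myapp:v1   # retag for a registry
docker push gcr.io/my-project/myapp:v1           # upload layers
docker pull gcr.io/my-project/myapp:v1           # fetch on another machine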

Essential Docker Commands

# Image Management
docker pull python:3.11              # Download image from registry
docker images                         # List local images
docker rmi python:3.11               # Remove image
docker build -t myapp:v1 .          # Build image from Dockerfile

# Container Lifecycle
docker run python:3.11 python -c "print('Hello')"  # Run and exit
docker run -d nginx                  # Run in background (detached)
docker run -it ubuntu bash          # Interactive terminal
docker ps                            # List running containers
docker ps -a                         # List all containers
docker stop container_id            # Stop gracefully
docker kill container_id            # Force stop
docker rm container_id               # Remove stopped container

# Debugging and Inspection
docker logs container_id            # View output
docker exec -it container_id bash   # Enter running container
docker inspect container_id         # Full container details
docker stats                        # Resource usage

Docker Run: The Swiss Army Knife

Docker Architecture

  • Docker daemon manages container lifecycle
  • containerd handles container runtime
  • runc creates and runs containers (OCI compliant)
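
On a typical Docker host this chain of components shows up as ordinary processes (a sketch; exact process names vary by version):

# Daemon and runtime processes
ps -e -o pid,ppid,cmd | grep -E 'dockerd|containerd' | grep -v grep

# Each running container has a containerd-shim as its parent
ps -e -o pid,ppid,cmd | grep containerd-shim | grep -v grep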

Image Layers and Build Cache

Layer ordering impacts build performance:

  • Base images change rarely → put them first (earliest Dockerfile instructions)
  • Application code changes frequently → copy it last (final instructions)
  • Dependencies sit in between, ordered by how often they change

Dockerfile Best Practices for ML

# Multi-stage build for production
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
WORKDIR /app

# Copy only installed packages
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

# Copy application last for cache efficiency
COPY model/ ./model/
COPY src/ ./src/
CMD ["python", "src/inference.py"]

Key principles:

  • Multi-stage builds reduce final image size
  • Copy requirements before code (cache efficiency)
  • Use specific base image versions
  • Minimize layers in production stage

Container Resource Limits

Resource limits prevent:

  • Memory leaks affecting other containers
  • CPU monopolization
  • Network bandwidth saturation
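
Limits are applied per container at run time (a sketch of the common flags; the image name and values are examples):

# --memory / --memory-swap : hard memory cap, no additional swap
# --cpus                   : at most two CPUs worth of time
# --pids-limit             : cap the number of processes in the container
docker run -d \
  --memory=4g --memory-swap=4g \
  --cpus=2.0 \
  --pids-limit=256 \
  myapp:v1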

Docker Networking Modes

Bridge (default)

docker run --network bridge app
  • Isolated network namespace
  • NAT for external access
  • Inter-container communication via bridge

Host

docker run --network host app
  • No network isolation
  • Direct host network access
  • Best performance, less secure

None

docker run --network none app
  • Complete network isolation
  • No network interfaces
  • Security-critical workloads

Custom Bridge

docker network create ml-net
docker run --network ml-net app
  • User-defined networks
  • Automatic DNS resolution
  • Network segmentation
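
Service discovery on a user-defined bridge (a sketch; container and network names are examples):

docker network create ml-net
docker run -d --name api --network ml-net nginx
docker run --rm --network ml-net alpine ping -c 1 api   # "api" resolves by name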

Volume Management for ML Workloads

Volume selection criteria:

  • Performance: tmpfs > bind mount > named volume > network
  • Persistence: named volume, bind mount for stateful data
  • Sharing: NFS for multi-node access
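
The three mount types side by side (a sketch; the image name and paths are examples):

# Named volume: managed by Docker, survives container removal
docker volume create training-data
docker run -v training-data:/data myapp:v1

# Bind mount: host directory mapped directly into the container
docker run -v /mnt/datasets:/data:ro myapp:v1

# tmpfs: RAM-backed scratch space, gone when the container stops
docker run --tmpfs /scratch myapp:v1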

Container Security Boundaries

# Run as non-root user
docker run --user 1000:1000 app

# Read-only root filesystem
docker run --read-only \
  --tmpfs /tmp \
  --tmpfs /var/run \
  app

# Drop capabilities
docker run --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  app

# Security options
docker run --security-opt no-new-privileges \
  --security-opt apparmor=docker-default \
  app

Security layers:

  • User namespaces map container users to host UIDs
  • Capability dropping reduces attack surface
  • Read-only filesystems prevent persistence
  • AppArmor/SELinux provide mandatory access control

Production Patterns: Health Checks

# Dockerfile with health check
FROM python:3.11-slim

# Health check configuration
HEALTHCHECK --interval=30s \
  --timeout=3s \
  --start-period=60s \
  --retries=3 \
  CMD python -c "import requests; requests.get('http://localhost:8080/health').raise_for_status()"

# Application setup
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt

EXPOSE 8080
CMD ["python", "app.py"]

Health check types:

  • Liveness: Is the container still working (not just running)?
  • Readiness: Can the container accept traffic?
  • Startup: Has initialization completed?

Actions on failure (orchestrator and restart-policy dependent):

  • Docker marks the container unhealthy; Swarm/Kubernetes can restart it
  • Remove from the load balancer / service endpoints
  • Alert monitoring system
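
The current health state is recorded in the container's metadata (a sketch):

docker inspect --format '{{.State.Health.Status}}' container_id
# healthy | unhealthy | starting

docker inspect --format '{{json .State.Health}}' container_id | jq .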

Debugging Containers

# Inspect running processes
docker exec container_id ps aux

# Attach to running container
docker exec -it container_id /bin/bash

# View container logs
docker logs --follow --tail 100 container_id

# Inspect container metadata
docker inspect container_id | jq '.State'

# Copy files from container
docker cp container_id:/app/logs/error.log ./

# Monitor resource usage
docker stats container_id

Advanced debugging:

# Network debugging
docker exec container_id netstat -tulpn
docker exec container_id tcpdump -i eth0

# Process tracing
docker exec container_id strace -p 1

# File system analysis
docker exec container_id df -h
docker exec container_id lsof

Container Registries and Distribution

Registry selection factors:

  • Proximity: Reduce latency with regional registries
  • Bandwidth: Private registries in same VPC
  • Caching: Layer deduplication across images
  • Security: Vulnerability scanning, access control
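
For experimentation you can run a registry yourself (a sketch using the official registry image; the port and tags are examples):

docker run -d -p 5000:5000 --name registry registry:2
docker tag ml-inference:v1.0 localhost:5000/ml-inference:v1.0
docker push localhost:5000/ml-inference:v1.0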

Demo: Building an ML Inference Container - Complete Walkthrough

Step 1: Project Structure

# Create project directory
mkdir ml-inference && cd ml-inference

# Create directory structure
mkdir -p src model data

# Project layout:
ml-inference/
├── Dockerfile
├── requirements.txt
├── model/
│   └── model.pkl         # Pretrained model
├── src/
│   ├── app.py            # FastAPI application
│   ├── inference.py      # Model inference logic
│   └── preprocess.py     # Data preprocessing
└── data/
    └── sample.json       # Test data

Step 2: Application Code

# src/app.py - FastAPI inference server
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import pickle
import numpy as np
from typing import List
import time
import logging

app = FastAPI(title="ML Inference API")
logger = logging.getLogger(__name__)

# Load model at startup
with open('/app/model/model.pkl', 'rb') as f:
    model = pickle.load(f)
    logger.info(f"Model loaded: {type(model)}")

class PredictionRequest(BaseModel):
    features: List[float]
    
class PredictionResponse(BaseModel):
    prediction: float
    confidence: float
    inference_time_ms: float

@app.get("/health")
def health_check():
    """Kubernetes/Docker health check endpoint"""
    return {"status": "healthy", "model_loaded": model is not None}

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    """Main prediction endpoint"""
    start_time = time.time()
    
    try:
        # Preprocess
        features = np.array(request.features).reshape(1, -1)
        
        # Predict
        prediction = model.predict(features)[0]
        confidence = model.predict_proba(features).max()
        
        # Measure time
        inference_time = (time.time() - start_time) * 1000
        
        return PredictionResponse(
            prediction=float(prediction),
            confidence=float(confidence),
            inference_time_ms=inference_time
        )
    except Exception as e:
        logger.error(f"Prediction failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))
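
The Dockerfile in the next step installs from requirements.txt, which the walkthrough does not list; a plausible minimal version for this app (the exact packages and any pins are assumptions):

cat > requirements.txt <<'EOF'
fastapi
uvicorn[standard]
pydantic
numpy
scikit-learn
requests
EOF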

Step 3: Dockerfile - Layer by Layer

# Multi-stage build for smaller final image
# Stage 1: Builder
FROM python:3.11-slim as builder

# Install build dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy and install requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime
FROM python:3.11-slim

# Create non-root user
RUN useradd -m -u 1000 mluser

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy application code
COPY --chown=mluser:mluser src/ ./src/
COPY --chown=mluser:mluser model/ ./model/

# Switch to non-root user
USER mluser

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD python -c "import requests; r = requests.get('http://localhost:8080/health'); r.raise_for_status()"

# Expose port
EXPOSE 8080

# Run application
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8080"]

Step 4: Building the Image

# Build command with explanations
#   --cache-from python:3.11-slim          use cached base layers
#   --build-arg BUILDKIT_INLINE_CACHE=1    enable inline cache metadata
#   --progress=plain                       show detailed output
#   -t ml-inference:v1.0                   tag the image
#   .                                      build context (current dir)
DOCKER_BUILDKIT=1 docker build \
  --cache-from python:3.11-slim \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --progress=plain \
  -t ml-inference:v1.0 \
  .

# Build output:
# => [internal] load build definition from Dockerfile
# => [internal] load .dockerignore
# => [internal] load metadata for docker.io/library/python:3.11-slim
# => [builder 1/5] FROM docker.io/library/python:3.11-slim
# => [builder 2/5] RUN apt-get update && apt-get install -y gcc g++
# => [builder 3/5] RUN python -m venv /opt/venv
# => [builder 4/5] COPY requirements.txt .
# => [builder 5/5] RUN pip install --no-cache-dir -r requirements.txt
# => [stage-1 1/6] COPY --from=builder /opt/venv /opt/venv
# => exporting to image
# => naming to docker.io/library/ml-inference:v1.0

Step 5: Running the Container

# Development mode - with live code reload
docker run -it --rm \
  --name ml-dev \
  -p 8080:8080 \
  -v $(pwd)/src:/app/src:ro \
  -v $(pwd)/model:/app/model:ro \
  -e LOG_LEVEL=DEBUG \
  ml-inference:v1.0

# Production mode - with resource limits
docker run -d \
  --name ml-prod \
  --restart unless-stopped \
  --memory="2g" \
  --memory-reservation="1g" \
  --cpus="1.5" \
  --pids-limit 100 \
  -p 8080:8080 \
  --read-only \
  --tmpfs /tmp \
  ml-inference:v1.0

Step 6: Testing the Container

# Check if container is running
docker ps
# CONTAINER ID   IMAGE              STATUS         PORTS
# abc123def456   ml-inference:v1.0  Up 2 minutes   0.0.0.0:8080->8080/tcp

# Check health endpoint
curl http://localhost:8080/health
# {"status":"healthy","model_loaded":true}

# Test prediction
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
# {"prediction":0,"confidence":0.98,"inference_time_ms":12.5}

# View logs
docker logs ml-prod --tail 50
# INFO:     Started server process [1]
# INFO:     Waiting for application startup.
# INFO:     Model loaded: <class 'sklearn.ensemble._forest.RandomForestClassifier'>
# INFO:     Application startup complete.

# Monitor resource usage
docker stats ml-prod --no-stream
# CONTAINER     CPU %     MEM USAGE / LIMIT   
# ml-prod       15.2%     523MiB / 2GiB       

Step 7: Debugging When Things Go Wrong

# Container won't start? Check logs
docker logs ml-prod
# Error: Model file not found at /app/model/model.pkl

# Need to debug inside container?
docker exec -it ml-prod /bin/bash
mluser@abc123:/app$ ls -la
mluser@abc123:/app$ python -c "import pickle; print('Pickle works')"
mluser@abc123:/app$ exit

# Container crashes immediately?
docker run -it --entrypoint /bin/bash ml-inference:v1.0
# Now you're in the container with a shell

# Check what files made it into the image
docker run --rm ml-inference:v1.0 find /app -type f

# Inspect image layers and sizes
docker history ml-inference:v1.0
# IMAGE          CREATED       SIZE      COMMAND
# abc123         2 hours ago   2.1MB     COPY src/ ./src/
# def456         2 hours ago   125MB     COPY model/ ./model/
# ...

Step 8: Optimizing the Image

Optimization techniques used:

  • Multi-stage builds eliminate build dependencies
  • Non-root user improves security
  • Virtual environment isolation
  • Layer caching for faster rebuilds
  • Health checks for orchestration

Container Orchestration Preview

Why orchestration is necessary:

Kubernetes: Container Orchestration

Docker Compose: Multi-Container Applications

Docker Compose manages multi-container applications on a single host:

  • Define services in YAML
  • Automatic network creation
  • Service discovery by name
  • Coordinated lifecycle management

docker-compose.yaml Structure

version: '3.8'

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      - api
    networks:
      - frontend

  api:
    build: ./api
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/myapp
      REDIS_URL: redis://cache:6379
    depends_on:
      - db
      - cache
    networks:
      - frontend
      - backend
    restart: unless-stopped

  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: secretpass
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - backend

  cache:
    image: redis:7-alpine
    networks:
      - backend

volumes:
  postgres_data:

networks:
  frontend:
  backend:

JSON → YAML: Why Configuration Changed

YAML = YAML Ain’t Markup Language

  • Indentation matters (like Python)
  • No brackets, quotes optional
  • Lists with -, dictionaries with :
  • Comments with #

YAML Essentials for DevOps

Common YAML Gotchas:

  • version: 1.10 becomes the float 1.1 (use quotes: "1.10")
  • Tabs are forbidden (spaces only)
  • : in values needs quotes: description: "Error: failed"
  • yes, no, on, off parse as booleans (use quotes for strings)
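
A quick way to see these rules in action (a sketch; assumes Python with PyYAML installed):

python3 -c 'import yaml; print(yaml.safe_load("version: 1.10\nenabled: on\nname: \"1.10\""))'
# → {'version': 1.1, 'enabled': True, 'name': '1.10'}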

Docker Compose Commands

# Start all services
docker-compose up              # Foreground with logs
docker-compose up -d           # Background (detached)

# Manage services
docker-compose ps              # List running services
docker-compose logs api        # View logs for service
docker-compose stop            # Stop all services
docker-compose down            # Stop and remove containers
docker-compose down -v         # Also remove volumes

# Scaling and updates
docker-compose up -d --scale worker=3   # Run 3 worker instances
docker-compose pull                     # Update images
docker-compose build                    # Rebuild custom images
docker-compose restart api              # Restart single service

# Development workflow
docker-compose exec api bash           # Shell into running service
docker-compose run api pytest          # Run command in new container

Compose vs Manual Docker

Compose for Development vs Production

Development (docker-compose.override.yml):

services:
  api:
    build: .
    volumes:
      - ./src:/app/src  # Live code reload
    ports:
      - "5678:5678"     # Debugger port
    environment:
      - DEBUG=true

Production (docker-compose.prod.yml):

services:
  api:
    image: registry.com/api:v1.2.3
    deploy:
      resources:
        limits:
          memory: 512M
      restart_policy:
        condition: on-failure

Compose Limitations → Need for Kubernetes

Docker Compose works well until it doesn’t. The transition to Kubernetes typically happens when:

  • Your single host fails at 3am and takes everything down
  • You realize you’re manually SSH’ing to update containers
  • Your docker-compose.yaml becomes 500+ lines of workarounds
  • You need production features Compose can’t provide (real load balancing, gradual rollouts, automatic failover)

The Orchestration Problem

Orchestration solves:

  • Scheduling: Optimal container placement
  • Scaling: Respond to load changes
  • Recovery: Automatic failure handling
  • Updates: Zero-downtime deployments

Kubernetes Architecture

Control Plane (Master nodes):

  • etcd: Distributed key-value store for cluster state
  • API Server: REST API for all operations
  • Scheduler: Assigns pods to nodes
  • Controller Manager: Maintains desired state

Data Plane (Worker nodes):

  • kubelet: Node agent, manages pods
  • kube-proxy: Network proxy, service abstraction
  • Container Runtime: Docker/containerd
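
With a working cluster and kubectl configured, these components are visible directly (a sketch):

kubectl get nodes -o wide           # control-plane and worker nodes
kubectl get pods -n kube-system     # API server, scheduler, etcd, kube-proxy, ...
kubectl cluster-info                # API server endpoint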

Kubernetes Objects: Pods

apiVersion: v1
kind: Pod
metadata:
  name: ml-inference
  labels:
    app: inference
    version: v2
spec:
  containers:
  - name: model-server
    image: ml-inference:v2.1
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
    ports:
    - containerPort: 8080
  - name: metrics-collector
    image: prometheus-exporter:latest
    ports:
    - containerPort: 9090

Pod characteristics:

  • Smallest deployable unit
  • One or more containers (shared network/storage)
  • Ephemeral by design
  • Single IP address per pod
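
Working with the pod defined above (a sketch, assuming the manifest is saved as pod.yaml):

kubectl apply -f pod.yaml                         # create the pod
kubectl get pod ml-inference -o wide              # node, IP, status
kubectl logs ml-inference -c model-server         # logs for one container in the pod
kubectl exec -it ml-inference -c model-server -- /bin/sh
kubectl delete pod ml-inference                   # pods are ephemeral; this one is gone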

Workload Controllers

Controller types:

  • Deployment: Stateless applications, rolling updates
  • StatefulSet: Stateful applications, ordered deployment
  • DaemonSet: One pod per node (monitoring, logging)
  • Job/CronJob: Batch processing, scheduled tasks
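
The most common controller in practice is the Deployment (a sketch; the image and names are examples):

kubectl create deployment inference --image=ml-inference:v2.1 --replicas=3
kubectl get deployment,replicaset,pods -l app=inference
kubectl scale deployment inference --replicas=5
kubectl rollout status deployment/inference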

Service Discovery and Load Balancing

apiVersion: v1
kind: Service
metadata:
  name: inference-service
spec:
  selector:
    app: inference
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: inference-internal
spec:
  selector:
    app: inference
  type: ClusterIP
  ports:
  - port: 8080

Service types:

  • ClusterIP: Internal cluster communication
  • NodePort: External access via node ports
  • LoadBalancer: Cloud provider load balancer
  • ExternalName: DNS CNAME redirect
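
Creating and testing services from the command line (a sketch, assuming the inference Deployment above exists):

kubectl expose deployment inference --name=inference-internal --port=8080 --type=ClusterIP
kubectl get svc inference-internal
kubectl get endpoints inference-internal        # pod IPs behind the service

# Reach a ClusterIP service from your workstation for debugging
kubectl port-forward svc/inference-internal 8080:8080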

ConfigMaps and Secrets

# ConfigMap for application configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-config
data:
  model_path: "/models/latest"
  batch_size: "32"
  num_workers: "4"
---
# Secret for sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: api-keys
type: Opaque
data:
  aws_access_key: <base64_encoded>
  database_password: <base64_encoded>

Usage in pods:

containers:
- name: app
  envFrom:
  - configMapRef:
      name: model-config
  - secretRef:
      name: api-keys
  volumeMounts:
  - name: config
    mountPath: /etc/config
volumes:
- name: config
  configMap:
    name: model-config
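
Both objects can also be created directly from the command line (a sketch; values are examples):

kubectl create configmap model-config \
  --from-literal=model_path=/models/latest \
  --from-literal=batch_size=32

# Secret values are base64-encoded automatically when created this way
kubectl create secret generic api-keys \
  --from-literal=database_password='s3cret'

kubectl get configmap model-config -o yaml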

Persistent Storage

Storage concepts:

  • PersistentVolume (PV): Cluster storage resource
  • PersistentVolumeClaim (PVC): Request for storage
  • StorageClass: Dynamic provisioning template
  • Volume Snapshots: Point-in-time backups

Resource Management and QoS

QoS determination:

  • Guaranteed: requests = limits for every container and resource
  • Burstable: some requests or limits set, but not meeting Guaranteed
  • BestEffort: No requests or limits specified
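
The assigned class is recorded on each pod (a sketch):

kubectl get pod ml-inference -o jsonpath='{.status.qosClass}'
# Burstable   (the example pod above sets requests below its limits)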

Scheduling and Affinity

apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  nodeSelector:
    gpu: "true"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values: ["p3.2xlarge", "p3.8xlarge"]
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values: ["gpu-training"]
          topologyKey: kubernetes.io/hostname

Scheduling constraints:

  • NodeSelector: Simple key-value matching
  • Node Affinity: Complex node selection rules
  • Pod Affinity: Co-locate related pods
  • Pod Anti-Affinity: Spread pods across nodes
  • Taints/Tolerations: Dedicated nodes
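
The node-side half of these constraints is managed with labels and taints (a sketch; the node name and keys are examples):

kubectl label nodes gpu-node-1 gpu=true                   # matches the nodeSelector above
kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule   # only tolerating pods may land here
kubectl describe node gpu-node-1 | grep -A3 Taints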

Rolling Updates and Rollbacks

Update strategies:

  • Recreate: Stop all, start all (downtime)
  • Rolling: Gradual replacement (zero downtime)
  • Blue-Green: Full environment swap
  • Canary: Gradual traffic shift
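
Rolling updates and rollbacks on a Deployment come down to a few commands (a sketch; names and tags are examples):

kubectl set image deployment/inference model-server=ml-inference:v2.2
kubectl rollout status deployment/inference     # watch the gradual replacement
kubectl rollout history deployment/inference
kubectl rollout undo deployment/inference       # roll back to the previous revision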

Monitoring and Observability

# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: inference-metrics
spec:
  selector:
    matchLabels:
      app: inference
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key metrics:

  • Golden Signals: Latency, traffic, errors, saturation
  • Resource Metrics: CPU, memory, network, disk
  • Application Metrics: Request rate, queue depth, cache hits
  • Custom Metrics: Model accuracy, inference time

Observability stack:

  • Prometheus: Metrics collection
  • Grafana: Visualization
  • Fluentd: Log aggregation
  • Jaeger: Distributed tracing

Kubernetes Operators

Operators extend Kubernetes:

  • Custom Resource Definitions (CRDs)
  • Domain-specific knowledge encoded
  • Automated operational tasks
  • Self-healing capabilities

Demo: Deploying ML Pipeline on Kubernetes

# Create namespace
kubectl create namespace ml-pipeline

# Deploy training job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
  namespace: ml-pipeline
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: ml-training:latest
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
  backoffLimit: 3
EOF

# Deploy inference service
kubectl apply -n ml-pipeline -f inference-deployment.yaml
kubectl apply -n ml-pipeline -f inference-service.yaml

# Setup horizontal autoscaling
kubectl autoscale deployment inference \
  -n ml-pipeline \
  --cpu-percent=70 \
  --min=2 \
  --max=10

# Check status
kubectl get pods -n ml-pipeline
kubectl top pods -n ml-pipeline

Production considerations:

  • Resource quotas per namespace
  • Network policies for security
  • Pod disruption budgets
  • Cluster autoscaling for nodes

Networking Fundamentals for Cloud Computing

Physical vs Virtual Networks

Key distinction:

  • Physical: Hardware switches, routers, cables
  • Virtual: Software-defined networking in hypervisor/kernel
  • Virtual networks run on top of physical infrastructure

The OSI Model in Practice

Focus areas for virtualization:

  • Layer 2: Virtual switches, VLANs, MAC addresses
  • Layer 3: IP routing, subnets, NAT
  • Layer 4: Port mapping, load balancing

IP Addressing Fundamentals

CIDR (Classless Inter-Domain Routing):

  • /24 = 255.255.255.0 = 256 addresses (254 usable)
  • /16 = 255.255.0.0 = 65,536 addresses
  • Smaller prefix length (the number after the /) = larger network

Network Interfaces and Virtual NICs

Interface types in virtualization:

  • eth0: Physical network interface
  • docker0: Docker bridge interface
  • veth: Virtual ethernet pairs (containers)
  • tap/tun: VM network interfaces
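
On a Docker host these interfaces are visible with standard tools (a sketch; names vary):

ip addr show docker0               # the bridge and its subnet
ip link show type veth             # veth halves on the host side
ip link show type bridge           # all software bridges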

Packet Flow Through the Stack

Each layer adds its header:

  • Application generates data
  • TCP adds ports and sequencing
  • IP adds addressing and routing
  • Ethernet adds MAC addresses for local delivery

Bridge Networking in Containers

Bridge networking provides:

  • Isolated network namespace per container
  • Automatic IP assignment from subnet
  • NAT for external connectivity
  • Inter-container communication

VLANs and Network Segmentation

VLAN benefits:

  • Logical network separation on same physical infrastructure
  • Broadcast domain isolation
  • Security through segmentation
  • Simplified network management

Routing Between Networks

Routing fundamentals:

  • Each packet examined for destination IP
  • Longest prefix match wins
  • Metrics determine best path
  • Default route catches all unmatched traffic
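
The host's routing decisions can be inspected and tested directly (a sketch; addresses are examples):

ip route show                                     # routing table, including the default route
ip route get 8.8.8.8                              # which route/interface a destination would use
sudo ip route add 10.10.0.0/16 via 192.168.1.1    # add a static route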

TCP Connection Lifecycle

TCP provides:

  • Reliable, ordered delivery
  • Flow control (sliding window)
  • Congestion control
  • Connection state management

UDP vs TCP for Cloud Applications

Protocol selection criteria:

  • TCP: When reliability matters more than speed
  • UDP: When speed matters more than reliability
  • Modern alternatives: QUIC (UDP-based, reliable)

Network Namespaces and Isolation

Network namespaces provide:

  • Complete network stack isolation
  • Independent routing tables
  • Separate firewall rules
  • Private loopback interface
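
A minimal sketch of building this isolation by hand with iproute2 (names and addresses are arbitrary):

sudo ip netns add ns1                              # new network namespace
sudo ip link add veth0 type veth peer name veth1
sudo ip link set veth1 netns ns1                   # move one end into the namespace
sudo ip addr add 10.0.0.1/24 dev veth0 && sudo ip link set veth0 up
sudo ip netns exec ns1 ip addr add 10.0.0.2/24 dev veth1
sudo ip netns exec ns1 ip link set veth1 up
sudo ip netns exec ns1 ip link set lo up
sudo ip netns exec ns1 ping -c 1 10.0.0.1          # reachable only through the veth pair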

Load Balancing Strategies

Load balancing considerations:

  • Algorithm: Round-robin, least connections, IP hash
  • Health checks: HTTP/TCP probes
  • Session affinity: Sticky sessions when needed
  • Geographic distribution: Latency-based routing

Service Mesh and Microservices Networking

Service mesh provides:

  • Transparent proxying via sidecars
  • Observability without code changes
  • Traffic management policies
  • Security (mTLS) between services

Network Performance Optimization

Optimization techniques:

  • SR-IOV: Hardware virtualization for direct device access
  • DPDK: Bypass kernel for packet processing
  • Jumbo frames: Reduce packet overhead
  • TCP tuning: Buffer sizes, congestion algorithms