Containers and Orchestration

EE 547 - Unit 2

Dr. Brandon Franzke

Fall 2025

Virtualization Fundamentals

The Resource Utilization Problem

Cost of Underutilization

The Business Case

At datacenter scale, a 10% improvement in utilization translates to billions of dollars in savings

The Isolation Requirement

Virtualization: Core Concept

Key Components

Resource Multiplexing and Isolation

CPU Virtualization: Trap and Emulate

Memory Virtualization Challenge

Shadow Page Tables vs EPT

I/O Virtualization Approaches

Hardware Virtualization Support

Intel VT-x and AMD-V

The Game Changer (2005-2006)

Before hardware support:

  • Binary translation required
  • Complex hypervisor code
  • Performance penalties

After hardware support:

  • Native virtualization
  • Simplified hypervisors
  • Near-native performance

Key Features

VMX Operations (Intel terminology)

  • VMXON: Enable virtualization
  • VMLAUNCH: Start VM
  • VMRESUME: Continue VM
  • VMEXIT: Return to hypervisor

VMCS (Intel) / VMCB (AMD) - Virtual Machine Control Structure/Block

  • Stores complete VM state
  • Controls VM behavior
  • Hardware-managed context switching
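
A quick way to check whether a host exposes these extensions and whether KVM is active (a minimal sketch; output varies by machine):

# CPU flags: vmx = Intel VT-x, svm = AMD-V
grep -E -o 'vmx|svm' /proc/cpuinfo | sort -u

# Is the KVM kernel module loaded?
lsmod | grep kvm

# Device node used by hypervisors such as QEMU/KVM
ls -l /dev/kvm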

Type 1 vs Type 2 Hypervisors

Resource Scheduling in Hypervisors

CPU Scheduling Algorithms

Credit Scheduler (Xen)

Each VM allocated credits based on weight
VM consumes credits when running
Depleted credits → lower priority
Credits refresh periodically

CFS (KVM - Completely Fair Scheduler)

Equal CPU time by default
Configurable weights/shares
Nice values for priority
Real-time scheduling available
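
A rough illustration of how these weights are set in practice (a sketch, assuming a Xen host with the xl tool and a KVM/libvirt host with virsh; the domain name dom1 is a placeholder):

# Xen credit scheduler: double dom1's weight, cap it at 50% of one CPU
xl sched-credit -d dom1 -w 512 -c 50

# KVM/libvirt: adjust CFS shares for a guest
virsh schedinfo dom1 --set cpu_shares=2048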

Memory Management Techniques

Ballooning: Cooperative memory reclamation

Page Sharing: Deduplicate identical pages

Compression: Compress inactive pages

Swapping: Last resort - disk backing
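
These mechanisms can be observed and driven from the host (a sketch using libvirt and the kernel's KSM interface; the domain name is a placeholder):

# Ballooning: ask the guest's balloon driver to shrink it to 8 GiB
virsh setmem dom1 8G --live

# Page sharing: KSM statistics on the host
cat /sys/kernel/mm/ksm/pages_sharing /sys/kernel/mm/ksm/pages_shared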

VM Lifecycle States

Virtual Hardware Configuration

What Constitutes a VM?

Virtual CPU Configuration

<vcpu placement='static'>4</vcpu>
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='4' threads='1'/>
</cpu>

Virtual Memory

<memory unit='GiB'>16</memory>
<currentMemory unit='GiB'>16</currentMemory>

Virtual Storage

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <source file='/var/lib/vms/ubuntu.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>

Virtual Network

<interface type='bridge'>
  <mac address='52:54:00:6b:3c:58'/>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>
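
These fragments live in a libvirt domain XML file; a typical workflow for using them looks like this (a sketch, assuming the definition is saved as ubuntu.xml):

virsh define ubuntu.xml        # register the VM with libvirt
virsh start ubuntu             # boot it
virsh dumpxml ubuntu | less    # inspect the full virtual hardware description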

Boot Process in a Virtual Machine

Step-by-Step Boot Sequence

  1. VM Creation

    Hypervisor allocates:
    - Memory regions
    - Virtual CPU structures
    - Device emulation threads
  2. Firmware Initialization

    Virtual BIOS/UEFI:
    - Memory detection (fake)
    - Device enumeration (virtual)
    - Boot device selection
  3. Bootloader

    GRUB/Windows Boot Manager:
    - Reads virtual disk
    - Loads kernel into memory
    - Sets up initial ramdisk
  4. Operating System

    Kernel initialization:
    - Detects "hardware" (all virtual)
    - Loads drivers
    - Starts init process

Guest Perspective vs Reality

Inside the VM

$ lscpu
Architecture:          x86_64
CPU(s):                4
Model name:            Intel Xeon E5-2680 v4
CPU MHz:               2400.000
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K

$ free -h
              total        used        free
Mem:           16Gi       2.1Gi        12Gi

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       100G   45G   55G  45% /

On the Host

$ ps aux | grep qemu
qemu-system-x86_64 -enable-kvm -m 16384 -smp 4 \
  -drive file=vm.qcow2 -netdev tap,id=net0

$ top
PID    USER  PR  VIRT    RES    SHR  %CPU  %MEM  COMMAND
15234  qemu  20  17.2g   16.1g  4.2g  385   12.5  qemu-system-x86

The VM is just another process!

Example: Creating a VM

Using QEMU/KVM

# 1. Create disk image
qemu-img create -f qcow2 ubuntu-vm.qcow2 50G

# 2. Start VM with Ubuntu installer
#    -enable-kvm  : use hardware acceleration
#    -cpu host    : pass through host CPU features
#    -m 8192      : 8 GB RAM
#    -smp cores=4 : 4 CPU cores
#    -boot d      : boot from the CD image
qemu-system-x86_64 \
  -enable-kvm \
  -cpu host \
  -m 8192 \
  -smp cores=4 \
  -drive file=ubuntu-vm.qcow2,if=virtio \
  -cdrom ubuntu-22.04.iso \
  -boot d \
  -vga qxl \
  -spice port=5900,addr=127.0.0.1

# 3. Connect to console
remote-viewer spice://localhost:5900

# 4. After installation, normal boot
qemu-system-x86_64 \
  -enable-kvm \
  -cpu host \
  -m 8192 \
  -smp cores=4 \
  -drive file=ubuntu-vm.qcow2,if=virtio \
  -netdev user,id=net0,hostfwd=tcp::2222-:22 \
  -device virtio-net,netdev=net0

Performance Overhead Analysis

CPU and Memory Performance

CPU Performance Factors

VM Exits - The Main Culprit

  • Each privileged instruction causes exit
  • Exit cost: 1000-4000 cycles
  • Mitigation: Hardware assists reduce exits

Cache Effects

  • VM switches flush TLB
  • Cache pollution from hypervisor
  • NUMA complications
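
VM exits can be counted directly on the host (a sketch; assumes perf with KVM tracepoints and the guest's qemu PID, written here as a placeholder):

# Record exit reasons for 10 seconds, then summarize them
perf kvm stat record -p <qemu_pid> sleep 10
perf kvm stat report --event=vmexit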

Memory Performance

Translation Overhead

  • Two-level page walks
  • TLB pressure increased
  • Larger page tables

Measured Impact

STREAM Benchmark (Memory Bandwidth):
Native:        95 GB/s
VM with EPT:   88 GB/s  (7% loss)
VM without:    68 GB/s  (28% loss)

I/O Performance Penalties

GPU Virtualization for ML

Summary: When to Use VMs

Ideal Use Cases

Strong Isolation Required

  • Multi-tenant environments
  • Security-sensitive workloads
  • Compliance requirements

Different Operating Systems

  • Windows and Linux on same host
  • Legacy application support

Resource Guarantees

  • Dedicated CPU/memory allocation
  • Predictable performance

Limitations for ML/Data Science

Performance Overhead

  • 5-20% depending on workload
  • GPU virtualization challenges

Resource Efficiency

  • GB of overhead per instance
  • Slow startup times

Next: How containers solve these problems…

Container Architecture & Internals

The Problem Containers Solve

What Is a Kernel?

The kernel is the core of the operating system that:

  • Manages hardware: Only the kernel can directly touch CPU, RAM, disk, network cards
  • Enforces isolation: Processes can’t see each other’s memory
  • Provides abstraction: Files, sockets, processes are kernel concepts
  • Schedules work: Decides which process runs on which CPU core
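
You can watch a process asking the kernel to do this work on its behalf (a sketch; strace prints every system call a process makes):

# Even "reading a file" is a sequence of requests to the kernel
strace -e trace=openat,read,write,close cat /etc/hostname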

Kernel Privilege Levels

Why can’t applications access hardware directly?

  • CPU enforces privilege levels in hardware
  • Ring 3 code physically cannot execute privileged instructions
  • Attempting privileged operations causes CPU trap to kernel
  • Kernel decides whether to allow or deny the operation

Containers vs VMs:

  • Containers: Share host kernel (same Ring 0)
  • VMs: Each has own kernel (separate Ring 0)

Why Not Just Use VMs?

What Is a Container Really?

Demonstration: Container = Process

See it yourself

# Terminal 1: Start a container
$ docker run -d --name demo nginx
a5f3c8b9d2e1

# Terminal 2: Find it as a process
$ ps aux | grep nginx
root     15234  0.0  0.1  141836  2308 ?  Ss  10:42  nginx: master
www-data 15235  0.0  0.1  142268  3544 ?  S   10:42  nginx: worker

# It's just a process!
$ pstree -p 15234
nginx(15234)───nginx(15235)

# Check its namespaces
$ ls -la /proc/15234/ns/
lrwxrwxrwx 1 root root ipc:[4026532439]    # Isolated IPC
lrwxrwxrwx 1 root root mnt:[4026532437]    # Isolated mounts
lrwxrwxrwx 1 root root net:[4026532442]    # Isolated network
lrwxrwxrwx 1 root root pid:[4026532440]    # Isolated PIDs
lrwxrwxrwx 1 root root uts:[4026532438]    # Isolated hostname

# Kill it like any process (SIGKILL gives exit code 137)
$ kill -9 15234
$ docker ps -a
CONTAINER ID   STATUS      
a5f3c8b9d2e1   Exited (137)

Shared Kernel Architecture

Linux Namespaces: The Isolation Mechanism

Namespace Types in Detail

Six Types of Isolation

Namespace   Isolates                          Year   Example
---------   -------------------------------   ----   -----------------------------
Mount       Filesystem mount points           2002   Container sees / as its image
UTS         Hostname and domain name          2006   Container has own hostname
IPC         Inter-process communication       2006   Separate shared memory
PID         Process IDs                       2008   Container PID 1 is init
Network     Network devices, stacks, ports    2009   Own IP address, ports
User        User and group IDs                2013   Root in container ≠ host root

Creating a Namespace

// Create a child process in new PID, network, and mount namespaces
pid_t pid = clone(child_fn, stack_top,
                  CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | SIGCHLD, arg);

// Or move the calling process itself (new namespaces apply to its children)
unshare(CLONE_NEWPID);

PID Namespace in Action

Mount Namespace: Filesystem Isolation

Network Namespace

Control Groups (cgroups)

Putting It Together: Namespace + cgroups

# Create a simple container manually

# 1. Create namespaces
unshare --pid --mount --net --uts --ipc --fork bash

# 2. Set hostname (UTS namespace)
hostname my-container

# 3. Mount proc (PID namespace)
mount -t proc proc /proc

# 4. Configure network (Network namespace)
#    (assumes a veth pair was created and its peer moved into this
#     namespace as eth0)
ip link set lo up
ip addr add 172.17.0.2/16 dev eth0

# 5. Set resource limits (cgroup v1 paths; on cgroup v2 write to memory.max)
echo $$ > /sys/fs/cgroup/memory/docker/container1/cgroup.procs
echo 4294967296 > /sys/fs/cgroup/memory/docker/container1/memory.limit_in_bytes   # 4 GiB

# You've created a container!

This is exactly what Docker does

  • Plus image management
  • Plus network setup
  • Plus storage layers
  • Plus convenience

Container Images: Structure

Layer Efficiency

Union Filesystems

How Layers Become One Filesystem

OverlayFS - The modern standard

Layer 1: Ubuntu base     [Read-only]
         /bin/bash
         /lib/libc.so
         
Layer 2: Python install  [Read-only]
         /usr/bin/python3
         /usr/lib/python3.8/
         
Layer 3: App code       [Read-only]
         /app/main.py
         /app/config.yaml
         
Container Layer:        [Read-write]
         (empty initially)
         
Union View:            [What container sees]
         /bin/bash         (from layer 1)
         /usr/bin/python3  (from layer 2)
         /app/main.py      (from layer 3)

Copy-on-Write (CoW)

When a container modifies a file:

  1. File copied from read-only layer
  2. Copy placed in read-write layer
  3. Original remains unchanged
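
You can reproduce this by hand with a throwaway OverlayFS mount (a minimal sketch; directory names are arbitrary and root privileges are assumed):

mkdir -p lower upper work merged
echo "original" > lower/file.txt

# lowerdir is read-only; upperdir receives copies on write
sudo mount -t overlay overlay \
  -o lowerdir=lower,upperdir=upper,workdir=work merged

echo "changed" > merged/file.txt   # triggers copy-up
cat lower/file.txt                 # still "original"
ls upper/                          # file.txt now lives in the writable layer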

Build Cache Demonstration

From Image to Running Container

The Journey

  1. Pull/Build Image

    • Download layers (if needed)
    • Store in /var/lib/docker
  2. Create Container

    docker create ubuntu:20.04
    • Allocate container ID
    • Create filesystem layers
    • Prepare config.json
  3. Start Container

    docker start <container-id>
    • Create namespaces
    • Setup cgroups
    • Mount layers
    • Configure network
    • Execute entrypoint
  4. Container Running

    • Process isolated
    • Resources limited
    • Network connected

Container Runtime Architecture

What Happens During docker run

Complete Sequence

$ docker run -d -p 8080:80 --name web nginx
  1. Client → Daemon

    • Parse command
    • Send API request
  2. Image Check

    • Look for nginx:latest locally
    • Pull if missing
  3. Create Container

    • Generate container ID
    • Create filesystem snapshot
    • Write config.json
  4. Setup Namespaces

    • Create PID namespace
    • Create network namespace
    • Create mount namespace
  5. Configure cgroups

    • Set memory limits
    • Set CPU shares
  6. Network Setup

    • Create veth pair
    • Connect to bridge
    • Configure iptables
  7. Start Process

    • Execute nginx
    • As PID 1 in container
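
The network plumbing created in steps 4-6 is visible from the host afterward (a sketch; interface names and rule details vary):

# The 8080 -> 80 mapping shows up as a DNAT rule
sudo iptables -t nat -L DOCKER -n

# The container's veth peer is attached to the docker0 bridge
ip link show master docker0

# Docker's own view of the bridge network
docker network inspect bridge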

Container Security Model

Why Containers Aren’t VMs

Fundamental Differences

Shared Kernel

  • All containers run on same kernel
  • Kernel bug = all containers vulnerable
  • No kernel customization per container

Process-Level Isolation

  • Containers are processes
  • Can be killed like any process
  • Share system resources

Security Boundaries

  • VMs: Hardware-enforced
  • Containers: Kernel-enforced
  • Escape easier in containers

When This Matters

Use VMs when:

  • Running untrusted code
  • Need different kernels
  • Require complete isolation
  • Compliance requirements

Use Containers when:

  • Deploying trusted applications
  • Need fast startup
  • Want efficient resource use
  • Building microservices

Docker: From Theory to Practice

Images vs Containers: The Critical Distinction

Think of it like this:

  • Image = Python class definition (blueprint)
  • Container = Instance of that class (running object)
  • Images are immutable; containers add a writable layer
  • docker images lists images; docker ps lists containers

Where Do Images Come From?

Official vs User Images:

  • python → Official Python image
  • username/myapp → User’s image
  • gcr.io/project/image → Google Container Registry
  • Always specify tags; avoid latest in production
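
Moving an image between these locations is just tagging and pushing (a sketch; the registry host and repository names are placeholders):

docker tag myapp:v1 gcr.io/my-project/myapp:v1   # retag for a registry
docker push gcr.io/my-project/myapp:v1           # upload layers
docker pull gcr.io/my-project/myapp:v1           # fetch on another machine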

Essential Docker Commands

# Image Management
docker pull python:3.11              # Download image from registry
docker images                         # List local images
docker rmi python:3.11               # Remove image
docker build -t myapp:v1 .          # Build image from Dockerfile

# Container Lifecycle
docker run python:3.11 python -c "print('Hello')"  # Run and exit
docker run -d nginx                  # Run in background (detached)
docker run -it ubuntu bash          # Interactive terminal
docker ps                            # List running containers
docker ps -a                         # List all containers
docker stop container_id            # Stop gracefully
docker kill container_id            # Force stop
docker rm container_id               # Remove stopped container

# Debugging and Inspection
docker logs container_id            # View output
docker exec -it container_id bash   # Enter running container
docker inspect container_id         # Full container details
docker stats                        # Resource usage

Docker Run: The Swiss Army Knife

Docker Architecture

  • Docker daemon manages container lifecycle
  • containerd handles container runtime
  • runc creates and runs containers (OCI compliant)
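
On a typical Docker host this chain of components shows up as ordinary processes (a sketch; exact process names vary by version):

# Daemon and runtime processes
ps -e -o pid,ppid,cmd | grep -E 'dockerd|containerd' | grep -v grep

# Each running container has a containerd-shim as its parent
ps -e -o pid,ppid,cmd | grep containerd-shim | grep -v grep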

Image Layers and Build Cache

Layer ordering impacts build performance:

  • Base images change rarely → put them first (earliest Dockerfile instructions)
  • Application code changes frequently → copy it last (final instructions)
  • Dependencies sit in between, ordered by how often they change

Dockerfile Best Practices for ML

# Multi-stage build for production
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim
WORKDIR /app

# Copy only installed packages
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

# Copy application last for cache efficiency
COPY model/ ./model/
COPY src/ ./src/
CMD ["python", "src/inference.py"]

Key principles:

  • Multi-stage builds reduce final image size
  • Copy requirements before code (cache efficiency)
  • Use specific base image versions
  • Minimize layers in production stage

Container Resource Limits

Resource limits prevent:

  • Memory leaks affecting other containers
  • CPU monopolization
  • Network bandwidth saturation
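
Limits are applied per container at run time (a sketch of the common flags; the image name and values are examples):

# --memory / --memory-swap : hard memory cap, no additional swap
# --cpus                   : at most two CPUs worth of time
# --pids-limit             : cap the number of processes in the container
docker run -d \
  --memory=4g --memory-swap=4g \
  --cpus=2.0 \
  --pids-limit=256 \
  myapp:v1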

Docker Networking Modes

Bridge (default)

docker run --network bridge app
  • Isolated network namespace
  • NAT for external access
  • Inter-container communication via bridge

Host

docker run --network host app
  • No network isolation
  • Direct host network access
  • Best performance, less secure

None

docker run --network none app
  • Complete network isolation
  • No network interfaces
  • Security-critical workloads

Custom Bridge

docker network create ml-net
docker run --network ml-net app
  • User-defined networks
  • Automatic DNS resolution
  • Network segmentation
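
Service discovery on a user-defined bridge (a sketch; container and network names are examples):

docker network create ml-net
docker run -d --name api --network ml-net nginx
docker run --rm --network ml-net alpine ping -c 1 api   # "api" resolves by name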

Volume Management for ML Workloads

Volume selection criteria:

  • Performance: tmpfs > bind mount > named volume > network
  • Persistence: named volume, bind mount for stateful data
  • Sharing: NFS for multi-node access
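
The three mount types side by side (a sketch; the image name and paths are examples):

# Named volume: managed by Docker, survives container removal
docker volume create training-data
docker run -v training-data:/data myapp:v1

# Bind mount: host directory mapped directly into the container
docker run -v /mnt/datasets:/data:ro myapp:v1

# tmpfs: RAM-backed scratch space, gone when the container stops
docker run --tmpfs /scratch myapp:v1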

Container Security Boundaries

# Run as non-root user
docker run --user 1000:1000 app

# Read-only root filesystem
docker run --read-only \
  --tmpfs /tmp \
  --tmpfs /var/run \
  app

# Drop capabilities
docker run --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  app

# Security options
docker run --security-opt no-new-privileges \
  --security-opt apparmor=docker-default \
  app

Security layers:

  • User namespaces map container users to host UIDs
  • Capability dropping reduces attack surface
  • Read-only filesystems prevent persistence
  • AppArmor/SELinux provide mandatory access control

Production Patterns: Health Checks

# Dockerfile with health check
FROM python:3.11-slim

# Health check configuration
HEALTHCHECK --interval=30s \
  --timeout=3s \
  --start-period=60s \
  --retries=3 \
  CMD python -c "import requests; requests.get('http://localhost:8080/health').raise_for_status()"

# Application setup
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt

EXPOSE 8080
CMD ["python", "app.py"]

Health check types:

  • Liveness: Is the container still working (not just running)?
  • Readiness: Can the container accept traffic?
  • Startup: Has initialization completed?

Actions on failure (orchestrator and restart-policy dependent):

  • Docker marks the container unhealthy; Swarm/Kubernetes can restart it
  • Remove from the load balancer / service endpoints
  • Alert monitoring system
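
The current health state is recorded in the container's metadata (a sketch):

docker inspect --format '{{.State.Health.Status}}' container_id
# healthy | unhealthy | starting

docker inspect --format '{{json .State.Health}}' container_id | jq .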

Debugging Containers

# Inspect running processes
docker exec container_id ps aux

# Attach to running container
docker exec -it container_id /bin/bash

# View container logs
docker logs --follow --tail 100 container_id

# Inspect container metadata
docker inspect container_id | jq '.State'

# Copy files from container
docker cp container_id:/app/logs/error.log ./

# Monitor resource usage
docker stats container_id

Advanced debugging:

# Network debugging
docker exec container_id netstat -tulpn
docker exec container_id tcpdump -i eth0

# Process tracing
docker exec container_id strace -p 1

# File system analysis
docker exec container_id df -h
docker exec container_id lsof

Container Registries and Distribution

Registry selection factors:

  • Proximity: Reduce latency with regional registries
  • Bandwidth: Private registries in same VPC
  • Caching: Layer deduplication across images
  • Security: Vulnerability scanning, access control
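
For experimentation you can run a registry yourself (a sketch using the official registry image; the port and tags are examples):

docker run -d -p 5000:5000 --name registry registry:2
docker tag ml-inference:v1.0 localhost:5000/ml-inference:v1.0
docker push localhost:5000/ml-inference:v1.0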

Demo: Building an ML Inference Container - Complete Walkthrough

Step 1: Project Structure

# Create project directory
mkdir ml-inference && cd ml-inference

# Create directory structure
mkdir -p src model data

# Project layout:
ml-inference/
├── Dockerfile
├── requirements.txt
├── model/
│   └── model.pkl         # Pretrained model
├── src/
│   ├── app.py            # FastAPI application
│   ├── inference.py      # Model inference logic
│   └── preprocess.py     # Data preprocessing
└── data/
    └── sample.json       # Test data

Step 2: Application Code

# src/app.py - FastAPI inference server
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import pickle
import numpy as np
from typing import List
import time
import logging

app = FastAPI(title="ML Inference API")
logger = logging.getLogger(__name__)

# Load model at startup
with open('/app/model/model.pkl', 'rb') as f:
    model = pickle.load(f)
    logger.info(f"Model loaded: {type(model)}")

class PredictionRequest(BaseModel):
    features: List[float]
    
class PredictionResponse(BaseModel):
    prediction: float
    confidence: float
    inference_time_ms: float

@app.get("/health")
def health_check():
    """Kubernetes/Docker health check endpoint"""
    return {"status": "healthy", "model_loaded": model is not None}

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    """Main prediction endpoint"""
    start_time = time.time()
    
    try:
        # Preprocess
        features = np.array(request.features).reshape(1, -1)
        
        # Predict
        prediction = model.predict(features)[0]
        confidence = model.predict_proba(features).max()
        
        # Measure time
        inference_time = (time.time() - start_time) * 1000
        
        return PredictionResponse(
            prediction=float(prediction),
            confidence=float(confidence),
            inference_time_ms=inference_time
        )
    except Exception as e:
        logger.error(f"Prediction failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))
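
The Dockerfile in the next step installs from requirements.txt, which the walkthrough does not list; a plausible minimal version for this app (the exact packages and any pins are assumptions):

cat > requirements.txt <<'EOF'
fastapi
uvicorn[standard]
pydantic
numpy
scikit-learn
requests
EOF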

Step 3: Dockerfile - Layer by Layer

# Multi-stage build for smaller final image
# Stage 1: Builder
FROM python:3.11-slim as builder

# Install build dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    g++ \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy and install requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime
FROM python:3.11-slim

# Create non-root user
RUN useradd -m -u 1000 mluser

# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy application code
COPY --chown=mluser:mluser src/ ./src/
COPY --chown=mluser:mluser model/ ./model/

# Switch to non-root user
USER mluser

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
    CMD python -c "import requests; r = requests.get('http://localhost:8080/health'); r.raise_for_status()"

# Expose port
EXPOSE 8080

# Run application
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8080"]

Step 4: Building the Image

# Build command with explanations
#   --cache-from python:3.11-slim          use cached base layers
#   --build-arg BUILDKIT_INLINE_CACHE=1    enable inline cache metadata
#   --progress=plain                       show detailed output
#   -t ml-inference:v1.0                   tag the image
#   .                                      build context (current dir)
DOCKER_BUILDKIT=1 docker build \
  --cache-from python:3.11-slim \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --progress=plain \
  -t ml-inference:v1.0 \
  .

# Build output:
# => [internal] load build definition from Dockerfile
# => [internal] load .dockerignore
# => [internal] load metadata for docker.io/library/python:3.11-slim
# => [builder 1/5] FROM docker.io/library/python:3.11-slim
# => [builder 2/5] RUN apt-get update && apt-get install -y gcc g++
# => [builder 3/5] RUN python -m venv /opt/venv
# => [builder 4/5] COPY requirements.txt .
# => [builder 5/5] RUN pip install --no-cache-dir -r requirements.txt
# => [stage-1 1/6] COPY --from=builder /opt/venv /opt/venv
# => exporting to image
# => naming to docker.io/library/ml-inference:v1.0

Step 5: Running the Container

# Development mode - with live code reload
docker run -it --rm \
  --name ml-dev \
  -p 8080:8080 \
  -v $(pwd)/src:/app/src:ro \
  -v $(pwd)/model:/app/model:ro \
  -e LOG_LEVEL=DEBUG \
  ml-inference:v1.0

# Production mode - with resource limits
docker run -d \
  --name ml-prod \
  --restart unless-stopped \
  --memory="2g" \
  --memory-reservation="1g" \
  --cpus="1.5" \
  --pids-limit 100 \
  -p 8080:8080 \
  --read-only \
  --tmpfs /tmp \
  ml-inference:v1.0

Step 6: Testing the Container

# Check if container is running
docker ps
# CONTAINER ID   IMAGE              STATUS         PORTS
# abc123def456   ml-inference:v1.0  Up 2 minutes   0.0.0.0:8080->8080/tcp

# Check health endpoint
curl http://localhost:8080/health
# {"status":"healthy","model_loaded":true}

# Test prediction
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [5.1, 3.5, 1.4, 0.2]}'
# {"prediction":0,"confidence":0.98,"inference_time_ms":12.5}

# View logs
docker logs ml-prod --tail 50
# INFO:     Started server process [1]
# INFO:     Waiting for application startup.
# INFO:     Model loaded: <class 'sklearn.ensemble._forest.RandomForestClassifier'>
# INFO:     Application startup complete.

# Monitor resource usage
docker stats ml-prod --no-stream
# CONTAINER     CPU %     MEM USAGE / LIMIT   
# ml-prod       15.2%     523MiB / 2GiB       

Step 7: Debugging When Things Go Wrong

# Container won't start? Check logs
docker logs ml-prod
# Error: Model file not found at /app/model/model.pkl

# Need to debug inside container?
docker exec -it ml-prod /bin/bash
mluser@abc123:/app$ ls -la
mluser@abc123:/app$ python -c "import pickle; print('Pickle works')"
mluser@abc123:/app$ exit

# Container crashes immediately?
docker run -it --entrypoint /bin/bash ml-inference:v1.0
# Now you're in the container with a shell

# Check what files made it into the image
docker run --rm ml-inference:v1.0 find /app -type f

# Inspect image layers and sizes
docker history ml-inference:v1.0
# IMAGE          CREATED       SIZE      COMMAND
# abc123         2 hours ago   2.1MB     COPY src/ ./src/
# def456         2 hours ago   125MB     COPY model/ ./model/
# ...

Step 8: Optimizing the Image

Optimization techniques used:

  • Multi-stage builds eliminate build dependencies
  • Non-root user improves security
  • Virtual environment isolation
  • Layer caching for faster rebuilds
  • Health checks for orchestration

Container Orchestration Preview

Why orchestration is necessary:

Kubernetes: Container Orchestration

Docker Compose: Multi-Container Applications

Docker Compose manages multi-container applications on a single host:

  • Define services in YAML
  • Automatic network creation
  • Service discovery by name
  • Coordinated lifecycle management

docker-compose.yaml Structure

version: '3.8'

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      - api
    networks:
      - frontend

  api:
    build: ./api
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/myapp
      REDIS_URL: redis://cache:6379
    depends_on:
      - db
      - cache
    networks:
      - frontend
      - backend
    restart: unless-stopped

  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: secretpass
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - backend

  cache:
    image: redis:7-alpine
    networks:
      - backend

volumes:
  postgres_data:

networks:
  frontend:
  backend:

JSON → YAML: Why Configuration Changed

YAML = YAML Ain’t Markup Language

  • Indentation matters (like Python)
  • No brackets, quotes optional
  • Lists with -, dictionaries with :
  • Comments with #

YAML Essentials for DevOps

Common YAML Gotchas:

  • version: 1.10 becomes the float 1.1 (use quotes: "1.10")
  • Tabs are forbidden (spaces only)
  • : in values needs quotes: description: "Error: failed"
  • yes, no, on, off parse as booleans (use quotes for strings)
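
A quick way to see these rules in action (a sketch; assumes Python with PyYAML installed):

python3 -c 'import yaml; print(yaml.safe_load("version: 1.10\nenabled: on\nname: \"1.10\""))'
# → {'version': 1.1, 'enabled': True, 'name': '1.10'}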

Docker Compose Commands

# Start all services
docker-compose up              # Foreground with logs
docker-compose up -d           # Background (detached)

# Manage services
docker-compose ps              # List running services
docker-compose logs api        # View logs for service
docker-compose stop            # Stop all services
docker-compose down            # Stop and remove containers
docker-compose down -v         # Also remove volumes

# Scaling and updates
docker-compose up -d --scale worker=3   # Run 3 worker instances
docker-compose pull                     # Update images
docker-compose build                    # Rebuild custom images
docker-compose restart api              # Restart single service

# Development workflow
docker-compose exec api bash           # Shell into running service
docker-compose run api pytest          # Run command in new container

Compose vs Manual Docker

Compose for Development vs Production

Development (docker-compose.override.yml):

services:
  api:
    build: .
    volumes:
      - ./src:/app/src  # Live code reload
    ports:
      - "5678:5678"     # Debugger port
    environment:
      - DEBUG=true

Production (docker-compose.prod.yml):

services:
  api:
    image: registry.com/api:v1.2.3
    deploy:
      resources:
        limits:
          memory: 512M
      restart_policy:
        condition: on-failure

Compose Limitations → Need for Kubernetes

Docker Compose works well until it doesn’t. The transition to Kubernetes typically happens when:

  • Your single host fails at 3am and takes everything down
  • You realize you’re manually SSH’ing to update containers
  • Your docker-compose.yaml becomes 500+ lines of workarounds
  • You need production features Compose can’t provide (real load balancing, gradual rollouts, automatic failover)

The Orchestration Problem

Orchestration solves:

  • Scheduling: Optimal container placement
  • Scaling: Respond to load changes
  • Recovery: Automatic failure handling
  • Updates: Zero-downtime deployments

Kubernetes Architecture

Control Plane (Master nodes):

  • etcd: Distributed key-value store for cluster state
  • API Server: REST API for all operations
  • Scheduler: Assigns pods to nodes
  • Controller Manager: Maintains desired state

Data Plane (Worker nodes):

  • kubelet: Node agent, manages pods
  • kube-proxy: Network proxy, service abstraction
  • Container Runtime: Docker/containerd
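
With a working cluster and kubectl configured, these components are visible directly (a sketch):

kubectl get nodes -o wide           # control-plane and worker nodes
kubectl get pods -n kube-system     # API server, scheduler, etcd, kube-proxy, ...
kubectl cluster-info                # API server endpoint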

Kubernetes Objects: Pods

apiVersion: v1
kind: Pod
metadata:
  name: ml-inference
  labels:
    app: inference
    version: v2
spec:
  containers:
  - name: model-server
    image: ml-inference:v2.1
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
    ports:
    - containerPort: 8080
  - name: metrics-collector
    image: prometheus-exporter:latest
    ports:
    - containerPort: 9090

Pod characteristics:

  • Smallest deployable unit
  • One or more containers (shared network/storage)
  • Ephemeral by design
  • Single IP address per pod
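
Working with the pod defined above (a sketch, assuming the manifest is saved as pod.yaml):

kubectl apply -f pod.yaml                         # create the pod
kubectl get pod ml-inference -o wide              # node, IP, status
kubectl logs ml-inference -c model-server         # logs for one container in the pod
kubectl exec -it ml-inference -c model-server -- /bin/sh
kubectl delete pod ml-inference                   # pods are ephemeral; this one is gone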

Workload Controllers

Controller types:

  • Deployment: Stateless applications, rolling updates
  • StatefulSet: Stateful applications, ordered deployment
  • DaemonSet: One pod per node (monitoring, logging)
  • Job/CronJob: Batch processing, scheduled tasks
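
The most common controller in practice is the Deployment (a sketch; the image and names are examples):

kubectl create deployment inference --image=ml-inference:v2.1 --replicas=3
kubectl get deployment,replicaset,pods -l app=inference
kubectl scale deployment inference --replicas=5
kubectl rollout status deployment/inference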

Service Discovery and Load Balancing

apiVersion: v1
kind: Service
metadata:
  name: inference-service
spec:
  selector:
    app: inference
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: inference-internal
spec:
  selector:
    app: inference
  type: ClusterIP
  ports:
  - port: 8080

Service types:

  • ClusterIP: Internal cluster communication
  • NodePort: External access via node ports
  • LoadBalancer: Cloud provider load balancer
  • ExternalName: DNS CNAME redirect
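
Creating and testing services from the command line (a sketch, assuming the inference Deployment above exists):

kubectl expose deployment inference --name=inference-internal --port=8080 --type=ClusterIP
kubectl get svc inference-internal
kubectl get endpoints inference-internal        # pod IPs behind the service

# Reach a ClusterIP service from your workstation for debugging
kubectl port-forward svc/inference-internal 8080:8080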

ConfigMaps and Secrets

# ConfigMap for application configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-config
data:
  model_path: "/models/latest"
  batch_size: "32"
  num_workers: "4"
---
# Secret for sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: api-keys
type: Opaque
data:
  aws_access_key: <base64_encoded>
  database_password: <base64_encoded>

Usage in pods:

containers:
- name: app
  envFrom:
  - configMapRef:
      name: model-config
  - secretRef:
      name: api-keys
  volumeMounts:
  - name: config
    mountPath: /etc/config
volumes:
- name: config
  configMap:
    name: model-config
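
Both objects can also be created directly from the command line (a sketch; values are examples):

kubectl create configmap model-config \
  --from-literal=model_path=/models/latest \
  --from-literal=batch_size=32

# Secret values are base64-encoded automatically when created this way
kubectl create secret generic api-keys \
  --from-literal=database_password='s3cret'

kubectl get configmap model-config -o yaml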

Persistent Storage

Storage concepts:

  • PersistentVolume (PV): Cluster storage resource
  • PersistentVolumeClaim (PVC): Request for storage
  • StorageClass: Dynamic provisioning template
  • Volume Snapshots: Point-in-time backups

Resource Management and QoS

QoS determination:

  • Guaranteed: requests = limits for every container and resource
  • Burstable: some requests or limits set, but not meeting Guaranteed
  • BestEffort: No requests or limits specified
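
The assigned class is recorded on each pod (a sketch):

kubectl get pod ml-inference -o jsonpath='{.status.qosClass}'
# Burstable   (the example pod above sets requests below its limits)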

Scheduling and Affinity

apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  nodeSelector:
    gpu: "true"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values: ["p3.2xlarge", "p3.8xlarge"]
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values: ["gpu-training"]
          topologyKey: kubernetes.io/hostname

Scheduling constraints:

  • NodeSelector: Simple key-value matching
  • Node Affinity: Complex node selection rules
  • Pod Affinity: Co-locate related pods
  • Pod Anti-Affinity: Spread pods across nodes
  • Taints/Tolerations: Dedicated nodes
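
The node-side half of these constraints is managed with labels and taints (a sketch; the node name and keys are examples):

kubectl label nodes gpu-node-1 gpu=true                   # matches the nodeSelector above
kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule   # only tolerating pods may land here
kubectl describe node gpu-node-1 | grep -A3 Taints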

Rolling Updates and Rollbacks

Update strategies:

  • Recreate: Stop all, start all (downtime)
  • Rolling: Gradual replacement (zero downtime)
  • Blue-Green: Full environment swap
  • Canary: Gradual traffic shift
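
Rolling updates and rollbacks on a Deployment come down to a few commands (a sketch; names and tags are examples):

kubectl set image deployment/inference model-server=ml-inference:v2.2
kubectl rollout status deployment/inference     # watch the gradual replacement
kubectl rollout history deployment/inference
kubectl rollout undo deployment/inference       # roll back to the previous revision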

Monitoring and Observability

# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: inference-metrics
spec:
  selector:
    matchLabels:
      app: inference
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key metrics:

  • Golden Signals: Latency, traffic, errors, saturation
  • Resource Metrics: CPU, memory, network, disk
  • Application Metrics: Request rate, queue depth, cache hits
  • Custom Metrics: Model accuracy, inference time

Observability stack:

  • Prometheus: Metrics collection
  • Grafana: Visualization
  • Fluentd: Log aggregation
  • Jaeger: Distributed tracing

Kubernetes Operators

Operators extend Kubernetes:

  • Custom Resource Definitions (CRDs)
  • Domain-specific knowledge encoded
  • Automated operational tasks
  • Self-healing capabilities

Demo: Deploying ML Pipeline on Kubernetes

# Create namespace
kubectl create namespace ml-pipeline

# Deploy training job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
  namespace: ml-pipeline
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: ml-training:latest
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
  backoffLimit: 3
EOF

# Deploy inference service
kubectl apply -n ml-pipeline -f inference-deployment.yaml
kubectl apply -n ml-pipeline -f inference-service.yaml

# Setup horizontal autoscaling
kubectl autoscale deployment inference \
  -n ml-pipeline \
  --cpu-percent=70 \
  --min=2 \
  --max=10

# Check status
kubectl get pods -n ml-pipeline
kubectl top pods -n ml-pipeline

Production considerations:

  • Resource quotas per namespace
  • Network policies for security
  • Pod disruption budgets
  • Cluster autoscaling for nodes

Networking Fundamentals for Cloud Computing

Physical vs Virtual Networks

Key distinction:

  • Physical: Hardware switches, routers, cables
  • Virtual: Software-defined networking in hypervisor/kernel
  • Virtual networks run on top of physical infrastructure

The OSI Model in Practice

Focus areas for virtualization:

  • Layer 2: Virtual switches, VLANs, MAC addresses
  • Layer 3: IP routing, subnets, NAT
  • Layer 4: Port mapping, load balancing

IP Addressing Fundamentals

CIDR (Classless Inter-Domain Routing):

  • /24 = 255.255.255.0 = 256 addresses (254 usable)
  • /16 = 255.255.0.0 = 65,536 addresses
  • Smaller prefix length (the number after the /) = larger network

Network Interfaces and Virtual NICs

Interface types in virtualization:

  • eth0: Physical network interface
  • docker0: Docker bridge interface
  • veth: Virtual ethernet pairs (containers)
  • tap/tun: VM network interfaces
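
On a Docker host these interfaces are visible with standard tools (a sketch; names vary):

ip addr show docker0               # the bridge and its subnet
ip link show type veth             # veth halves on the host side
ip link show type bridge           # all software bridges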

Packet Flow Through the Stack

Each layer adds its header:

  • Application generates data
  • TCP adds ports and sequencing
  • IP adds addressing and routing
  • Ethernet adds MAC addresses for local delivery

Bridge Networking in Containers

Bridge networking provides:

  • Isolated network namespace per container
  • Automatic IP assignment from subnet
  • NAT for external connectivity
  • Inter-container communication

VLANs and Network Segmentation

VLAN benefits:

  • Logical network separation on same physical infrastructure
  • Broadcast domain isolation
  • Security through segmentation
  • Simplified network management

Routing Between Networks

Routing fundamentals:

  • Each packet examined for destination IP
  • Longest prefix match wins
  • Metrics determine best path
  • Default route catches all unmatched traffic
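
The host's routing decisions can be inspected and tested directly (a sketch; addresses are examples):

ip route show                                     # routing table, including the default route
ip route get 8.8.8.8                              # which route/interface a destination would use
sudo ip route add 10.10.0.0/16 via 192.168.1.1    # add a static route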

TCP Connection Lifecycle

TCP provides:

  • Reliable, ordered delivery
  • Flow control (sliding window)
  • Congestion control
  • Connection state management

UDP vs TCP for Cloud Applications

Protocol selection criteria:

  • TCP: When reliability matters more than speed
  • UDP: When speed matters more than reliability
  • Modern alternatives: QUIC (UDP-based, reliable)

Network Namespaces and Isolation

Network namespaces provide:

  • Complete network stack isolation
  • Independent routing tables
  • Separate firewall rules
  • Private loopback interface
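
A minimal sketch of building this isolation by hand with iproute2 (names and addresses are arbitrary):

sudo ip netns add ns1                              # new network namespace
sudo ip link add veth0 type veth peer name veth1
sudo ip link set veth1 netns ns1                   # move one end into the namespace
sudo ip addr add 10.0.0.1/24 dev veth0 && sudo ip link set veth0 up
sudo ip netns exec ns1 ip addr add 10.0.0.2/24 dev veth1
sudo ip netns exec ns1 ip link set veth1 up
sudo ip netns exec ns1 ip link set lo up
sudo ip netns exec ns1 ping -c 1 10.0.0.1          # reachable only through the veth pair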

Load Balancing Strategies

Load balancing considerations:

  • Algorithm: Round-robin, least connections, IP hash
  • Health checks: HTTP/TCP probes
  • Session affinity: Sticky sessions when needed
  • Geographic distribution: Latency-based routing

Service Mesh and Microservices Networking

Service mesh provides:

  • Transparent proxying via sidecars
  • Observability without code changes
  • Traffic management policies
  • Security (mTLS) between services

Network Performance Optimization

Optimization techniques:

  • SR-IOV: Hardware virtualization for direct device access
  • DPDK: Bypass kernel for packet processing
  • Jumbo frames: Reduce packet overhead
  • TCP tuning: Buffer sizes, congestion algorithms