
EE 547 - Unit 2
Fall 2025


10% improvement in utilization = Billions in savings








The Game Changer (2005-2006)
Before hardware support:
After hardware support:
VMX Operations (Intel terminology)
VMXON: Enable virtualization
VMLAUNCH: Start VM
VMRESUME: Continue VM
VMEXIT: Return to hypervisor
VMCS/VMCB: Virtual Machine Control Structure (Intel) / Control Block (AMD)

Credit Scheduler (Xen)
Each VM allocated credits based on weight
VM consumes credits when running
Depleted credits → lower priority
Credits refresh periodically
CFS (KVM - Completely Fair Scheduler)
Equal CPU time by default
Configurable weights/shares
Nice values for priority
Real-time scheduling available
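Because KVM vCPUs are ordinary host threads, their CPU share can be adjusted with standard cgroup weights. A minimal sketch, assuming cgroup v2 is mounted at /sys/fs/cgroup with the cpu controller enabled and a QEMU process running as PID 15234:
# give this VM twice the default CPU weight (default 100, valid range 1-10000)
sudo mkdir /sys/fs/cgroup/vm-high
echo 200 | sudo tee /sys/fs/cgroup/vm-high/cpu.weight
echo 15234 | sudo tee /sys/fs/cgroup/vm-high/cgroup.procs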
Ballooning: Cooperative memory reclamation
Page Sharing: Deduplicate identical pages
Compression: Compress inactive pages
Swapping: Last resort - disk backing
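As one concrete example, ballooning can be driven from the host through libvirt. A hedged sketch, assuming a domain named ubuntu-vm with the virtio balloon driver installed in the guest:
virsh setmem ubuntu-vm 4G --live      # inflate the balloon: guest returns memory to the host
virsh dommemstat ubuntu-vm            # inspect balloon size and guest memory statistics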

Virtual CPU Configuration
<vcpu placement='static'>4</vcpu>
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='4' threads='1'/>
</cpu>
Virtual Memory
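A representative libvirt memory snippet (values illustrative; <memory> and <currentMemory> sit at the top level of the domain XML, while the balloon device lives under <devices>):
<memory unit='GiB'>16</memory>
<currentMemory unit='GiB'>8</currentMemory>
<!-- balloon device, under <devices> -->
<memballoon model='virtio'/>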
Virtual Storage
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <source file='/var/lib/vms/ubuntu.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
Virtual Network
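A typical libvirt interface definition (bridge name illustrative), using the paravirtual virtio model like the disk above:
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>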
VM Creation
Hypervisor allocates:
- Memory regions
- Virtual CPU structures
- Device emulation threads
Firmware Initialization
Virtual BIOS/UEFI:
- Memory detection (fake)
- Device enumeration (virtual)
- Boot device selection
Bootloader
GRUB/Windows Boot Manager:
- Reads virtual disk
- Loads kernel into memory
- Sets up initial ramdisk
Operating System
Kernel initialization:
- Detects "hardware" (all virtual)
- Loads drivers
- Starts init process
$ lscpu
Architecture: x86_64
CPU(s): 4
Model name: Intel Xeon E5-2680 v4
CPU MHz: 2400.000
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
$ free -h
total used free
Mem: 16Gi 2.1Gi 12Gi
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 100G 45G 55G 45% /
$ ps aux | grep qemu
qemu-system-x86_64 -enable-kvm -m 16384 -smp 4 \
-drive file=vm.qcow2 -netdev tap,id=net0
$ top
PID USER PR VIRT RES SHR %CPU %MEM COMMAND
15234 qemu 20 17.2g 16.1g 4.2g 385 12.5 qemu-system-x86
The VM is just another process!
# 1. Create disk image
qemu-img create -f qcow2 ubuntu-vm.qcow2 50G
# 2. Start VM with Ubuntu installer
# Flags: -enable-kvm = hardware acceleration, -cpu host = pass through CPU features,
# -m 8192 = 8 GB RAM, -smp cores=4 = 4 CPU cores, -boot d = boot from the CD image
qemu-system-x86_64 \
  -enable-kvm \
  -cpu host \
  -m 8192 \
  -smp cores=4 \
  -drive file=ubuntu-vm.qcow2,if=virtio \
  -cdrom ubuntu-22.04.iso \
  -boot d \
  -vga qxl \
  -spice port=5900,addr=127.0.0.1
# 3. Connect to console
remote-viewer spice://localhost:5900
# 4. After installation, normal boot
qemu-system-x86_64 \
-enable-kvm \
-cpu host \
-m 8192 \
-smp cores=4 \
-drive file=ubuntu-vm.qcow2,if=virtio \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net,netdev=net0
VM Exits - The Main Culprit
Cache Effects
Translation Overhead
Measured Impact
STREAM Benchmark (Memory Bandwidth):
Native: 95 GB/s
VM with EPT: 88 GB/s (7% loss)
VM without EPT (shadow paging): 68 GB/s (28% loss)
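VM-exit behavior can be observed directly on the host. A hedged sketch using perf's kvm subcommand (assumes perf is installed and QEMU runs as PID 15234):
sudo perf kvm stat record -p 15234     # sample exits; stop with Ctrl-C
sudo perf kvm stat report              # breakdown by exit reason (EPT violations, HLT, I/O, ...)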


Strong Isolation Required
Different Operating Systems
Resource Guarantees
Performance Overhead
Resource Efficiency
Next: How containers solve these problems…


The kernel is the core of the operating system that:

Why can’t applications access hardware directly?
Containers vs VMs:


# Terminal 1: Start a container
$ docker run -d --name demo nginx
a5f3c8b9d2e1
# Terminal 2: Find it as a process
$ ps aux | grep nginx
root 15234 0.0 0.1 141836 2308 ? Ss 10:42 nginx: master
www-data 15235 0.0 0.1 142268 3544 ? S 10:42 nginx: worker
# It's just a process!
$ pstree -p 15234
nginx(15234)───nginx(15235)
# Check its namespaces
$ ls -la /proc/15234/ns/
lrwxrwxrwx 1 root root ipc:[4026532439] # Isolated IPC
lrwxrwxrwx 1 root root mnt:[4026532437] # Isolated mounts
lrwxrwxrwx 1 root root net:[4026532442] # Isolated network
lrwxrwxrwx 1 root root pid:[4026532440] # Isolated PIDs
lrwxrwxrwx 1 root root uts:[4026532438] # Isolated hostname
# Kill it like any process
$ kill 15234
$ docker ps -a
CONTAINER ID STATUS
a5f3c8b9d2e1 Exited (137)

| Namespace | Isolates | Year | Example |
|---|---|---|---|
| Mount | Filesystem mount points | 2002 | Container sees / as its image |
| UTS | Hostname and domain name | 2006 | Container has own hostname |
| IPC | Inter-process communication | 2006 | Separate shared memory |
| PID | Process IDs | 2008 | Container PID 1 is init |
| Network | Network devices, stacks, ports | 2009 | Own IP address, ports |
| User | User and group IDs | 2013 | Root in container ≠ host root |




# Create a simple container manually
# 1. Create namespaces
unshare --pid --mount --net --uts --ipc --fork bash
# 2. Set hostname (UTS namespace)
hostname my-container
# 3. Mount proc (PID namespace)
mount -t proc proc /proc
# 4. Configure network (Network namespace)
#    (assumes a veth interface named eth0 was created on the host and moved into this namespace)
ip link set lo up
ip addr add 172.17.0.2/16 dev eth0
# 5. Set resource limits (cgroup v1 memory controller; write the shell's host-visible PID from the host)
echo $$ > /sys/fs/cgroup/memory/docker/container1/cgroup.procs
echo 4294967296 > /sys/fs/cgroup/memory/docker/container1/memory.limit_in_bytes   # 4 GiB limit
# You've created a container!

OverlayFS - The modern standard
Layer 1: Ubuntu base [Read-only]
/bin/bash
/lib/libc.so
Layer 2: Python install [Read-only]
/usr/bin/python3
/usr/lib/python3.8/
Layer 3: App code [Read-only]
/app/main.py
/app/config.yaml
Container Layer: [Read-write]
(empty initially)
Union View: [What container sees]
/bin/bash (from layer 1)
/usr/bin/python3 (from layer 2)
/app/main.py (from layer 3)
When a container modifies a file (copy-on-write):
1. The file is copied from its read-only layer
2. The copy is placed in the read-write container layer
3. The original remains unchanged
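The same union behavior can be reproduced by hand with a single mount. A minimal sketch (directories illustrative):
mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/merged
sudo mount -t overlay overlay \
  -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work \
  /tmp/merged
# writes under /tmp/merged land in /tmp/upper; /tmp/lower is never modified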

Pull/Build Image
/var/lib/docker
Create Container
Start Container
Container Running

docker run
Client → Daemon
Image Check
Create Container
Setup Namespaces
Configure cgroups
Network Setup
Start Process

Shared Kernel
Process-Level Isolation
Security Boundaries
Use VMs when:
Use Containers when:

Think of it like this:
- Image = Python class definition (blueprint)
- Container = Instance of that class (running object)
- Images are immutable; containers have a writable layer
- docker images lists images; docker ps lists containers
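To see the distinction directly (using the standard nginx image):
docker pull nginx:alpine
docker run -d --name web1 nginx:alpine     # first "instance"
docker run -d --name web2 nginx:alpine     # second, independent "instance"
docker ps        # two containers
docker images    # still one image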

Official vs User Images:
python → Official Python image
username/myapp → User's image
gcr.io/project/image → Google Container Registry
Avoid relying on the latest tag in production
# Image Management
docker pull python:3.11 # Download image from registry
docker images # List local images
docker rmi python:3.11 # Remove image
docker build -t myapp:v1 . # Build image from Dockerfile
# Container Lifecycle
docker run python:3.11 python -c "print('Hello')" # Run and exit
docker run -d nginx # Run in background (detached)
docker run -it ubuntu bash # Interactive terminal
docker ps # List running containers
docker ps -a # List all containers
docker stop container_id # Stop gracefully
docker kill container_id # Force stop
docker rm container_id # Remove stopped container
# Debugging and Inspection
docker logs container_id # View output
docker exec -it container_id bash # Enter running container
docker inspect container_id # Full container details
docker stats # Resource usage


Layer ordering impacts build performance:
# Multi-stage build for production
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
# Copy only installed packages
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
# Copy application last for cache efficiency
COPY model/ ./model/
COPY src/ ./src/
CMD ["python", "src/inference.py"]Key principles:

Resource limits prevent:
Bridge (default)
Host
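In practice the difference shows up in how ports are exposed; for example:
docker run -d -p 8080:80 nginx          # bridge (default): host port 8080 NATed to container port 80
docker run -d --network host nginx      # host: the container binds the host's ports directly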

Volume selection criteria:
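A quick comparison of the three common options (the image name myapp is illustrative):
docker volume create model-cache
docker run -v model-cache:/app/cache myapp       # named volume: managed by Docker, survives the container
docker run -v $(pwd)/data:/app/data:ro myapp     # bind mount: host directory, here read-only
docker run --tmpfs /app/tmp myapp                # tmpfs: in-memory, discarded on exit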
# Run as non-root user
docker run --user 1000:1000 app
# Read-only root filesystem
docker run --read-only \
--tmpfs /tmp \
--tmpfs /var/run \
app
# Drop capabilities
docker run --cap-drop ALL \
--cap-add NET_BIND_SERVICE \
app
# Security options
docker run --security-opt no-new-privileges \
--security-opt apparmor=docker-default \
app
Security layers:
# Dockerfile with health check
FROM python:3.11-slim
# Health check configuration
HEALTHCHECK --interval=30s \
--timeout=3s \
--start-period=60s \
--retries=3 \
CMD python -c "import requests; requests.get('http://localhost:8080/health')"
# Application setup
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["python", "app.py"]Health check types:
Docker daemon actions on failure:
# Inspect running processes
docker exec container_id ps aux
# Attach to running container
docker exec -it container_id /bin/bash
# View container logs
docker logs --follow --tail 100 container_id
# Inspect container metadata
docker inspect container_id | jq '.State'
# Copy files from container
docker cp container_id:/app/logs/error.log ./
# Monitor resource usage
docker stats container_id
Advanced debugging:

Registry selection factors:
# Create project directory
mkdir ml-inference && cd ml-inference
# Create directory structure
mkdir -p src model data
# Project layout:
ml-inference/
├── Dockerfile
├── requirements.txt
├── model/
│ └── model.pkl # Pretrained model
├── src/
│ ├── app.py # FastAPI application
│ ├── inference.py # Model inference logic
│ └── preprocess.py # Data preprocessing
└── data/
└── sample.json # Test data

# src/app.py - FastAPI inference server
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import pickle
import numpy as np
from typing import List
import time
import logging

app = FastAPI(title="ML Inference API")
logger = logging.getLogger(__name__)

# Load model at startup
with open('/app/model/model.pkl', 'rb') as f:
    model = pickle.load(f)
logger.info(f"Model loaded: {type(model)}")

class PredictionRequest(BaseModel):
    features: List[float]

class PredictionResponse(BaseModel):
    prediction: float
    confidence: float
    inference_time_ms: float

@app.get("/health")
def health_check():
    """Kubernetes/Docker health check endpoint"""
    return {"status": "healthy", "model_loaded": model is not None}

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    """Main prediction endpoint"""
    start_time = time.time()
    try:
        # Preprocess
        features = np.array(request.features).reshape(1, -1)
        # Predict
        prediction = model.predict(features)[0]
        confidence = model.predict_proba(features).max()
        # Measure time
        inference_time = (time.time() - start_time) * 1000
        return PredictionResponse(
            prediction=float(prediction),
            confidence=float(confidence),
            inference_time_ms=inference_time
        )
    except Exception as e:
        logger.error(f"Prediction failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))

# Multi-stage build for smaller final image
# Stage 1: Builder
FROM python:3.11-slim as builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
gcc \
g++ \
&& rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy and install requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Stage 2: Runtime
FROM python:3.11-slim
# Create non-root user
RUN useradd -m -u 1000 mluser
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Set working directory
WORKDIR /app
# Copy application code
COPY --chown=mluser:mluser src/ ./src/
COPY --chown=mluser:mluser model/ ./model/
# Switch to non-root user
USER mluser
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
CMD python -c "import requests; r = requests.get('http://localhost:8080/health'); r.raise_for_status()"
# Expose port
EXPOSE 8080
# Run application
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8080"]
# Build command with explanations
# --cache-from reuses the cached base image, BUILDKIT_INLINE_CACHE=1 enables the inline cache,
# --progress=plain shows detailed output, -t tags the image, and "." is the build context (current dir)
DOCKER_BUILDKIT=1 docker build \
  --cache-from python:3.11-slim \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --progress=plain \
  -t ml-inference:v1.0 \
  .
# Build output:
# => [internal] load build definition from Dockerfile
# => [internal] load .dockerignore
# => [internal] load metadata for docker.io/library/python:3.11-slim
# => [builder 1/5] FROM docker.io/library/python:3.11-slim
# => [builder 2/5] RUN apt-get update && apt-get install -y gcc g++
# => [builder 3/5] RUN python -m venv /opt/venv
# => [builder 4/5] COPY requirements.txt .
# => [builder 5/5] RUN pip install --no-cache-dir -r requirements.txt
# => [stage-1 1/6] COPY --from=builder /opt/venv /opt/venv
# => exporting to image
# => naming to docker.io/library/ml-inference:v1.0

# Development mode - with live code reload
docker run -it --rm \
--name ml-dev \
-p 8080:8080 \
-v $(pwd)/src:/app/src:ro \
-v $(pwd)/model:/app/model:ro \
-e LOG_LEVEL=DEBUG \
ml-inference:v1.0
# Production mode - with resource limits
docker run -d \
--name ml-prod \
--restart unless-stopped \
--memory="2g" \
--memory-reservation="1g" \
--cpus="1.5" \
--pids-limit 100 \
-p 8080:8080 \
--read-only \
--tmpfs /tmp \
ml-inference:v1.0

# Check if container is running
docker ps
# CONTAINER ID IMAGE STATUS PORTS
# abc123def456 ml-inference:v1.0 Up 2 minutes 0.0.0.0:8080->8080/tcp
# Check health endpoint
curl http://localhost:8080/health
# {"status":"healthy","model_loaded":true}
# Test prediction
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"features": [5.1, 3.5, 1.4, 0.2]}'
# {"prediction":0,"confidence":0.98,"inference_time_ms":12.5}
# View logs
docker logs ml-prod --tail 50
# INFO: Started server process [1]
# INFO: Waiting for application startup.
# INFO: Model loaded: <class 'sklearn.ensemble._forest.RandomForestClassifier'>
# INFO: Application startup complete.
# Monitor resource usage
docker stats ml-prod --no-stream
# CONTAINER CPU % MEM USAGE / LIMIT
# ml-prod 15.2% 523MiB / 2GiB

# Container won't start? Check logs
docker logs ml-prod
# Error: Model file not found at /app/model/model.pkl
# Need to debug inside container?
docker exec -it ml-prod /bin/bash
mluser@abc123:/app$ ls -la
mluser@abc123:/app$ python -c "import pickle; print('Pickle works')"
mluser@abc123:/app$ exit
# Container crashes immediately?
docker run -it --entrypoint /bin/bash ml-inference:v1.0
# Now you're in the container with a shell
# Check what files made it into the image
docker run --rm ml-inference:v1.0 find /app -type f
# Inspect image layers and sizes
docker history ml-inference:v1.0
# IMAGE CREATED SIZE COMMAND
# abc123 2 hours ago 2.1MB COPY src/ ./src/
# def456 2 hours ago 125MB COPY model/ ./model/
# ...
Optimization techniques used:
Why orchestration is necessary:


Docker Compose manages multi-container applications on a single host:
version: '3.8'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      - api
    networks:
      - frontend
  api:
    build: ./api
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/myapp
      REDIS_URL: redis://cache:6379
    depends_on:
      - db
      - cache
    networks:
      - frontend
      - backend
    restart: unless-stopped
  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: secretpass
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - backend
  cache:
    image: redis:7-alpine
    networks:
      - backend
volumes:
  postgres_data:
networks:
  frontend:
  backend:
YAML = YAML Ain't Markup Language
Lists use -, dictionaries use key: value, comments start with #
Common YAML Gotchas (see the snippet below):
- version: 1.10 becomes the float 1.1 (use quotes: "1.10")
- Tabs are forbidden (spaces only)
- : in values needs quotes: description: "Error: failed"
- yes, no, on, off are booleans (use quotes for strings)
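For example:
version: "1.10"                  # quoted: stays the string "1.10"; unquoted it parses as the float 1.1
description: "Error: failed"     # the ':' inside the value requires quoting
debug: "no"                      # unquoted no is the boolean false in YAML 1.1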
# Start all services
docker-compose up # Foreground with logs
docker-compose up -d # Background (detached)
# Manage services
docker-compose ps # List running services
docker-compose logs api # View logs for service
docker-compose stop # Stop all services
docker-compose down # Stop and remove containers
docker-compose down -v # Also remove volumes
# Scaling and updates
docker-compose up -d --scale worker=3 # Run 3 worker instances
docker-compose pull # Update images
docker-compose build # Rebuild custom images
docker-compose restart api # Restart single service
# Development workflow
docker-compose exec api bash # Shell into running service
docker-compose run api pytest # Run command in new container

Development (docker-compose.override.yml):
services:
  api:
    build: .
    volumes:
      - ./src:/app/src      # Live code reload
    ports:
      - "5678:5678"         # Debugger port
    environment:
      - DEBUG=true

Production (docker-compose.prod.yml):

Docker Compose works well until it doesn’t. The transition to Kubernetes typically happens when:

Orchestration solves:

Control Plane (Master nodes):
Data Plane (Worker nodes):
apiVersion: v1
kind: Pod
metadata:
  name: ml-inference
  labels:
    app: inference
    version: v2
spec:
  containers:
  - name: model-server
    image: ml-inference:v2.1
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
    ports:
    - containerPort: 8080
  - name: metrics-collector
    image: prometheus-exporter:latest
    ports:
    - containerPort: 9090

Pod characteristics:

Controller types:
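In practice pods are almost always created through a controller. A minimal Deployment sketch for the inference pod above (replica count illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      containers:
      - name: model-server
        image: ml-inference:v2.1
        ports:
        - containerPort: 8080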
apiVersion: v1
kind: Service
metadata:
  name: inference-service
spec:
  selector:
    app: inference
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: inference-internal
spec:
  selector:
    app: inference
  type: ClusterIP
  ports:
  - port: 8080

Service types:
# ConfigMap for application configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-config
data:
  model_path: "/models/latest"
  batch_size: "32"
  num_workers: "4"
---
# Secret for sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: api-keys
type: Opaque
data:
  aws_access_key: <base64_encoded>
  database_password: <base64_encoded>

Usage in pods:
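A hedged sketch of a container spec consuming the ConfigMap and Secret above (only the relevant fields shown):
spec:
  containers:
  - name: model-server
    image: ml-inference:v2.1
    envFrom:
    - configMapRef:
        name: model-config
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: api-keys
          key: database_password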

Storage concepts:
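A minimal PersistentVolumeClaim sketch (size and storage class are illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-store
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi
  storageClassName: standard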

QoS determination:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  nodeSelector:
    gpu: "true"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values: ["p3.2xlarge", "p3.8xlarge"]
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values: ["gpu-training"]
          topologyKey: kubernetes.io/hostname

Scheduling constraints:

Update strategies:
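For a Deployment, the rolling-update knobs live under spec.strategy; a sketch (values illustrative):
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # at most one extra pod during the rollout
      maxUnavailable: 0     # never drop below the desired replica count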
# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: inference-metrics
spec:
  selector:
    matchLabels:
      app: inference
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key metrics:
Observability stack:

Operators extend Kubernetes:
# Create namespace
kubectl create namespace ml-pipeline
# Deploy training job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: model-training
  namespace: ml-pipeline
spec:
  template:
    spec:
      containers:
      - name: trainer
        image: ml-training:latest
        resources:
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
  backoffLimit: 3
EOF
# Deploy inference service
kubectl apply -f inference-deployment.yaml
kubectl apply -f inference-service.yaml
# Setup horizontal autoscaling
kubectl autoscale deployment inference \
--cpu-percent=70 \
--min=2 \
--max=10
# Check status
kubectl get pods -n ml-pipeline
kubectl top pods -n ml-pipeline

Production considerations:

Key distinction:

Focus areas for virtualization:

CIDR (Classless Inter-Domain Routing):
/24 = 255.255.255.0 = 256 addresses (254 usable)
/16 = 255.255.0.0 = 65,536 addresses
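These sizes are easy to verify with Python's standard ipaddress module:
import ipaddress

net = ipaddress.ip_network("10.0.0.0/24")
print(net.netmask, net.num_addresses)                         # 255.255.255.0 256
print(ipaddress.ip_network("10.0.0.0/16").num_addresses)      # 65536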
Interface types in virtualization:

Each layer adds its header:

Bridge networking provides:
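A minimal Linux bridge sketch with iproute2 (interface names and addresses illustrative):
sudo ip link add br0 type bridge
sudo ip link set br0 up
sudo ip link set eth1 master br0              # attach an interface to the bridge
sudo ip addr add 192.168.100.1/24 dev br0     # optional: give the bridge an address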

VLAN benefits:

Routing fundamentals:

TCP provides:

Protocol selection criteria:

Network namespaces provide:
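A hedged sketch of two namespaces talking over a veth pair (names and addresses illustrative):
sudo ip netns add blue
sudo ip link add veth0 type veth peer name veth1
sudo ip link set veth1 netns blue
sudo ip addr add 10.200.0.1/24 dev veth0 && sudo ip link set veth0 up
sudo ip netns exec blue ip addr add 10.200.0.2/24 dev veth1
sudo ip netns exec blue ip link set veth1 up
sudo ip netns exec blue ping -c 1 10.200.0.1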

Load balancing considerations:

Service mesh provides:

Optimization techniques: