Kubernetes
Container orchestration platform that schedules, scales, and heals workloads across a cluster of machines.
Category
Infrastructure
Difficulty
Advanced
When to use
You run multiple services at nontrivial scale and need rolling deploys, autoscaling, and a uniform way to manage them.
When not to use
You have one service and three users — a VM or a managed container platform is much cheaper to operate.
Alternatives
Nomad, ECS, Cloud Run, Fly.io
At a glance
| Field | Value |
|---|---|
| Category | Container orchestration |
| Difficulty | Advanced |
| When to use | Many services, autoscaling, multi-team platforms |
| When not to use | Small single-service apps; solo projects |
| Alternatives | Nomad, ECS, Cloud Run, Fly.io |
What it is
Kubernetes (“k8s”) takes a desired state described in YAML — “run three replicas of this image with these resources behind this service” — and continuously reconciles the cluster to match it. It handles scheduling onto nodes, restarts on crash, rolling updates, and horizontal autoscaling.
When we reach for it at Ephizen
- Production serving of FastAPI inference services with rolling deploys and health checks.
- GPU workloads via node pools and the NVIDIA device plugin.
- Batch training jobs using `Job` resources or Kueue.
- Multi-tenant internal platforms where each team gets a namespace.
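As a rough illustration of the GPU, batch, and namespace-per-team cases above, here is a minimal sketch of a namespace plus a one-off training Job. The namespace `team-a`, the Job name, the image `ephizen/train:1.0.0`, the `train.py` entrypoint, and the `pool: gpu` node label are all hypothetical. It assumes the NVIDIA device plugin is already running on the GPU nodes, since that is what exposes the `nvidia.com/gpu` resource.

```yaml
# Hypothetical per-team namespace (name is an assumption).
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Hypothetical batch training Job that asks for one GPU.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-example
  namespace: team-a
spec:
  backoffLimit: 2              # retry a failed pod at most twice
  template:
    spec:
      restartPolicy: Never     # Job pods must use Never or OnFailure
      nodeSelector:
        pool: gpu              # assumed label on the GPU node pool
      containers:
        - name: train
          image: ephizen/train:1.0.0       # hypothetical image and tag
          command: ["python", "train.py"]  # hypothetical entrypoint
          resources:
            limits:
              nvidia.com/gpu: 1  # extended resource exposed by the NVIDIA device plugin
```

Some managed clusters taint their GPU nodes, in which case the pod spec would also need a matching toleration. Queueing through Kueue is not shown here.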
Getting started
apiVersion: apps/v1
kind: Deployment
metadata:
  name: infer
spec:
  replicas: 3
  selector: { matchLabels: { app: infer } }
  template:
    metadata: { labels: { app: infer } }
    spec:
      containers:
        - name: infer
          image: ephizen/infer:latest
          ports: [{ containerPort: 8000 }]
          resources:
            limits: { cpu: '2', memory: '4Gi' }
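The Deployment above covers "three replicas of this image with these resources". A sketch of the remaining pieces, a Service in front of the pods and a HorizontalPodAutoscaler, might look like the following. The Service port (80), the replica bounds, and the 70% CPU target are illustrative assumptions, and the autoscaler only works if a metrics source such as metrics-server is installed in the cluster.

```yaml
# Stable in-cluster address for the pods labelled app=infer.
apiVersion: v1
kind: Service
metadata:
  name: infer
spec:
  selector:
    app: infer
  ports:
    - port: 80          # assumed Service port
      targetPort: 8000  # containerPort from the Deployment
---
# Scales the infer Deployment on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: infer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: infer
  minReplicas: 3        # assumed bounds, tune per service
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # percent of the container's CPU request
```

Utilization is measured against the CPU request. When a container sets only a limit, as in the Deployment above, the request defaults to that limit. Once the autoscaler is in place it owns the replica count, so the `replicas: 3` in the Deployment acts only as a starting value.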
Gotchas
- Kubernetes is a platform for platform engineers. The “yaml engineering” surface is big, and mistakes are expensive.
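- Resource requests and limits matter. Requests set too low let the scheduler overpack nodes, so pods contend for CPU and memory (noisy neighbors). Missing memory limits let one container exhaust a node, which ends in OOM kills or evictions for its neighbors.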
- For LLM serving, pair with KServe, vLLM, or a Ray cluster — don’t reinvent batching in raw Deployments.
- Local dev: use kind, k3d, or minikube. Don’t test against prod.
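One possible guard against the requests-and-limits gotcha is a namespace-level LimitRange, which fills in defaults for containers that omit them. This is a sketch of one option, not a description of how Ephizen configures its clusters. The namespace `team-a` is hypothetical and assumed to exist already, and the values are placeholders.

```yaml
# Default requests and limits for containers that do not set their own.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-a        # hypothetical, must already exist
spec:
  limits:
    - type: Container
      defaultRequest:      # applied when a container omits requests
        cpu: 250m
        memory: 256Mi
      default:             # applied when a container omits limits
        cpu: "1"
        memory: 1Gi
```

Defaults like these are a safety net. Services with known load profiles should still set explicit values in their own manifests.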
Related tools
- Docker: Container runtime and image format that packages an application with its dependencies so it runs the same way everywhere.
- Postman: A graphical HTTP client for designing, testing, and documenting APIs; the standard "poke the endpoint" tool on most teams.
- Pydantic: Python data validation library using type hints. The backbone of FastAPI, LLM structured output, and a lot of modern Python codebases.