Kubernetes

Container orchestration platform that schedules, scales, and heals workloads across a cluster of machines.

Category
Infrastructure
Difficulty
Advanced
When to use
You run multiple services at nontrivial scale and need rolling deploys, autoscaling, and a uniform way to manage them.
When not to use
You have one service and three users — a VM or a managed container platform is much cheaper to operate.
Alternatives
Nomad, ECS, Cloud Run, Fly.io

At a glance

Field            Value
Category         Container orchestration
Difficulty       Advanced
When to use      Many services, autoscaling, multi-team platforms
When not to use  Small single-service apps; solo projects
Alternatives     Nomad, ECS, Cloud Run, Fly.io

What it is

Kubernetes (“k8s”) takes a desired state described in YAML — “run three replicas of this image with these resources behind this service” — and continuously reconciles the cluster to match it. It handles scheduling onto nodes, restarts on crash, rolling updates, and horizontal autoscaling.
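As a rough illustration of declaring desired state for horizontal autoscaling, the manifest below sketches a HorizontalPodAutoscaler. It assumes a Deployment named infer already exists and that a metrics provider such as metrics-server is installed; the replica bounds and the 70% CPU target are placeholder values, not recommendations.

```yaml
# Sketch only: assumes a Deployment named "infer" exists and that
# metrics-server (or another metrics API provider) is installed.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: infer
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: infer
  minReplicas: 3          # placeholder lower bound
  maxReplicas: 10         # placeholder upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU requests (placeholder)
```

CPU utilization here is measured against the pods' CPU requests, so the target only makes sense when the containers have requests set.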

When we reach for it at Ephizen

  • Production serving of FastAPI inference services with rolling deploys and health checks.
  • GPU workloads via node pools and the NVIDIA device plugin.
  • Batch training jobs using Job resources or Kueue.
  • Multi-tenant internal platforms where each team gets a namespace.
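The GPU, batch, and namespace points above can be combined in one small sketch: a Job that runs in a team namespace and asks for a single GPU. The namespace team-ml, the image, and the command are hypothetical names chosen for illustration, and the example assumes the NVIDIA device plugin is already running on the GPU node pool. Kueue is not shown.

```yaml
# Sketch only: namespace, image, and command are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-example
  namespace: team-ml                      # hypothetical per-team namespace
spec:
  backoffLimit: 2                         # retry a failed pod at most twice
  template:
    spec:
      restartPolicy: Never                # Jobs require Never or OnFailure
      containers:
        - name: train
          image: ephizen/train:1.0.0      # hypothetical image and tag
          command: ["python", "train.py"] # hypothetical entrypoint
          resources:
            limits:
              nvidia.com/gpu: 1           # extended resource exposed by the NVIDIA device plugin
```

GPUs are requested under limits; the scheduler then places the pod only on a node that advertises a free nvidia.com/gpu.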

Getting started

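# Minimal Deployment: three replicas of the inference container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: infer
spec:
  replicas: 3
  selector: { matchLabels: { app: infer } }   # must match the pod template labels
  template:
    metadata: { labels: { app: infer } }
    spec:
      containers:
        - name: infer
          # Pin a version tag or digest in production; :latest makes rollouts and rollbacks ambiguous.
          image: ephizen/infer:latest
          ports: [{ containerPort: 8000 }]
          resources:
            # Requests default to the limits when omitted; stated explicitly here.
            requests: { cpu: '2', memory: '4Gi' }
            limits: { cpu: '2', memory: '4Gi' }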
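To put those replicas "behind a service", a Service that selects the same app: infer label is the usual next step. The following is a minimal sketch, assuming in-cluster access only (ClusterIP); the Service port 80 is an arbitrary choice, while targetPort 8000 matches the containerPort in the Deployment.

```yaml
# Sketch only: exposes the "infer" pods inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: infer
spec:
  type: ClusterIP          # in-cluster only; no external exposure
  selector:
    app: infer             # must match the Deployment's pod labels
  ports:
    - name: http
      port: 80             # port clients connect to (arbitrary choice)
      targetPort: 8000     # containerPort from the Deployment
```

Both manifests can be applied with kubectl apply -f <file>, and other pods in the same namespace can then reach the service at http://infer.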

Gotchas

  • Kubernetes is a platform for platform engineers. The YAML surface area is large, and mistakes are expensive.
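  • Resource requests and limits matter. Under-set requests let the scheduler overpack nodes, so pods contend with noisy neighbors. Missing memory limits let one container exhaust a node's memory, which ends in OOM kills or evictions for whatever shares that node.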
  • For LLM serving, pair with KServe, vLLM, or a Ray cluster — don’t reinvent batching in raw Deployments.
  • Local dev: use kind, k3d, or minikube. Don’t test against prod.
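One way to guard against the missing requests and limits described above is a per-namespace LimitRange, which fills in defaults for containers that do not declare their own. This is a sketch under assumptions: the namespace team-ml is hypothetical, and the CPU and memory values are placeholders to be tuned per workload.

```yaml
# Sketch only: namespace and all values are placeholders.
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: team-ml         # hypothetical per-team namespace
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no requests
        cpu: 250m
        memory: 256Mi
      default:               # applied when a container sets no limits
        cpu: "1"
        memory: 1Gi
```

The defaults are applied at admission time, so they affect only pods created after the LimitRange exists.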

Related tools