FastAPI
Async Python web framework built on Starlette and Pydantic. Our default choice for wrapping models as HTTP services.
At a glance
| Field | Value |
|---|---|
| Category | API & Serving |
| Difficulty | Beginner → Intermediate |
| When to use | Typed async Python HTTP services; ML/LLM inference endpoints |
| When not to use | Server-rendered HTML apps, admin panels, pure static sites |
| Alternatives | Flask, Django REST Framework, Litestar, BentoML |
What it is
FastAPI is a Python web framework that sits on top of two libraries:
- Starlette for the ASGI plumbing (routing, middleware, WebSockets).
- Pydantic for request/response validation and serialization.
You write Python type hints. FastAPI turns them into request parsers, response
serializers, and an OpenAPI schema that auto-generates docs at /docs. That
single property — types are the contract — is why we reach for it first.
The async model in one paragraph
FastAPI runs under an ASGI server (uvicorn, hypercorn). Endpoints declared
with async def are awaited on the event loop; endpoints declared with def
are offloaded to a thread pool. Use async def when you’re awaiting I/O
(HTTP, DB drivers that support async, LLM SDKs). Use plain def when the
work is CPU-bound or the library is blocking (e.g. a sync database driver).
Mixing the two is fine and common.
Pydantic validation
Request bodies and responses are declared as Pydantic models; query and path parameters are declared as typed function arguments and validated the same way. If the caller sends garbage, FastAPI returns a 422 with a precise JSON error before your handler ever runs. This is the main reason we avoid Flask for model-serving work: the validation boilerplate just disappears.

    from pydantic import BaseModel, Field


    class PredictRequest(BaseModel):
        # Non-empty text, capped at 8000 characters.
        text: str = Field(min_length=1, max_length=8000)
        # Decision threshold in [0, 1]; defaults to 0.5.
        threshold: float = Field(0.5, ge=0, le=1)


    class PredictResponse(BaseModel):
        label: str
        score: float

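
To see what "garbage" looks like to Pydantic, the sketch below validates two payloads against the same `PredictRequest` model outside of any HTTP request. `validation_errors` is a hypothetical helper written for this example; FastAPI builds its 422 body from the same kind of error list.

```python
from pydantic import BaseModel, Field, ValidationError


class PredictRequest(BaseModel):
    text: str = Field(min_length=1, max_length=8000)
    threshold: float = Field(0.5, ge=0, le=1)


def validation_errors(payload: dict) -> list:
    """Return the (field, message) pairs Pydantic reports for a payload."""
    try:
        PredictRequest(**payload)
    except ValidationError as exc:
        return [(err["loc"][0], err["msg"]) for err in exc.errors()]
    return []


# Empty text and an out-of-range threshold are both rejected.
print(validation_errors({"text": "", "threshold": 1.5}))
# A valid payload produces no errors; threshold falls back to its default.
print(validation_errors({"text": "great product"}))
```

The exact message strings differ between Pydantic v1 and v2, so tests should assert on field names rather than on message text.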
Dependency injection
FastAPI’s Depends() is a lightweight DI container. Use it for:
- Loading the model once at startup and handing it to every request.
- Auth (API keys, JWTs): declare `Depends(get_current_user)` and the handler only runs if the dependency resolves.
- DB sessions with proper cleanup.
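
    from fastapi import Depends, FastAPI

    app = FastAPI()


    def get_model():
        # The model is loaded once at startup and stored on app.state.
        return app.state.model


    # PredictRequest and PredictResponse are the Pydantic models defined above.
    # Plain def: model.score is a synchronous call, so it runs in the thread pool.
    @app.post("/predict", response_model=PredictResponse)
    def predict(req: PredictRequest, model=Depends(get_model)):
        score = float(model.score(req.text))
        return PredictResponse(
            label="positive" if score >= req.threshold else "negative",
            score=score,
        )
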
How Ephizen uses it
Every model we ship runs behind FastAPI. The shape is always the same:
- `lifespan` context loads weights into `app.state` once (never per-request).
- Handlers validate input with Pydantic, hand off to the model, return a typed response.
- Heavy CPU work (e.g. an XGBoost `predict`) runs in a threadpool via a plain `def` endpoint. Async I/O (embedding API calls, vector DB lookups) runs under `async def`.
- A `/healthz` endpoint returns 200 once the model is loaded so Kubernetes only routes traffic after warmup.
Running it

    uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4

Rules of thumb for workers: start with 2 * cores for CPU-bound models,
1 worker with many async tasks for I/O-bound LLM proxies. Don’t over-worker
a GPU model — each worker loads its own copy of the weights.
Common pitfalls
- BackgroundTasks disappear on shutdown. FastAPI’s `BackgroundTasks` runs after the response is sent but inside the same process. If the pod is evicted mid-task, work is lost. For anything durable, push to a queue (Redis Streams, Kafka, SQS) and let a worker consume it.
- CORS silently blocks browsers. Add `CORSMiddleware` explicitly. Without it, cross-origin calls from a browser fail in the browser (typically at the preflight `OPTIONS` request) and nothing in the server log points to CORS. The FastAPI server looks fine from curl, so developers lose hours.
- Large uploads pin a worker. `UploadFile` spools large files to disk, but reverse-proxy body-size limits and timeouts will bite you at ~100 MB (uvicorn itself does not cap request body size). Stream chunks to S3 instead of holding them in memory.
- Sync DB drivers under `async def`. Calling psycopg2 from `async def` blocks the event loop and kills throughput. Either use `def` + threadpool or switch to `asyncpg`.
- Pydantic v1 vs v2. FastAPI 0.100 added support for Pydantic v2. Validator syntax, `.dict()` vs `.model_dump()`, and config classes all changed. Pin versions.