Latent Space
The lower-dimensional vector space a model compresses its inputs into, where similar things end up near each other.
In one line
The compressed vector space a model learns to represent inputs in — the place where “meaning” lives.
What it actually means
An autoencoder compresses an input down to a small vector (the latent) and then reconstructs it. A diffusion model runs its denoising process in the latent space of a pretrained VAE instead of in pixel space. An embedding model maps sentences into a latent space where nearest-neighbor search gives you semantic similarity. “Latent” just means hidden — it’s the internal representation, not the raw input and not the final output. The shape of this space matters: a well-structured latent space has smooth directions (you can interpolate between two points and get meaningful intermediates) and disentangled factors (individual dimensions track individual properties of the input, such as pose or color).
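A rough numpy sketch of what “smooth directions” buys you: interpolating between two latent vectors traces a path whose points, in a well-structured space, each decode to something plausible. The 4-d latents here are made up for illustration; in practice they would come from an encoder.

```python
import numpy as np

# Two hypothetical latent vectors (in practice: encoder outputs).
z_a = np.array([0.2, -1.0, 0.5, 0.0])
z_b = np.array([1.0, 0.3, -0.4, 0.8])

def lerp(z1, z2, t):
    """Linear interpolation between two latents; t in [0, 1]."""
    return (1 - t) * z1 + t * z2

# A path through latent space; each point could be fed to the
# model's decoder to get back an input-space sample.
path = [lerp(z_a, z_b, t) for t in np.linspace(0, 1, 5)]
print(path[0])   # == z_a
print(path[-1])  # == z_b
```

The midpoint (`t = 0.5`) is the average of the two latents; whether it decodes to something meaningful is exactly the “well-structured” property the text describes.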
Why it matters
Most of modern generative AI operates in latent space because pixels and raw tokens are too expensive or too noisy. Stable Diffusion is cheap because it runs in an 8x-downsampled latent. Semantic search works because the embedding latent space aligns meaning with geometry. Understanding latents is how you reason about what a model “sees”.
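The “meaning aligns with geometry” claim is easy to demonstrate with cosine similarity. A minimal sketch with made-up 4-d embedding vectors (a real system would get higher-dimensional vectors from an embedding model):

```python
import numpy as np

# Hypothetical sentence embeddings; real ones are hundreds of dims.
docs = {
    "a cat sat on the mat":  np.array([0.9, 0.1, 0.0, 0.2]),
    "kitten resting on rug": np.array([0.8, 0.2, 0.1, 0.3]),
    "stock prices fell":     np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: angle between vectors, ignoring length."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# A query embedding close to the "cat" region of the space.
query = np.array([0.85, 0.15, 0.05, 0.25])
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)
```

Nearest-neighbor search over these vectors retrieves the semantically related documents first, which is all semantic search is.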
Example
image (512x512x3)
→ encoder → latent (64x64x4)
→ diffusion denoising in latent space
→ decoder → image (512x512x3)
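The win from the pipeline above is concrete: the latent holds about 48x fewer values than the image, so every denoising step touches far less data. Quick arithmetic:

```python
# Shapes from the example pipeline above.
pixel_values  = 512 * 512 * 3   # 786,432 values per image
latent_values = 64 * 64 * 4     # 16,384 values per latent
print(pixel_values // latent_values)  # 48x fewer values per denoising step
```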
You’ll hear it when
- Reading papers on VAEs, diffusion models, or embeddings.
- Discussing “interpolating in latent space” for generation.
- Debugging an encoder that maps everything to the same point.
- Explaining why semantic search works.
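For the debugging bullet above — an encoder mapping everything to the same point (often called posterior collapse) — a quick diagnostic is the spread of latents over a batch. A sketch assuming you can collect latents as an `(N, d)` numpy array:

```python
import numpy as np

def latent_spread(latents: np.ndarray) -> float:
    """Mean per-dimension std over a batch of latents, shape (N, d).
    Near zero means the encoder is collapsing inputs to one point."""
    return float(latents.std(axis=0).mean())

# Simulated batches: a healthy encoder vs. a collapsed one.
healthy   = np.random.default_rng(0).normal(size=(256, 8))
collapsed = np.ones((256, 8)) + 1e-6 * np.random.default_rng(1).normal(size=(256, 8))

print(latent_spread(healthy))    # roughly 1.0
print(latent_spread(collapsed))  # roughly 1e-6
```

In a real debugging session you would compute this on latents from your actual encoder; a spread orders of magnitude below the decoder's working range is the smoking gun.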