Diffusion Model
A generative model that learns to reverse a gradual noising process — you train it to denoise, then sample by iteratively denoising pure noise.
In one line
Train a model to remove a little bit of noise; generate by starting from pure noise and calling it many times.
What it actually means
The forward process takes a clean image and gradually adds Gaussian noise over T steps until it’s indistinguishable from random noise. The model learns the reverse process: given a noisy image at step t, predict the noise that was added (or equivalently, the slightly-less-noisy image at step t-1). At inference you start from noise and run the denoiser T times (or far fewer with modern samplers). In latent diffusion (Stable Diffusion), you do all of this in the compressed latent space of an autoencoder to make it affordable.
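The closed-form forward process described above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the linear beta schedule, the step count T, and the 8×8 "image" are assumptions chosen for brevity. The key fact it demonstrates is that you can jump straight to any noise level t via x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε, and that ε is the training target.

```python
import numpy as np

# Illustrative linear beta schedule over T steps (values are assumptions).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product: bar(alpha)_t

def q_sample(x0, t, rng):
    """Forward process in closed form: noise x0 directly to step t.
    Returns the noisy sample x_t and the noise eps (the training target)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))   # stand-in for a clean image
xt, eps = q_sample(x0, T - 1, rng)
# At t = T-1 the signal coefficient sqrt(alpha_bars[-1]) is near zero,
# so x_T is statistically indistinguishable from pure Gaussian noise.
```

During training, the model sees (x_t, t) and is optimized to predict eps; the denoising loss is typically just the mean squared error between the predicted and true noise.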
Why it matters
Diffusion models replaced GANs as the dominant approach for high-quality image generation around 2021. Stable Diffusion, DALL-E 2/3, Midjourney, and most video models are diffusion-based. They train more stably than GANs and scale better, at the cost of needing many forward passes to sample.
Example
x_T ~ N(0, I)                                  # start from pure Gaussian noise
for t in T..1:
    eps_pred = model(x_t, t, text_embedding)   # predict the noise at step t
    x_{t-1} = denoise_step(x_t, eps_pred, t)   # sampler-specific update
return x_0
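The denoise_step in the pseudocode is sampler-specific. As one concrete instance, a sketch of the classic DDPM update, assuming a linear beta schedule and the common sigma_t = sqrt(beta_t) choice (both assumptions, not fixed by the text):

```python
import numpy as np

# Illustrative linear schedule; same assumption as a typical DDPM setup.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoise_step(x_t, eps_pred, t, rng):
    """One DDPM reverse step: subtract the (scaled) predicted noise to get
    the posterior mean, then re-inject fresh noise, except at the last step."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) \
           / np.sqrt(alphas[t])
    if t == 0:
        return mean                      # final step is deterministic
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z  # sigma_t = sqrt(beta_t) variant

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))          # stand-in for a noisy latent x_t
x_prev = denoise_step(x, np.zeros_like(x), 500, rng)
```

Fast samplers like DDIM or DPM++ replace this update with ones that take larger, more accurate steps, which is how modern pipelines get away with 20 to 50 steps instead of 1000.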
You’ll hear it when
- Working with Stable Diffusion, SDXL, or Flux.
- Comparing samplers (DDIM, DPM++, Euler).
- Discussing CFG scale, number of steps, or latent upscaling.
- Evaluating image or video generation quality.