Diffusion Model

Deep Learning

A generative model that learns to reverse a gradual noising process — you train it to denoise, then sample by iteratively denoising pure noise.


In one line

Train a model to remove a little bit of noise; generate by starting from pure noise and calling it many times.

What it actually means

The forward process takes a clean image and gradually adds Gaussian noise over T steps until it’s indistinguishable from random noise. The model learns the reverse process: given a noisy image at step t, predict the noise that was added (or equivalently, the slightly-less-noisy image at step t-1). At inference you start from noise and run the denoiser T times (or far fewer with modern samplers). In latent diffusion (Stable Diffusion), you do all of this in the compressed latent space of an autoencoder to make it affordable.
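The forward process has a convenient closed form: you can jump straight to any step t without simulating the intermediate steps. Here is a minimal NumPy sketch of that closed form under an assumed linear beta schedule; the names (T, betas, alpha_bar, q_sample) are illustrative, not any particular library's API.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # noise added per step (assumed schedule)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # cumulative fraction of signal kept

def q_sample(x0, t, rng):
    """Noise a clean sample straight to step t:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps                   # eps is the model's training target

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))     # stand-in for a clean image
xt, eps = q_sample(x0, T - 1, rng)   # at t = T-1, xt is almost pure noise
```

Training then amounts to sampling a random t, calling q_sample, and regressing the model's output against eps.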

Why it matters

Diffusion models replaced GANs as the dominant approach for high-quality image generation around 2021. Stable Diffusion, DALL-E 2/3, Midjourney, and most video models are diffusion-based. They train more stably than GANs and scale better, at the cost of needing many forward passes to sample.

Example

x = sample_noise()                        # x_T ~ N(0, I)
for t in reversed(range(T)):              # t = T-1 down to 0
    eps_pred = model(x, t, text_embedding)    # predict the added noise
    x = denoise_step(x, eps_pred, t)          # one step toward x_{t-1}
return x                                  # x_0: the generated sample

You’ll hear it when

  • Working with Stable Diffusion, SDXL, or Flux.
  • Comparing samplers (DDIM, DPM++, Euler).
  • Discussing CFG (classifier-free guidance) scale, number of steps, or latent upscaling.
  • Evaluating image or video generation quality.
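The "CFG scale" knob in those tools has a one-line core: run the model with and without the text conditioning, then push the prediction along the conditional direction. A minimal sketch, with illustrative names:

```python
import numpy as np

def apply_cfg(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: blend the unconditional and conditional
    noise predictions. scale 1.0 recovers the conditional prediction;
    higher scales follow the prompt more strongly (at some quality cost)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# toy predictions, just to show the arithmetic
eps_u = np.zeros(4)
eps_c = np.ones(4)
guided = apply_cfg(eps_u, eps_c, 7.5)   # a common default scale
```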

Related terms
