Neural Network

Deep Learning

A model built from stacked layers of linear transformations and non-linear activations, trained end-to-end with gradient descent.


In one line

A stack of linear layers with non-linear activations between them, trained by computing gradients of a loss with respect to every weight.

What it actually means

At its simplest, a feed-forward neural network applies h = activation(W x + b) repeatedly, with a different W and b at each layer. The final layer’s output feeds a loss function against the target. Backpropagation computes the gradient of the loss with respect to every weight via the chain rule, and an optimizer takes a step. Everything in “deep learning” — CNNs, RNNs, transformers, diffusion models — is a specific choice of which linear operations you use and how you wire them.
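The loop described above — forward pass, loss, backpropagation via the chain rule, optimizer step — can be sketched directly with raw tensors. This is a minimal illustration, not a practical recipe: the shapes, learning rate, and squared-error loss are arbitrary choices for the sketch, and autograd stands in for hand-derived chain-rule gradients.

```python
import torch

# Toy two-layer network: h = relu(W1 x + b1), y = W2 h + b2.
torch.manual_seed(0)
x = torch.randn(4)        # one input vector (sizes chosen arbitrarily)
target = torch.randn(2)   # a stand-in regression target

W1 = torch.randn(3, 4, requires_grad=True)
b1 = torch.zeros(3, requires_grad=True)
W2 = torch.randn(2, 3, requires_grad=True)
b2 = torch.zeros(2, requires_grad=True)

h = torch.relu(W1 @ x + b1)        # forward pass, layer 1
y = W2 @ h + b2                    # final layer, no activation on the output
loss = ((y - target) ** 2).mean()  # loss against the target

loss.backward()       # backprop: chain rule fills .grad for every weight
with torch.no_grad(): # optimizer step: plain gradient descent
    for p in (W1, b1, W2, b2):
        p -= 1e-3 * p.grad
        p.grad.zero_()
```

Everything heavier — minibatches, `nn.Module`, Adam — is machinery layered on exactly this loop.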

Why it matters

The neural network is the base abstraction of modern ML. If you understand forward pass, loss, backward pass, and parameter update, every architecture is a variation on that loop. The reason the field took off isn’t a clever insight about any single architecture — it’s that big nets plus lots of data plus GPUs actually generalize.

Example

import torch.nn as nn

# Three linear layers with ReLU activations in between:
# 784 inputs (e.g. a flattened 28x28 image) down to 10 output classes.
net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
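Assuming a 10-class classification task (the 784/10 shapes suggest MNIST-style flattened 28×28 images, though the original doesn't say), one training step for a net like this might look like the sketch below. The batch is random stand-in data, and the net is repeated so the snippet runs on its own.

```python
import torch
import torch.nn as nn

# Same architecture as above, repeated so this snippet is self-contained.
net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One training step on a random batch (stand-in for real data).
x = torch.randn(32, 784)         # batch of 32 flattened "images"
y = torch.randint(0, 10, (32,))  # integer class labels

logits = net(x)            # forward pass
loss = loss_fn(logits, y)  # loss against the target
optimizer.zero_grad()
loss.backward()            # backward pass: gradients for every weight
optimizer.step()           # parameter update
```

In practice this step runs inside a loop over a real dataset, but the four-beat rhythm — forward, loss, backward, update — is unchanged.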

You’ll hear it when

  • Teaching or learning the basics.
  • Debating neural nets vs classical ML on a small-data task.
  • Explaining backprop at a whiteboard.
  • Reading almost any paper in the field.

Related terms