NeurIPS 2012

ImageNet Classification with Deep Convolutional Neural Networks

Krizhevsky, Sutskever, Hinton

TL;DR
The paper that kicked off the deep learning era. A large CNN trained on GPUs crushed ImageNet 2012, halving the previous error rate.

What it says

The authors train an 8-layer CNN (five convolutional layers, three fully connected) with ~60M parameters on ImageNet-1k, split across two GTX 580 GPUs. They introduce (or popularize) several tricks: ReLU activations instead of tanh, dropout in the fully-connected layers, data augmentation with random crops and horizontal flips, local response normalization, and overlapping pooling. The network takes top-5 error from the previous best of ~26% down to 15.3%. A landslide.
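A minimal NumPy sketch of three of these tricks. This is illustrative, not the paper's implementation: the function names are mine, and the dropout shown is the "inverted" variant common today (scale by 1/(1-p) at train time), whereas the paper instead halves activations at test time; the two are equivalent in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # ReLU: max(0, x). The paper reports it trains several times
    # faster than tanh on their network.
    return np.maximum(0.0, x)

def dropout(x, p=0.5, training=True):
    # Inverted dropout (modern variant, hypothetical helper): zero each
    # unit with probability p and rescale survivors so the expected
    # activation is unchanged. At test time it is the identity.
    if not training:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def augment(img):
    # Horizontal flip with probability 0.5, one of the paper's
    # label-preserving augmentations (img is an H x W array here).
    return img[:, ::-1] if rng.random() < 0.5 else img
```

With p=0.5, surviving activations of an all-ones input become 2.0 and the rest 0.0, so the expected value stays 1.0.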

Why it matters

AlexNet is the paper where deep learning went from “academic curiosity” to “obvious answer for computer vision”. Every lab pivoted within months. ReLU, dropout, and GPU training became universal, and ImageNet became the benchmark that launched a decade of progress. Without AlexNet, the rest of the story doesn’t happen.

  • VGG (Simonyan & Zisserman, 2014) — simpler, deeper, trained-from-scratch backbone.
  • GoogLeNet / Inception (Szegedy et al., 2014) — efficient multi-branch blocks.
  • ResNet (He et al., 2015) — skip connections that finally let networks go truly deep.