Neural Networks: Zero to Hero

Andrej Karpathy
Video series · Intermediate · Free · 8 lectures, ~15 hours
Summary

Karpathy builds neural networks from scratch in plain Python and PyTorch, ending with a working transformer trained on Tiny Shakespeare.


This is the fastest path from “I’ve heard of backpropagation” to “I have written, line by line, the thing that powers GPT.” Karpathy starts in a Jupyter notebook with plain scalars and builds up: micrograd (a tiny autograd engine), then a bigram language model, then an MLP, and finally a full transformer, a GPT trained on Tiny Shakespeare. By the end you have a real mental model of what every line of nn.Module is doing, something you cannot get from reading docs alone.
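The core idea behind micrograd is a scalar that remembers how it was computed, so gradients can flow backward through the graph. A minimal sketch of that idea, not Karpathy's exact code, might look like this (the `Value` class and its methods here are illustrative):

```python
import math

class Value:
    """A scalar with reverse-mode autograd, in the spirit of micrograd."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to push grad to children
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1, d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule: each input's grad is the other input's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then propagate gradients output-to-inputs
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# tiny check: for c = a*b + a, dc/da = b + 1 and dc/db = a
a, b = Value(2.0), Value(3.0)
c = a * b + a
c.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

The lectures build exactly this kind of engine up operation by operation, which is why typing along matters: the `_backward` closures are where the chain rule stops being abstract.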

The standout lecture is “Let’s build GPT: from scratch, in code, spelled out.” It’s two hours long and worth every minute — by the end you’ve implemented self-attention, multi-head attention, residual streams, and layer norm with no hand-waving. The next time you read a transformer paper the diagrams will just click.
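The heart of that lecture is scaled dot-product attention with a causal mask. A dependency-free sketch of a single head, with identity projections instead of learned Wq/Wk/Wv matrices to keep the scaffolding minimal (this is a simplification of what the lecture builds, not his code):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(X, causal=True):
    """Single-head self-attention over T token vectors of dim d.
    Illustrative only: q = k = v = x here, whereas a real head
    applies learned linear projections to get queries/keys/values."""
    T, d = len(X), len(X[0])
    out = []
    for i in range(T):
        scores = []
        for j in range(T):
            if causal and j > i:
                scores.append(float('-inf'))  # mask: no peeking at future tokens
            else:
                dot = sum(X[i][k] * X[j][k] for k in range(d))
                scores.append(dot / math.sqrt(d))  # scaled dot product
        w = softmax(scores)  # attention weights for token i
        out.append([sum(w[j] * X[j][k] for j in range(T)) for k in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X)
# with the causal mask, token 0 can only attend to itself,
# so its output equals its input
print(Y[0])  # [1.0, 0.0]
```

Multi-head attention is just several of these heads run in parallel on projected slices and concatenated; once you have typed out one head, the rest of the block is bookkeeping.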

If you have to pick one resource on this entire wiki to actually do (not just watch), pick this one and type along. Skip nothing, and pause whenever he says “now you try.” The “how to be a researcher” video at the end is optional but a nice career-shaping watch.

Related resources