Deep Learning
The standard reference for the mathematical and conceptual foundations of deep learning. Dense, but the only book that covers everything in one place.
This is the textbook every deep learning course points at. Part I rebuilds the math you need (linear algebra, probability, numerical computation, information theory) in a way that’s specifically tuned for what you’ll see in Part II. It’s the rare math review that doesn’t waste your time.
Part II is the meat: feedforward networks, regularization, optimization, CNNs, RNNs, and applications. It’s pre-transformer, so don’t expect modern LLM coverage — but the chapters on regularization (7) and optimization (8) are still the cleanest treatments of those topics anywhere. If you’ve ever wondered why Adam works, or what dropout is actually doing to your model, this is where to find out.
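To make those two chapter callouts concrete, here is a minimal NumPy sketch of the two techniques the book explains: inverted dropout (Chapter 7) and a single Adam update (Chapter 8). This is my own illustration under the standard formulations, not code from the book; the function names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scale survivors by 1/(1-p) so the expected activation matches eval time."""
    if not training:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: running estimates of the gradient's first and second
    moments give each parameter its own effective step size."""
    m = b1 * m + (1 - b1) * grad           # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)              # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Note how the Adam step divides by the root of the second moment: the update magnitude ends up roughly `lr` regardless of the raw gradient scale, which is a big part of the "why Adam works" story Chapter 8 unpacks.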
Part III (research) has aged less gracefully and you can mostly skim it. The exception is the chapter on autoencoders, which is short and worth reading because the same ideas come back in modern representation learning. Read this book the way you’d read a math reference: skim, bookmark, return. Don’t try to do it cover to cover on your first pass.
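If the autoencoder chapter's core idea is new to you — force data through a bottleneck and learn to reconstruct it — a toy linear version fits in a few lines. This is my own NumPy sketch, not code from the book: an 8-dimensional input squeezed to a 2-dimensional code, trained by plain gradient descent on reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)

X = rng.normal(size=(200, 8))                    # toy data
W_enc = rng.normal(scale=0.1, size=(8, 2))       # encoder: 8 -> 2 bottleneck
W_dec = rng.normal(scale=0.1, size=(2, 8))       # decoder: 2 -> 8 reconstruction
lr = 0.05

losses = []
for step in range(500):
    code = X @ W_enc                             # compress to the bottleneck
    X_hat = code @ W_dec                         # reconstruct the input
    err = X_hat - X
    losses.append(float(np.mean(err ** 2)))
    g_dec = code.T @ err / len(X)                # gradient of MSE w.r.t. decoder
    g_enc = X.T @ (err @ W_dec.T) / len(X)       # backprop through the decoder
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
```

The 2-d `code` is the learned representation; swap the linear maps for deep networks and add noise or other constraints and you arrive at the representation-learning methods the chapter points toward.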
Related resources
- Neural Networks: Zero to Hero, by Andrej Karpathy. Karpathy builds neural networks from scratch in plain Python and PyTorch, ending with a working transformer trained on Tiny Shakespeare.
- Linear Algebra (MIT 18.06), by Gilbert Strang. The full MIT 18.06 lecture course, and the single best place to learn linear algebra well enough to read ML papers without flinching.