Transfer Learning
Taking a model pretrained on a large general dataset and reusing it — with or without further training — on a different, usually smaller, task.
In one line
Start from someone else’s pretrained weights instead of random init — almost always better when your dataset is small.
What it actually means
Pretraining teaches a model general features from a huge dataset (ImageNet for CV, web text for LLMs). Transfer learning takes that pretrained backbone and adapts it to a new task: freeze the backbone and train a new head, or fine-tune everything at a small learning rate, or insert LoRA adapters. The idea is that low-level features — edges, textures, basic language structure — are reusable across tasks, so you don’t need a million examples of your specific problem.
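The three adaptation routes above can be sketched in PyTorch. This is a minimal illustration, not a recipe: the tiny Sequential "backbone", the learning rates, and the LoRA rank are all placeholder choices.

```python
import torch
from torch import nn

# Toy stand-in for a pretrained backbone; in practice this would be a
# ResNet, ViT, or transformer loaded with pretrained weights.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
head = nn.Linear(32, 5)  # fresh head for the new task

# Route 1: freeze the backbone, train only the head.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

# Route 2: fine-tune everything, but give the pretrained weights a much
# smaller learning rate than the randomly initialized head.
for p in backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

# Route 3: LoRA. Freeze a linear layer W and learn a low-rank update BA,
# so the forward pass computes Wx + (BA)x * scale.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights stay fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # BA starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because B starts at zero, a LoRA-wrapped layer initially behaves exactly like the frozen original, and training only moves the small A and B matrices.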
Why it matters
Transfer learning is why small teams can ship models. Nobody trains a 7B language model from scratch for a customer support chatbot — you start from LLaMA or Mistral and fine-tune. Same in CV: start from a pretrained ResNet, CLIP, or DINO backbone and specialize. If your labeled dataset is under ~100k examples, transfer learning almost always beats training from scratch.
Example
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet50(weights="IMAGENET1K_V2")
for p in backbone.parameters():  # freeze the pretrained backbone
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)  # new task head
You’ll hear it when
- Starting almost any vision or NLP project.
- Deciding whether to fine-tune a foundation model.
- Comparing pretrained backbones (ResNet, ViT, DINO, CLIP).
- Explaining why modern ML doesn’t need million-example datasets.