Transfer Learning

Deep Learning

Taking a model pretrained on a large general dataset and reusing it — with or without further training — on a different task, usually one with far less data.


In one line

Start from someone else’s pretrained weights instead of random init — almost always better when your dataset is small.

What it actually means

Pretraining teaches a model general features from a huge dataset (ImageNet for CV, web text for LLMs). Transfer learning takes that pretrained backbone and adapts it to a new task: freeze the backbone and train a new head, or fine-tune everything at a small learning rate, or insert LoRA adapters. The idea is that low-level features — edges, textures, basic language structure — are reusable across tasks, so you don’t need a million examples of your specific problem.
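The second strategy, full fine-tuning at a small learning rate, is often implemented with per-group learning rates. A minimal sketch, using a toy network as a stand-in for a real pretrained backbone (the layer sizes and learning rates here are illustrative):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone (in practice: a ResNet, ViT, ...)
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
head = nn.Linear(64, 5)  # fresh task-specific head, randomly initialized

# Fine-tune everything, but update the pretrained weights gently:
# the backbone gets a much smaller learning rate than the new head.
optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])
```

The small backbone rate protects the pretrained features from being destroyed by large gradient updates driven by the randomly initialized head.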

Why it matters

Transfer learning is why small teams can ship models. Nobody trains a 7B language model from scratch for a customer support chatbot — you start from LLaMA or Mistral and fine-tune. Same in CV: start from a pretrained ResNet, CLIP, or DINO backbone and specialize. If your labeled dataset is under ~100k examples, transfer learning almost always beats training from scratch.

Example

import torch.nn as nn
import torchvision.models as models

backbone = models.resnet50(weights="IMAGENET1K_V2")  # ImageNet-pretrained weights
for p in backbone.parameters():
    p.requires_grad = False  # freeze the backbone
# Replace the classifier head; the new layer is trainable by default
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
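The LoRA option mentioned above keeps the pretrained weights frozen and learns only a low-rank correction. A bare-bones sketch (the class name, rank, and init scale are illustrative; real libraries such as peft add dropout, weight merging, and more):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # Low-rank factors: only r * (in + out) trainable parameters
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(128, 64))
x = torch.randn(2, 128)
out = layer(x)  # shape (2, 64); equals base output until B is trained
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one, and training moves it away only as far as the task demands.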

You’ll hear it when

  • Starting almost any vision or NLP project.
  • Deciding whether to fine-tune a foundation model.
  • Comparing pretrained backbones (ResNet, ViT, DINO, CLIP).
  • Explaining why modern ML doesn’t need million-example datasets.

Related terms