Transfer Learning

Transfer learning reuses a model (or its features) trained on a large source task—ImageNet, web text, speech corpora—to improve a smaller target task. You save data, compute, and often get better generalization than training from scratch. Common patterns: feature extraction (freeze backbone, train only a linear head), fine-tuning (unfreeze some or all layers with a smaller learning rate), and progressive unfreezing (train head first, then deeper blocks).
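
A minimal feature-extraction sketch in PyTorch, assuming torchvision is available; the ResNet-18 checkpoint and num_classes are placeholders for your own backbone and label count:

import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained backbone (torchvision >= 0.13 weights API).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze every pretrained parameter so only the new head trains.
for param in model.parameters():
    param.requires_grad = False

# Swap in a fresh linear head sized for the target task.
num_classes = 10  # placeholder: your dataset's label count
model.fc = nn.Linear(model.fc.in_features, num_classes)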

When It Helps

Transfer shines when target data are limited but related to the source: medical images from natural-image pretraining, sentiment on top of general language models, etc. If domains differ a lot (satellite vs portrait photos), you may need more target data, domain adaptation, or pretraining closer to your distribution.

The backbone learns generic edges, textures, or syntactic patterns; the head maps those features to your classes or outputs.

Freezing vs Fine-Tuning

Freeze all convolutional weights and train only the classifier: fast, little overfitting risk, good baselines. Fine-tune with a lower LR on the backbone than the head so you do not destroy useful filters in a few steps. Differential learning rates—smaller LR for early layers, larger for later ones—sometimes help because early layers stay more universal.
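
A sketch of progressive unfreezing under the same ResNet-18 assumption as above (names like layer4 are specific to torchvision ResNets; adapt them for other backbones):

from torchvision import models

# Start from a fully frozen pretrained backbone.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False

# ... train the new head first, then unfreeze the deepest residual block
# and continue fine-tuning at a low learning rate.
for param in model.layer4.parameters():
    param.requires_grad = True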

Match input preprocessing (resize, normalize mean/std) to the pretrained model’s training recipe, or performance can collapse silently.
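
For ImageNet-pretrained torchvision models, the usual evaluation preprocessing looks like this (the mean/std values are the published ImageNet statistics; confirm against your checkpoint's documentation):

from torchvision import transforms

# Must mirror the recipe the checkpoint was trained with.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # HWC uint8 image -> CHW float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])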

PyTorch Sketch

Two parameter groups give the backbone a far smaller learning rate than the head; this assumes a model that exposes backbone and head submodules:

import torch.optim as optim

# One optimizer, two parameter groups: the pretrained backbone updates
# ~100x more slowly than the freshly initialized head.
optimizer = optim.AdamW(
    [
        {"params": model.backbone.parameters(), "lr": 1e-5},
        {"params": model.head.parameters(), "lr": 1e-3},
    ],
    weight_decay=0.01,  # applies to both groups unless set per group
)

Summary

  • Reuse pretrained weights when data or compute are constrained.
  • Start with frozen backbone + new head; unfreeze if validation plateaus.
  • Use smaller LR on backbone; align preprocessing with the checkpoint.
  • Next: evaluation metrics to judge models fairly.

High accuracy is not always the right number to optimize; precision, recall, F1, and AUC come next.