Residual block (concept)
```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(c)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + identity)  # skip connection: add input before final activation
```
When channel or stride changes, the skip uses a 1×1 conv projection—see torchvision’s Bottleneck / BasicBlock.
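A minimal sketch of that projection shortcut (the class name and signature are my own; torchvision's `BasicBlock` handles the same thing via its `downsample` argument):

```python
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    """Residual block that changes channels and halves H/W.

    Hypothetical name; the skip path uses a 1x1 conv + BN projection
    so it matches the main path's shape before the addition.
    """
    def __init__(self, c_in, c_out, stride=2):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.act = nn.ReLU(inplace=True)
        # Projection shortcut: 1x1 conv matches channels and stride of the main path.
        self.proj = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False),
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + self.proj(x))
```

With `stride=2`, a `[N, 64, 32, 32]` input comes out as `[N, 128, 16, 16]` for `DownsampleBlock(64, 128)`.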
torchvision: ResNet-18 and ResNet-50
```python
from torchvision.models import resnet18, resnet50, ResNet18_Weights, ResNet50_Weights

r18 = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1).eval()
r50 = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()

# Each weights enum ships the matching preprocessing pipeline.
tf18 = ResNet18_Weights.IMAGENET1K_V1.transforms()
tf50 = ResNet50_Weights.IMAGENET1K_V2.transforms()
```
Inference logits
```python
from PIL import Image
import torch

img = tf50(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logits = r50(img)
probs = logits.softmax(1).squeeze()
i = int(probs.argmax())
print(ResNet50_Weights.IMAGENET1K_V2.meta["categories"][i])
```
Backbone embedding (before FC)
```python
# Remove classifier: avgpool + flatten → vector
backbone = nn.Sequential(*list(r50.children())[:-1])  # drop fc
with torch.no_grad():
    feat = backbone(img).flatten(1)
print(feat.shape)  # [1, 2048] for ResNet-50
```
Hooks (optional)
Register a forward hook on r50.layer4 if you need intermediate maps without rewriting the full forward. For a single embedding vector, the Sequential backbone above is usually enough.
Fine-tune last layer
```python
num_classes = 5
r50.fc = nn.Linear(r50.fc.in_features, num_classes)
for p in r50.parameters():
    p.requires_grad = False
for p in r50.fc.parameters():
    p.requires_grad = True  # only the new head trains
```
Train all layers with lower LR on backbone (concept)
```python
import torch.optim as optim

# Re-enable requires_grad on the backbone first if you froze it above.
opt = optim.AdamW([
    {"params": r50.fc.parameters(), "lr": 1e-3},                                       # head: higher LR
    {"params": [p for n, p in r50.named_parameters() if "fc" not in n], "lr": 1e-5},  # backbone: lower LR
])
```
Names to know
- ResNeXt — grouped convolutions in blocks.
- Wide ResNet — wider channels and fewer layers; often competitive with deeper variants.
- EfficientNet / ConvNeXt — later efficiency-accuracy tradeoffs (different families).
Takeaways
- Residual connection: `y = F(x) + x` stabilizes deep training.
- ResNet-50 uses bottleneck blocks; ResNet-18 uses two 3×3 basic blocks.
- Standard transfer: replace `fc`, then freeze the backbone or use a differential LR.
Quick FAQ
- Call `model.eval()` before inference so BatchNorm uses running stats and dropout is off.
- The adaptive average pool before `fc` allows variable input H/W in many setups; still use consistent preprocessing and validate the shape through the backbone.