NN Programming Practice
Every item on this page is a coding task—implement it in Python (prefer PyTorch; use NumPy where noted). Use a notebook or .py file, print tensor shapes, and assert expected behavior. For theory-only review, use the MCQ and interview Q&A pages. Match each block to the NN tutorial when you need context.
Neural Networks — Topic-wise Programming Practice
Each block aligns with the NN tutorial sidebar. Complete the code tasks in order; commit snippets to a repo or notebook so you can reuse them as templates.
Introduction & Perceptron
Review: What are NNs? · Perceptron
Write a function forward(x, w, b, activation) where x is shape (d,), w shape (d,), b scalar, and activation is "sigmoid" or "relu" on the pre-activation z = w @ x + b. Unit-test on a fixed random seed against torch.sigmoid / F.relu.
Implement the perceptron update rule in NumPy for logical AND on inputs in {0,1}². Loop until every point is classified correctly; print w, b each epoch. Then try XOR with the same single-layer perceptron, cap the number of epochs, and verify that it never converges (log the MSE or error count per epoch).
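One possible shape for the perceptron task, as a minimal sketch rather than a reference solution. The function name train_perceptron, the step activation that fires on z > 0, the learning rate of 1, and the epoch caps are all assumptions chosen for illustration.

```python
import numpy as np

def train_perceptron(X, y, max_epochs=50):
    """Perceptron rule with learning rate 1: w += (y - y_hat) * x, b += (y - y_hat)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if w @ xi + b > 0 else 0   # step activation
            update = yi - y_hat                  # 0 if correct, +1 or -1 if wrong
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        print(f"epoch {epoch}: w={w}, b={b}, errors={errors}")
        if errors == 0:                          # a full clean pass means convergence
            return w, b, True
    return w, b, False                           # hit the epoch cap

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])

w_and, b_and, and_converged = train_perceptron(X, y_and)
w_xor, b_xor, xor_converged = train_perceptron(X, y_xor, max_epochs=20)
print("AND converged:", and_converged, "| XOR converged:", xor_converged)
```

AND is linearly separable, so the loop stops after a clean epoch; XOR is not, so the error count never reaches zero no matter how large the cap is.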
MLP & Activation Functions
Review: MLP · Activations
nn.Sequential MLP for flat vectors: Build torch.nn.Sequential with Linear(784, 256), ReLU, Linear(256, 128), ReLU, Linear(128, 10). Pass a batch x of shape (32, 784); print output shape. Count parameters with sum(p.numel() for p in model.parameters()).
Vectorized NumPy: implement sigmoid, relu, and softmax (stable: subtract row max). Compare max absolute difference to torch on a random (4, 5) tensor (convert with torch.from_numpy).
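The vectorized NumPy task could look roughly like this. It is a sketch under the assumption that float64 inputs are used on both sides; the helper names and the diffs dictionary are illustrative only.

```python
import numpy as np
import torch

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max before exponentiating to avoid overflow.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 5))      # float64 array
t = torch.from_numpy(z)              # shares memory, keeps float64

diffs = {
    "sigmoid": np.abs(sigmoid(z) - torch.sigmoid(t).numpy()).max(),
    "relu": np.abs(relu(z) - torch.relu(t).numpy()).max(),
    "softmax": np.abs(softmax(z) - torch.softmax(t, dim=1).numpy()).max(),
}
for name, d in diffs.items():
    print(f"{name}: max abs diff = {d:.2e}")
```

Shifting by the row max does not change the softmax result, because the common factor cancels between numerator and denominator.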
Forward Propagation & Loss
Review: Forward propagation · Loss functions
Register a forward_hook on each Linear (or print inside a custom forward) for a small MLP with batch B=16, input dim 20. Log tensor shape after each layer. Repeat with B=1 and confirm batch dimension behavior.
MSELoss vs CrossEntropyLoss vs BCEWithLogitsLoss: Create toy tensors: regression targets (B, 1), multi-class logits (B, C) with integer (int64) labels (B,), binary logits (B, 1) with float labels (B, 1). Instantiate the three losses and call each one on its inputs (call the module directly rather than .forward); print the scalar losses. Trigger a deliberate shape/dtype error and fix it.
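A hedged sketch of the loss-API task follows. The batch size, class count, and the particular deliberate error (a binary target of shape (B,) against logits of shape (B, 1)) are assumptions picked for illustration; other shape or dtype mistakes work equally well.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, C = 8, 3

# Regression: prediction and target share shape (B, 1).
mse = nn.MSELoss()(torch.randn(B, 1), torch.randn(B, 1))

# Multi-class: logits (B, C), integer class indices (B,) with dtype int64.
logits = torch.randn(B, C)
labels = torch.randint(0, C, (B,))
ce = nn.CrossEntropyLoss()(logits, labels)

# Binary: logits (B, 1), float targets (B, 1).
bin_logits = torch.randn(B, 1)
bin_targets = torch.randint(0, 2, (B, 1)).float()
bce = nn.BCEWithLogitsLoss()(bin_logits, bin_targets)

print(mse.item(), ce.item(), bce.item())

# Deliberate bug: target shape (B,) does not match logits shape (B, 1).
error_message = None
try:
    nn.BCEWithLogitsLoss()(bin_logits, bin_targets.squeeze(1))
except (ValueError, RuntimeError) as err:
    error_message = str(err)
    print("caught:", error_message)

# Fix: restore the trailing dimension so shapes agree.
bce_fixed = nn.BCEWithLogitsLoss()(bin_logits, bin_targets.squeeze(1).unsqueeze(1))
```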
Gradient Descent & Backpropagation
Review: Gradient descent · Backpropagation
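Minimal example: build model, optimizer = torch.optim.SGD(model.parameters(), lr=0.1), criterion, and one batch x, y. Run the step in order: optimizer.zero_grad(), loss = criterion(model(x), y), loss.backward(), optimizer.step(). Assert loss.item() is finite. Print a weight's .grad.norm() after backward(), then inspect .grad again after zero_grad(): recent PyTorch releases reset it to None by default, so check for None before calling .norm() (pass set_to_none=False to get zeros instead).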
Forward a batch where the pre-ReLU values are all negative. After backward(), show that gradients for that layer’s input are zero. Compare with a batch that has mixed positive/negative pre-activations.
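The dead-ReLU task above can be made deterministic by fixing the weights by hand. This is only a sketch: the all -1 weights, the zero bias, and the helper name grads_for are assumptions chosen so the sign of every pre-activation is known in advance.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 3)
with torch.no_grad():
    layer.weight.fill_(-1.0)   # every pre-activation equals -sum(x)
    layer.bias.zero_()

def grads_for(x):
    """Return (input grad, weight grad) of relu(layer(x)).sum() for batch x."""
    x = x.clone().requires_grad_(True)
    layer.zero_grad()
    torch.relu(layer(x)).sum().backward()
    return x.grad, layer.weight.grad.clone()

all_negative = torch.ones(5, 4)    # pre-activations are -4 everywhere
mixed = torch.ones(5, 4)
mixed[::2] *= -1                   # rows 0, 2, 4 now give pre-activations of +4

dead_x_grad, dead_w_grad = grads_for(all_negative)
mixed_x_grad, mixed_w_grad = grads_for(mixed)

print("all-negative batch, input grad norm:", dead_x_grad.norm().item())
print("mixed batch, input grad norm:", mixed_x_grad.norm().item())
```

Only the rows whose pre-activations are positive pass gradient back; the others contribute exactly zero.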
Computational Graphs
Review: Computational graphs
L = (a * b + c)²: Create a, b, c as torch.tensor(..., requires_grad=True) with values 2, 3, 1. Compute L, call L.backward(), print a.grad, b.grad, c.grad. Derive the chain rule on paper and assert they match (expected: 42, 28, 14 for a, b, c at those values).
torch.autograd.grad vs backward: Same graph: use torch.autograd.grad(L, [a, b, c], retain_graph=True) and confirm gradients match .backward(). Zero grads and repeat with create_graph=True on a simpler L = a**2 and compute second derivative w.r.t. a.
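Both computational-graph tasks fit in a few lines. The sketch below assumes float tensors and uses a separate leaf a2 (a name introduced here) for the second-derivative part.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)

L = (a * b + c) ** 2                 # a*b + c = 7, so L = 49

# Functional API: returns the gradients without touching .grad.
ga, gb, gc = torch.autograd.grad(L, [a, b, c], retain_graph=True)

# Accumulating API: writes into a.grad, b.grad, c.grad.
L.backward()
print(a.grad, b.grad, c.grad)        # dL/da = 2*7*b, dL/db = 2*7*a, dL/dc = 2*7

# Second derivative of a2**2: keep the graph of the first derivative.
a2 = torch.tensor(2.0, requires_grad=True)
(first,) = torch.autograd.grad(a2 ** 2, a2, create_graph=True)   # 2 * a2
(second,) = torch.autograd.grad(first, a2)                       # constant 2
print(first.item(), second.item())
```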
Design, Initialization & Batch Norm
Review: Network design · Weight init · Batch norm
Write two nn.Sequential MLPs on the same synthetic regression task (e.g. y = sin(x₀) + 0.1*noise) with roughly equal parameter count: one wider/shallower, one narrower/deeper. Train both for fixed epochs; log final train MSE in a table.
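xavier_uniform_ vs kaiming_uniform_ + BatchNorm1d: Clone the same 3-layer MLP twice; apply nn.init.xavier_uniform_ on one and kaiming_uniform_ on the other. Print pre-activation std after first forward. Add nn.BatchNorm1d after hidden layers and compare model.train() vs model.eval() outputs on the same batch. Then try a batch of size 1: in train() BatchNorm1d raises a ValueError because batch statistics need more than one value per channel, while eval() uses the running statistics and works.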
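The BatchNorm1d part of the task might be sketched as follows. The layer sizes and variable names are assumptions; the point is only the contrast between batch statistics (train) and running statistics (eval), including the batch-of-one case.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.BatchNorm1d(16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

batch = torch.randn(32, 8)

model.train()
out_train = model(batch)             # normalizes with this batch's mean/var

model.eval()
with torch.no_grad():
    out_eval = model(batch)          # normalizes with running mean/var

outputs_differ = not torch.allclose(out_train, out_eval)
print("train vs eval outputs differ:", outputs_differ)

# Batch of size 1: train mode cannot compute a variance per channel.
single = torch.randn(1, 8)
model.train()
train_error = None
try:
    model(single)
except ValueError as err:
    train_error = str(err)
    print("train() with batch size 1 failed:", train_error)

model.eval()
with torch.no_grad():
    out_single = model(single)       # works: running statistics are used
print(out_single.shape)
```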
Overfitting & Dropout
Review: Overfitting · Dropout
Use ≤50 samples from MNIST or random synthetic data; build a large MLP; train until train accuracy ≈100%. Record validation accuracy each epoch in a Python list and print the gap. No plotting required—just numbers.
nn.Dropout train vs eval: Fix input x; forward the same x ten times in train() with p=0.5 and show output variance. Switch to eval() and show outputs are identical across runs. Optionally compare to manual F.dropout(x, p=0.5, training=True).
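A small sketch of the dropout comparison, assuming an all-ones input of width 1000 so the scaling by 1/(1-p) is easy to read off.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

drop.train()
train_outs = torch.stack([drop(x) for _ in range(10)])   # shape (10, 1, 1000)

drop.eval()
eval_outs = torch.stack([drop(x) for _ in range(10)])

# In train mode each element is either dropped (0) or scaled by 1/(1-p) = 2.
print("train values:", train_outs.unique().tolist())
print("train variance across runs:", train_outs.var(dim=0).mean().item())
print("eval variance across runs:", eval_outs.var(dim=0).mean().item())
```

The rescaling keeps the expected activation equal to the input, which is why eval mode can simply pass x through unchanged.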
Optimizers, Learning Rate & Vanishing Gradients
Review: Optimizers · Learning rate · Vanishing / exploding
Train identical model/data for 20 epochs twice: torch.optim.SGD(..., momentum=0.9) vs torch.optim.Adam. Log loss each epoch; print final weights’ L2 norm for both runs. Use the same seed and dataloader order.
StepLR or CosineAnnealingLR: Wrap your optimizer in torch.optim.lr_scheduler.StepLR (or cosine). Print scheduler.get_last_lr() every epoch. Build a 5-layer MLP with Tanh and show vanishing first-layer .grad.norm(); repeat after swapping hidden activations to ReLU.
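For the scheduler half of the task, a sketch along these lines shows where scheduler.step() sits relative to optimizer.step(). The step_size=2 and gamma=0.5 values, the toy model, and the loss are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(6):
    loss = model(torch.randn(8, 4)).pow(2).mean()   # toy loss, one batch per epoch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    lrs.append(scheduler.get_last_lr()[0])          # lr used during this epoch
    scheduler.step()                                # advance once per epoch, after optimizer.step()
    print(f"epoch {epoch}: lr={lrs[-1]}")
```

With these settings the learning rate halves every two epochs: 0.1, 0.1, 0.05, 0.05, 0.025, 0.025.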
CNN, RNN, Attention & Transfer Learning
Review: CNN · RNN · Attention · Transfer learning
Conv2d output shape in PyTorch: x = torch.randn(8, 3, 32, 32); stack Conv2d(3,64,kernel_size=3,padding=1), ReLU, MaxPool2d(2). Print y.shape after each layer. Repeat with stride=2 conv instead of pool and compare shapes.
LSTM + MultiheadAttention toy batch: Tensor (B, T, D) = (4, 10, 32): run through nn.LSTM(D, H, batch_first=True) and print last hidden shape. Same tensor: treat as sequence length 10, use nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True) with self-attn (query=key=value); print output shape.
Load torchvision.models.resnet18(weights="DEFAULT"); replace fc with a new Linear for 5 classes; freeze all parameters except fc (requires_grad=False). Run one forward/backward pass and optimizer step, then assert that only fc.weight and fc.bias have a non-None .grad (frozen parameters keep .grad as None).
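The two shape tasks in this block (Conv2d and LSTM + MultiheadAttention) can be checked with a sketch like the one below. The hidden size H = 64 is an assumption, since the task leaves H open; everything else follows the shapes stated in the tasks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Conv2d + pooling vs strided convolution.
x = torch.randn(8, 3, 32, 32)
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
y_conv = conv(x)                                  # padding=1 keeps 32x32
y_pool = nn.MaxPool2d(2)(torch.relu(y_conv))      # pooling halves height/width
y_strided = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=2)(x)
print(y_conv.shape, y_pool.shape, y_strided.shape)

# LSTM and self-attention on the same (B, T, D) batch.
seq = torch.randn(4, 10, 32)
H = 64                                            # assumed hidden size
lstm = nn.LSTM(32, H, batch_first=True)
out, (h_n, c_n) = lstm(seq)
print(out.shape, h_n.shape)    # h_n is (num_layers, B, H) even with batch_first=True

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
attn_out, attn_weights = attn(seq, seq, seq)      # query = key = value
print(attn_out.shape, attn_weights.shape)
```

Note that batch_first=True changes the layout of the LSTM's input and output tensors but not of the returned hidden state.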
Evaluation Metrics & Frameworks
Review: Metrics · PyTorch · TensorFlow / Keras
Given random logits (N, C) and labels (N,), compute accuracy with argmax. For binary logits (N, 1), compute precision/recall at logit threshold 0 (equivalent to a probability threshold of 0.5) using vectorized boolean masks (no sklearn required). Compare your numbers to sklearn.metrics if installed.
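Implement the same 2-hidden-layer classifier in PyTorch (explicit training loop with zero_grad) and in TensorFlow/Keras (model.compile(optimizer='adam', loss='sparse_categorical_crossentropy'), model.fit). Note that the Keras string loss expects probabilities by default, so end the Keras model with a softmax layer (or use SparseCategoricalCrossentropy(from_logits=True)), whereas PyTorch's CrossEntropyLoss takes raw logits. Train 3 epochs on identical NumPy X, y; print the final loss from both (expect similar but not identical values, since initialization and seeds differ).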
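For the metrics task in this block, a sketch with boolean masks could look like this. N, C, and the random data are assumptions; the zero-division guards are a design choice, not something the task prescribes.

```python
import torch

torch.manual_seed(0)
N, C = 100, 4

# Multi-class accuracy via argmax over the class dimension.
logits = torch.randn(N, C)
labels = torch.randint(0, C, (N,))
accuracy = (logits.argmax(dim=1) == labels).float().mean().item()

# Binary precision/recall at logit threshold 0 (probability 0.5).
bin_logits = torch.randn(N, 1)
bin_labels = torch.randint(0, 2, (N, 1)).bool()
pred = bin_logits > 0

tp = (pred & bin_labels).sum().item()     # predicted positive, actually positive
fp = (pred & ~bin_labels).sum().item()    # predicted positive, actually negative
fn = (~pred & bin_labels).sum().item()    # predicted negative, actually positive

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```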
Programming: Shapes, Loss APIs & Device
- Write a script: x = torch.randn(32, 784); nn.Linear(784, 256)(x) → print shape; chain a second Linear to 10 classes. Assert final shape is (32, 10).
- Programmatically verify Conv2d output: build layer nn.Conv2d(3, 64, 3, padding=1), input (4, 3, 32, 32), print .shape; compare to formula floor((W+2p-k)/s)+1 for height/width.
- Demonstrate in code: CrossEntropyLoss on logits + integer labels vs NLLLoss on log_softmax(dim=1) of the same logits; losses should match (within float tolerance).
- Optional: move the same two-line model to cuda if available; catch and print a clear error if tensors stay on CPU.
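The list above can be covered in one short script. This is a sketch: the helper conv_out and the tolerance are assumptions, and the device handling simply falls back to CPU when CUDA is absent rather than reproducing the device-mismatch error.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# 1. Chained Linear layers: (32, 784) -> (32, 256) -> (32, 10).
x = torch.randn(32, 784)
hidden = nn.Linear(784, 256)(x)
scores = nn.Linear(256, 10)(hidden)
print(hidden.shape, scores.shape)

# 2. Conv2d output size vs the formula floor((W + 2p - k) / s) + 1.
def conv_out(size, k, p=0, s=1):
    return math.floor((size + 2 * p - k) / s) + 1

y = nn.Conv2d(3, 64, 3, padding=1)(torch.randn(4, 3, 32, 32))
print(y.shape, conv_out(32, k=3, p=1, s=1))

# 3. CrossEntropyLoss(logits) equals NLLLoss(log_softmax(logits)).
logits = torch.randn(32, 10)
labels = torch.randint(0, 10, (32,))
ce = nn.CrossEntropyLoss()(logits, labels)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)
print(ce.item(), nll.item())

# 4. Device: model and input must live on the same device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(784, 256), nn.Linear(256, 10)).to(device)
out = model(x.to(device))
print(out.device)
```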
Programming: Debug Common Training Bugs
Intentionally reproduce each bug in a minimal script, then fix it.
- Forgot zero_grad: run two backward steps without zeroing and watch the gradients double or explode; add optimizer.zero_grad(set_to_none=True) and stabilize.
- Wrong loss input: pass softmax probabilities into CrossEntropyLoss (should be logits); switch to raw logits and confirm loss decreases.
- Eval forgotten: run validation with model.train() and Dropout on, then call model.eval() and torch.no_grad(); compare metric.
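The first drill in the list above (forgotten zero_grad) is easy to verify numerically. The sketch below assumes a single Linear layer and a fixed batch, and deliberately skips optimizer.step() between the two backward calls so the doubling is exact.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x, y = torch.randn(16, 4), torch.randn(16, 1)

# First backward pass: .grad holds the true gradient.
criterion(model(x), y).backward()
g1 = model.weight.grad.clone()

# Bug: second backward without zeroing, so gradients accumulate.
criterion(model(x), y).backward()
g2 = model.weight.grad.clone()
print("ratio after two backward calls:", (g2.norm() / g1.norm()).item())

# Fix: clear gradients before every backward pass.
optimizer.zero_grad(set_to_none=True)
cleared = model.weight.grad is None      # set_to_none=True drops the tensor
criterion(model(x), y).backward()
g3 = model.weight.grad.clone()
print("matches the first gradient again:", torch.allclose(g3, g1))
```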
Weekly coding rhythm (example)
Mon: finish 2 topic-wise code cards + git commit
Wed: one small dataset + train/val split + log metrics dict
Fri: refactor into nn.Module + argparse or config dict
Weekend: ablation branch — e.g. with/without BatchNorm, same seed
Summary
- This page is programming-only: PyTorch/NumPy implementations, training steps, shape checks, and debug drills.
- Use the sidebar to open the matching tutorial when an API confuses you; use MCQ / interview pages for non-code review.
- Next: real-life examples of neural nets in production.
Close the series with industry use cases on the real-life examples page—still helpful for interviews after you can ship code.