PyTorch for Neural Networks — 15 Interview Questions
Tensors, requires_grad, nn.Module, optimizers, GPU moves, and the standard train loop pattern.
Topics: Tensor, Autograd, Module, CUDA
1. Tensor vs NumPy array. (Easy)
Answer: PyTorch tensors add GPU support, autograd, and deep-learning ops on top of an ndarray-like API. They bridge to NumPy with .numpy() and torch.from_numpy(); for CPU tensors both directions share memory, so an in-place change on one side is visible on the other.

2. What does requires_grad=True mean? (Easy)
Answer: Autograd records the operations on this tensor and builds a graph that .backward() traverses. It is needed for parameters and sometimes for inputs (e.g. meta-learning).

3. nn.Module: what must you implement? (Easy)
Answer: forward(self, x) defines the computation. Parameters are registered as nn.Parameter attributes or through child modules. Do not call forward() directly; call model(x) so that registered hooks run.

4. nn.Sequential vs subclassing Module. (Medium)
Answer: Sequential chains layers in order and is the simple choice. Subclass when you need branching, conditionals, or a multi-input forward.
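
The difference between the two styles can be sketched as follows. This is a minimal illustration, not a recommended architecture: the layer sizes and the `ResidualBlock` class are hypothetical names chosen for the example. It also shows how child modules and an `nn.Parameter` get registered.

```python
import torch
from torch import nn

# Plain feed-forward stack: nn.Sequential is enough.
mlp = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)


class ResidualBlock(nn.Module):
    """Hypothetical block with a skip connection, which Sequential cannot express."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        # Child modules assigned as attributes are registered automatically.
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        # A raw learnable tensor must be wrapped in nn.Parameter to be registered.
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.fc1(x))
        # Branching: the input is added back to the transformed features.
        return x + self.scale * self.fc2(h)


x = torch.randn(2, 8)
seq_out = mlp(x)              # call the module itself, not mlp.forward(x)
block = ResidualBlock(8)
res_out = block(x)
param_names = [name for name, _ in block.named_parameters()]
```

Here `param_names` lists `scale` alongside the weights and biases of `fc1` and `fc2`, which is what an optimizer would receive from `block.parameters()`.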
5. Training step skeleton. (Easy)
Answer: optimizer.zero_grad() → forward → loss → loss.backward() → optimizer.step(). Gradients accumulate in .grad by default, so zero_grad() clears the previous iteration's values before the next backward pass.

6. DataLoader purpose. (Easy)
Answer: Batches samples from a Dataset, optionally shuffles them, uses num_workers for parallel loading, and pin_memory for faster host-to-GPU transfer.
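
The training step and the DataLoader fit together roughly as in the sketch below. The synthetic data (y = 3x + 1 plus noise), the learning rate, and the epoch count are assumptions made only so the example runs on its own on CPU.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (an assumption): y = 3x + 1 plus small noise.
x = torch.randn(256, 1)
y = 3.0 * x + 1.0 + 0.01 * torch.randn(256, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

epoch_losses = []
for epoch in range(20):
    running = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()      # clear .grad left over from the previous step
        pred = model(xb)           # forward
        loss = loss_fn(pred, yb)   # loss
        loss.backward()            # populate .grad on every parameter
        optimizer.step()           # update parameters from .grad
        running += loss.item()     # .item() gives a plain float for logging
    epoch_losses.append(running / len(loader))
```

Dropping the `zero_grad()` call would make each backward pass add to the gradients of earlier batches, which is why the order of the five steps matters.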
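7. Move model and tensors to GPU. (Easy)
Answer: device = torch.device("cuda"); model.to(device); batch = batch.to(device). Module.to() moves parameters in place, but Tensor.to() returns a new tensor, so reassign it. All operands of an op must be on the same device.

8. model.eval() and torch.no_grad(). (Medium)
Answer: eval() switches BatchNorm and Dropout to inference behavior. no_grad() disables autograd tracking during inference, which saves memory and compute. The two are independent, so inference code normally uses both.

9. detach() vs item(). (Medium)
Answer: detach() returns a tensor that is cut from the gradient graph (still a tensor, sharing the same storage). item() returns a Python scalar from a single-element tensor, with no gradient attached.

10. CrossEntropyLoss vs BCEWithLogitsLoss. (Medium)
Answer: CrossEntropyLoss expects raw logits plus class indices and applies log-softmax inside the loss. BCEWithLogitsLoss applies a per-element sigmoid followed by binary cross-entropy, for multi-label or binary targets.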
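
The target shapes and dtypes are where the two losses are most often confused. The sketch below uses made-up logits and labels to show both, and checks each loss against its hand-written equivalent.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 3)  # raw scores for 4 samples; no softmax or sigmoid applied

# Multi-class: exactly one of 3 classes per sample.
class_idx = torch.tensor([0, 2, 1, 2])  # int64 class indices, shape (4,)
ce = nn.CrossEntropyLoss()(logits, class_idx)
# Equivalent by hand: log-softmax followed by negative log-likelihood.
ce_manual = F.nll_loss(F.log_softmax(logits, dim=1), class_idx)

# Multi-label: each of 3 tags is on or off independently.
multi_hot = torch.tensor([[1.0, 0.0, 1.0],
                          [0.0, 0.0, 1.0],
                          [1.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0]])  # float targets, same shape as logits
bce = nn.BCEWithLogitsLoss()(logits, multi_hot)
# Equivalent by hand: sigmoid followed by binary cross-entropy.
bce_manual = F.binary_cross_entropy(torch.sigmoid(logits), multi_hot)
```

In both cases the model outputs raw logits; applying softmax or sigmoid before these losses would apply the activation twice.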
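11. Save/load checkpoints. (Easy)
Answer: torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, path); restore with torch.load(path) and load_state_dict(). A state_dict holds parameters and buffers only, not the architecture, so the model must be constructed in code before loading.

12. torch.compile (high level). (Hard)
Answer: JIT-style graph capture and optimization, with Inductor as the default backend. It can speed up training and inference; dynamic shapes or unsupported Python code may trigger recompilation or a fallback to eager execution.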
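
The checkpoint pattern from question 11 could look like the sketch below. The `build_model` helper, the tiny architecture, and the temporary file path are hypothetical and exist only to make the round trip runnable.

```python
import os
import tempfile

import torch
from torch import nn


def build_model() -> nn.Module:
    # The architecture lives in code; the checkpoint stores only tensors.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


torch.manual_seed(0)
model = build_model()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Take one step so the optimizer has internal state worth saving.
model(torch.randn(16, 4)).sum().backward()
opt.step()

path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")  # hypothetical location
torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, path)

# Restore: rebuild the same architecture first, then load the tensors into it.
restored = build_model()
restored_opt = torch.optim.Adam(restored.parameters(), lr=1e-3)
ckpt = torch.load(path, map_location="cpu")
restored.load_state_dict(ckpt["model"])
restored_opt.load_state_dict(ckpt["opt"])

probe = torch.randn(3, 4)
same_output = torch.equal(model(probe), restored(probe))
```

Saving the optimizer state alongside the weights matters for resuming training, since optimizers such as Adam keep per-parameter running statistics.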
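13. Automatic Mixed Precision (AMP). (Medium)
Answer: autocast runs eligible ops in float16 (or bfloat16) while keeping precision-sensitive ops in float32; GradScaler scales the loss so that small float16 gradients do not underflow. The result is faster training on Tensor Cores.

14. Custom autograd.Function: when? (Hard)
Answer: When you need a new op with an explicit forward and backward. It is rare in application code; use it when no built-in op fits.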
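
One case where a custom autograd.Function can be useful is when the backward pass should differ from the true derivative. The sketch below is an illustrative choice, not something the answer above prescribes: a straight-through estimator for rounding, where `RoundSTE` is a hypothetical name. The forward pass rounds, and the backward pass passes the gradient through unchanged.

```python
import torch


class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; treat the op as identity in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pass the incoming gradient along unchanged.
        return grad_output


x = torch.tensor([0.2, 1.7, -0.6], requires_grad=True)
y = RoundSTE.apply(x)            # custom Functions are invoked through .apply
y.sum().backward()
ste_grad = x.grad.clone()

# For comparison: the built-in round has zero gradient everywhere.
x2 = x.detach().clone().requires_grad_(True)
torch.round(x2).sum().backward()
plain_grad = x2.grad.clone()
```

With the built-in op, `plain_grad` is all zeros, so nothing upstream of the rounding would learn; with the custom Function, `ste_grad` is all ones.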
15. PyTorch vs TensorFlow (eager), in one line. (Easy)
Answer: Both default to eager execution now. PyTorch has historically been more Pythonic for research; TensorFlow is strong in production tooling (TF Serving, TFLite). In practice the two have converged.

Tip: say "zero_grad before backward"; it is a classic trick question.

Quick review checklist
- Tensor, requires_grad, backward, zero_grad.
- nn.Module, Sequential, device, eval/no_grad.
- DataLoader; CE vs BCE logits; state_dict; AMP sketch.