Neural Networks: 15 Essential Q&A
Interview Prep

PyTorch for Neural Networks — 15 Interview Questions

Tensors, requires_grad, nn.Module, optimizers, GPU moves, and the standard train loop pattern.


1. Tensor vs NumPy array. (Easy)
Answer: PyTorch tensors support GPU, autograd, and DL ops; can bridge with .numpy() / torch.from_numpy (share memory when possible on CPU).
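A minimal sketch of the CPU memory-sharing bridge (values are illustrative):

    import numpy as np
    import torch

    a = np.ones(3, dtype=np.float32)
    t = torch.from_numpy(a)        # shares a's buffer (CPU only)
    t += 1                         # in-place edit is visible through a
    print(a)                       # [2. 2. 2.]
    b = t.numpy()                  # back to NumPy, still shared memory

    g = torch.ones(3, requires_grad=True)
    print(g.detach().numpy())      # grad-tracking tensors must be detached first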
2. What does requires_grad=True mean? (Easy)
Answer: Track operations on this tensor to build a graph for .backward()—needed for parameters and sometimes inputs (meta-learning).
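A small sketch of what the flag buys you:

    import torch

    w = torch.tensor([2.0, 3.0], requires_grad=True)   # a "parameter"
    x = torch.tensor([1.0, 4.0])                        # plain input, no graph needed
    loss = (w * x).sum()                                # graph: mul -> sum
    loss.backward()                                     # d(loss)/dw = x
    print(w.grad)                                       # tensor([1., 4.])
    print(x.requires_grad, x.grad)                      # False None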
3. nn.Module: what must you implement? (Easy)
Answer: forward(self, x) defines the computation; parameters are registered as nn.Parameter or via child modules. Don't call forward() directly; call model(x) so hooks and the rest of the Module machinery run.
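A minimal Module sketch (layer sizes are illustrative):

    import torch
    from torch import nn

    class TinyNet(nn.Module):
        def __init__(self, d_in=4, d_out=2):
            super().__init__()
            self.fc = nn.Linear(d_in, d_out)            # child module: params auto-registered
            self.scale = nn.Parameter(torch.ones(1))    # explicit learnable parameter

        def forward(self, x):
            return self.scale * self.fc(x)

    model = TinyNet()
    y = model(torch.randn(3, 4))                        # call the instance, not .forward()
    print(sum(p.numel() for p in model.parameters()))   # 8 + 2 + 1 = 11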
4. nn.Sequential vs subclassing Module. (Medium)
Answer: Sequential chains layers in order—simple. Subclass when you need branching, conditionals, or multi-input forward.
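Side-by-side sketch; the residual block stands in for any branching forward:

    import torch
    from torch import nn

    mlp = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # straight chain

    class ResidualBlock(nn.Module):          # a skip connection needs a custom forward
        def __init__(self, dim=8):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, x):
            return x + self.body(x)

    x = torch.randn(2, 8)
    print(mlp(x).shape, ResidualBlock()(x).shape)    # [2, 1] and [2, 8]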
5. Training step skeleton. (Easy)
Answer: optimizer.zero_grad() → forward → loss → loss.backward() → optimizer.step(). Zero grad clears previous iteration’s .grad.
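The same skeleton as runnable code; the linear model and random batch are stand-ins:

    import torch

    model = torch.nn.Linear(10, 1)
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    for _ in range(3):                                   # toy loop
        x, y = torch.randn(16, 10), torch.randn(16, 1)   # stand-in batch
        optimizer.zero_grad()      # clear grads from the previous iteration
        loss = loss_fn(model(x), y)
        loss.backward()            # fill .grad on every parameter
        optimizer.step()           # in-place parameter update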
6. DataLoader purpose. (Easy)
Answer: Batches samples, optional shuffle, num_workers for parallel loading, pin_memory for faster GPU transfer.
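A sketch with TensorDataset; batch size and worker count are illustrative:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
    loader = DataLoader(
        ds,
        batch_size=32,
        shuffle=True,        # reshuffle every epoch
        num_workers=2,       # worker processes (needs an if __name__ == "__main__" guard on Windows/macOS)
        pin_memory=True,     # page-locked host memory for faster GPU copies
    )
    for xb, yb in loader:
        pass                 # xb: [32, 10] (last batch may be smaller), yb: [32]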
7. Move model and tensors to GPU. (Easy)
Answer: device = torch.device("cuda"); model.to(device); batch tensors .to(device)—all operands must be on same device.
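Device-agnostic sketch that falls back to CPU when CUDA is unavailable:

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(10, 2).to(device)   # moves parameters and buffers
    x = torch.randn(4, 10).to(device)           # batch must live on the same device
    print(model(x).device)                      # mixing devices raises a RuntimeError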
8. model.eval() and torch.no_grad(). (Medium)
Answer: eval() switches BN/Dropout to inference behavior. no_grad() disables autograd for inference—saves memory and compute.
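Typical inference sketch combining both:

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.Dropout(0.5))
    model.eval()                    # Dropout becomes identity; BatchNorm uses running stats
    with torch.no_grad():           # no graph is built: less memory, faster
        out = model(torch.randn(2, 10))
    print(out.requires_grad)        # False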
9. detach() vs item(). (Medium)
Answer: detach() breaks gradient graph (still tensor). item() pulls Python scalar from single-element tensor—no grad.
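A short sketch of the difference:

    import torch

    w = torch.tensor([1.0, 2.0], requires_grad=True)
    loss = (w ** 2).sum()

    frozen = loss.detach()          # still a tensor, but cut from the graph
    scalar = loss.item()            # plain Python float (single-element tensors only)
    print(frozen.requires_grad)     # False
    print(type(scalar))             # <class 'float'>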
10. CrossEntropyLoss vs BCEWithLogitsLoss. (Medium)
Answer: CE expects class indices + raw logits (softmax inside loss). BCEWithLogits is per-element sigmoid + BCE for multi-label or binary.
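Sketch of the two input conventions (shapes and class counts are illustrative):

    import torch
    from torch import nn

    # Multi-class: raw logits [batch, classes] + integer class indices [batch].
    logits = torch.randn(4, 3)
    targets = torch.tensor([0, 2, 1, 2])
    ce = nn.CrossEntropyLoss()(logits, targets)            # softmax applied inside the loss

    # Multi-label / binary: per-element logits + float 0/1 targets of the same shape.
    ml_logits = torch.randn(4, 5)
    ml_targets = torch.randint(0, 2, (4, 5)).float()
    bce = nn.BCEWithLogitsLoss()(ml_logits, ml_targets)    # sigmoid applied inside the loss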
11. Save/load checkpoints. (Easy)
Answer: torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, path); load with load_state_dict—state_dict has weights only, not architecture.
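Save/load sketch; the file name is illustrative:

    import torch

    model = torch.nn.Linear(10, 2)
    opt = torch.optim.Adam(model.parameters())
    torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, "ckpt.pt")

    # Loading requires rebuilding the same architecture first.
    model2 = torch.nn.Linear(10, 2)
    opt2 = torch.optim.Adam(model2.parameters())
    ckpt = torch.load("ckpt.pt", map_location="cpu")
    model2.load_state_dict(ckpt["model"])
    opt2.load_state_dict(ckpt["opt"])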
12. torch.compile (high level). (Hard)
Answer: JIT-style graph capture and optimization (Inductor)—can speed training/inference; may need fallbacks for dynamic shapes.
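Minimal PyTorch 2.x sketch; the first call pays the capture/compile cost:

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
    compiled = torch.compile(model)           # Inductor backend by default
    out = compiled(torch.randn(8, 64))        # later calls reuse the optimized graph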
13. Automatic Mixed Precision (AMP). (Medium)
Answer: Run most forward/back in float16 with autocast, GradScaler for stable grads—faster on Tensor Cores.
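CUDA training-step sketch using the long-standing torch.cuda.amp namespace (newer releases expose the same tools under torch.amp):

    import torch

    device = "cuda"
    model = torch.nn.Linear(10, 1).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    scaler = torch.cuda.amp.GradScaler()

    x, y = torch.randn(16, 10, device=device), torch.randn(16, 1, device=device)
    opt.zero_grad()
    with torch.cuda.amp.autocast():            # forward/loss in mixed precision
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()              # scale so fp16 grads don't underflow
    scaler.step(opt)                           # unscales, then steps the optimizer
    scaler.update()                            # adjusts the scale factor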
14. Custom autograd.Function: when? (Hard)
Answer: Need a new op with explicit forward/backward; rare in app code—use when no built-in op fits.
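Sketch of a hand-written op (a manual ReLU) with explicit forward/backward:

    import torch

    class MyReLU(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)           # stash what backward needs
            return x.clamp(min=0)

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * (x > 0)          # gradient of ReLU

    x = torch.randn(5, requires_grad=True)
    MyReLU.apply(x).sum().backward()           # use .apply, never forward() directly
    print(x.grad)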
15. PyTorch vs TensorFlow (eager) in one line. (Easy)
Answer: Both default eager now; PyTorch historically more Pythonic for research; TF strong in production tooling (TF Serving, TFLite)—convergence in practice.
Tip: say that zero_grad() goes before backward(); forgetting it (and silently accumulating gradients across iterations) is a classic trick question.

Quick review checklist

  • Tensor, requires_grad, backward, zero_grad.
  • nn.Module, Sequential, device, eval/no_grad.
  • DataLoader; CE vs BCE logits; state_dict; AMP sketch.