PyTorch for Neural Networks: 15 Interview Questions

Tensors, requires_grad, nn.Module, optimizers, GPU moves, and the standard train loop pattern.


Tensor Autograd Module CUDA

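1. Tensor vs NumPy array (Easy)

Answer: PyTorch tensors add GPU placement, autograd, and deep-learning ops on top of an n-dimensional array. You can bridge with .numpy() and torch.from_numpy(); for CPU tensors both share memory (no copy), while torch.tensor(arr) always copies. A tensor that requires grad needs .detach() first, and a CUDA tensor needs .cpu() before .numpy().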
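A minimal sketch of the CPU memory sharing described above. The array sizes and values are arbitrary illustration choices.

```python
import numpy as np
import torch

# torch.from_numpy wraps the NumPy buffer without copying.
arr = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(arr)
t[0] = 5.0          # the write is visible through arr as well
print(arr)          # [5. 0. 0.]

back = t.numpy()    # .numpy() on a CPU tensor also shares the same buffer

# A tensor that requires grad must be detached; a CUDA tensor must be moved to CPU.
w = torch.ones(2, requires_grad=True)
w_np = w.detach().cpu().numpy()
```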

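2. What does requires_grad=True mean? (Easy)

Answer: Autograd records operations on this tensor into a graph so that .backward() can compute gradients into its .grad attribute. nn.Parameter sets it by default. Set it on inputs only when you need gradients with respect to them (e.g., meta-learning, adversarial examples, saliency maps).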
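A small sketch showing the graph being recorded and gradients landing in .grad. The function y = sum(x²) is an arbitrary example with gradient 2x.

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)  # leaf tensor tracked by autograd
y = (x ** 2).sum()    # graph is recorded: y = x0^2 + x1^2
y.backward()          # fills x.grad with dy/dx = 2x
print(x.grad)         # tensor([4., 6.])

c = torch.tensor([1.0, 1.0])   # requires_grad defaults to False
doubled = c * 2
print(doubled.requires_grad)   # False: nothing recorded for this op
```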

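3. nn.Module: what must you implement? (Easy)

Answer: Call super().__init__() in __init__, register parameters as nn.Parameter attributes or child modules, and implement forward(self, x) to define the computation. Invoke the module as model(x) rather than calling model.forward(x) directly, so that registered hooks run.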
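A minimal sketch of a custom module. `TinyNet` and its layer sizes are hypothetical names chosen for illustration; the point is that child modules and nn.Parameter attributes are registered automatically.

```python
import torch
from torch import nn

class TinyNet(nn.Module):  # hypothetical example module
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()                          # must run before assigning submodules
        self.fc1 = nn.Linear(in_dim, hidden)        # child module: its params get registered
        self.fc2 = nn.Linear(hidden, out_dim)
        self.scale = nn.Parameter(torch.ones(1))    # explicitly registered learnable tensor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scale * self.fc2(torch.relu(self.fc1(x)))

model = TinyNet(4, 8, 2)
out = model(torch.randn(5, 4))   # call the module (runs hooks), not model.forward(...)
print(out.shape)                 # torch.Size([5, 2])
```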

4. nn.Sequential vs subclassing Module (Medium)

Answer: nn.Sequential chains layers in a fixed order: simple and declarative. Subclass nn.Module when forward needs branching, skip connections, conditionals, or multiple inputs.
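A sketch contrasting the two styles. `ResidualBlock` is a hypothetical name; a skip connection is one example of a forward that a plain chain cannot express.

```python
import torch
from torch import nn

# Sequential: a straight pipeline, no custom forward needed.
mlp = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))

# Subclass: the input is reused after the body, so the data flow branches and merges.
class ResidualBlock(nn.Module):  # hypothetical example module
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # skip connection

x = torch.randn(3, 4)
block = ResidualBlock(4)
print(mlp(x).shape, block(x).shape)
```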

5. Training step skeleton (Easy)

Answer: optimizer.zero_grad() → forward → loss → loss.backward() → optimizer.step(). Gradients accumulate in .grad by default, so zero_grad() clears the previous iteration's values before the new backward pass.
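The five steps as a runnable loop. The linear model, SGD settings, and synthetic regression data are assumptions made only so the sketch is self-contained.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Synthetic data (illustration only): targets come from a fixed linear map.
x = torch.randn(64, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])

losses = []
for step in range(100):
    optimizer.zero_grad()     # clear .grad left over from the previous step
    pred = model(x)           # forward
    loss = loss_fn(pred, y)   # loss
    loss.backward()           # write d(loss)/d(param) into each param's .grad
    optimizer.step()          # update parameters from .grad
    losses.append(loss.item())
print(losses[0], losses[-1])
```

Skipping zero_grad() here would add each new gradient to the old ones, which is occasionally done on purpose for gradient accumulation but is usually a bug.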

6. DataLoader purpose (Easy)

Answer: Wraps a Dataset to produce mini-batches: batching and collation, optional shuffling, num_workers for parallel loading in worker processes, and pin_memory for faster host-to-GPU transfer.
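A sketch of a map-style Dataset fed through a DataLoader. `SquaresDataset` is a hypothetical toy dataset, and num_workers=0 keeps loading in the main process so the example runs anywhere.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):  # hypothetical toy dataset: (x, x^2) pairs
    def __init__(self, n: int):
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, idx: int):
        x = torch.tensor([float(idx)])
        return x, x ** 2

loader = DataLoader(
    SquaresDataset(10),
    batch_size=4,
    shuffle=True,                          # new sample order every epoch
    num_workers=0,                         # >0 loads batches in worker processes
    pin_memory=torch.cuda.is_available(),  # page-locked memory speeds host-to-GPU copies
)
for xb, yb in loader:
    print(xb.shape, yb.shape)  # default collate stacks samples along a new batch dim
```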

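7. Move model and tensors to GPU (Easy)

Answer: device = torch.device("cuda" if torch.cuda.is_available() else "cpu"); model.to(device); batch = batch.to(device). Module.to moves parameters in place, but Tensor.to returns a new tensor, so reassign it. All operands of an op must be on the same device.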
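A device-agnostic sketch that falls back to CPU when no GPU is present. The layer and batch sizes are arbitrary.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)  # Module.to moves parameters in place (and returns self)

x = torch.randn(8, 4)
x = x.to(device)                    # Tensor.to returns a new tensor, so reassign it
out = model(x)                      # works because weights and input share a device
print(out.device)
```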

8. model.eval() and torch.no_grad() (Medium)

Answer: model.eval() switches BatchNorm and Dropout to inference behavior but does not stop gradient tracking. torch.no_grad() disables graph recording, which saves memory and compute. For inference, use both.
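A sketch showing that the two switches are independent. The Dropout model is an arbitrary example chosen because its train and eval behavior differ.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(2, 4)

model.eval()                        # Dropout becomes identity; BatchNorm would use running stats
print(model(x).requires_grad)       # True: eval() alone still builds a graph

with torch.no_grad():               # no graph recorded, so no backward through `out`
    out = model(x)
print(out.requires_grad)            # False

model.train()                       # switch back before resuming training
```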

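9. detach() vs item() (Medium)

Answer: detach() returns a tensor that shares the same storage but is cut from the gradient graph. item() converts a one-element tensor to a Python scalar (no grad; on GPU it forces a device-to-host sync), which is the usual way to log a loss.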
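A short sketch of the two return types. The toy loss is an arbitrary example.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w * 3).sum()   # 3*1 + 3*2 = 9.0, still attached to the graph

d = loss.detach()      # Tensor sharing storage, no grad history
v = loss.item()        # plain Python float; only valid for one-element tensors
print(type(d).__name__, d.requires_grad, type(v).__name__, v)
# Tensor False float 9.0

# Logging with `running += loss` would keep every step's graph alive;
# `running += loss.item()` stores only a number.
```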

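10. CrossEntropyLoss vs BCEWithLogitsLoss (Medium)

Answer: CrossEntropyLoss takes raw logits of shape [N, C] plus class indices (recent versions also accept class probabilities); log-softmax is applied inside the loss, so don't add a softmax layer. BCEWithLogitsLoss applies a per-element sigmoid plus binary cross-entropy in one numerically stable op, for binary or multi-label targets given as floats with the same shape as the logits.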
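A sketch checking both losses against their manual definitions. The logits and targets are made-up numbers.

```python
import torch
from torch import nn

# Multi-class: logits [N, C], targets are int64 class indices [N].
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(logits, targets)
# Same value via log-softmax + negative log-likelihood of the true class.
manual_ce = -torch.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()

# Multi-label: float targets with the same shape as the logits.
ml_logits = torch.tensor([[1.5, -0.3, 0.0]])
ml_targets = torch.tensor([[1.0, 0.0, 1.0]])
bce = nn.BCEWithLogitsLoss()(ml_logits, ml_targets)
# Separate sigmoid + BCELoss gives the same value here but is less stable for large logits.
manual_bce = nn.BCELoss()(torch.sigmoid(ml_logits), ml_targets)
```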

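11. Save/load checkpoints (Easy)

Answer: torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, path); after ckpt = torch.load(path), restore with model.load_state_dict(ckpt["model"]) and opt.load_state_dict(ckpt["opt"]). A state_dict holds parameters and buffers only, not the architecture, so rebuild the model in code before loading.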
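A round-trip sketch. The temporary path and the extra "epoch" key are illustrative assumptions; any picklable metadata can ride along in the dict.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(3, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")

torch.save({"model": model.state_dict(), "opt": opt.state_dict(), "epoch": 5}, path)

# The architecture is rebuilt in code; only tensors and settings come from disk.
ckpt = torch.load(path, map_location="cpu")
restored = nn.Linear(3, 2)
restored.load_state_dict(ckpt["model"])
restored_opt = torch.optim.Adam(restored.parameters(), lr=1e-3)
restored_opt.load_state_dict(ckpt["opt"])
```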

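12. torch.compile (high level) (Hard)

Answer: Captures PyTorch code into graphs (TorchDynamo) and compiles them with a backend, TorchInductor by default, which can speed up training and inference. Unsupported Python causes graph breaks that fall back to eager execution, and changing input shapes can trigger recompilation unless dynamic shapes are handled.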
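A minimal sketch, assuming PyTorch 2.x on a Python version Dynamo supports. It passes backend="eager" so only graph capture is exercised and no C++/Triton toolchain is needed; omit the argument to get the default Inductor backend and actual speedups.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
# backend="eager": capture graphs with TorchDynamo but run them without codegen.
compiled = torch.compile(model, backend="eager")

x = torch.randn(2, 8)
out_eager = model(x)
out_compiled = compiled(x)  # first call traces; later calls with the same shapes reuse the graph
```

The compiled wrapper shares parameters with the original module, so training either one updates both.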

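13. Automatic Mixed Precision (AMP) (Medium)

Answer: torch.autocast runs eligible ops (matmuls, convolutions) in float16 while keeping precision-sensitive ops and the weights in float32; GradScaler scales the loss so small float16 gradients don't underflow. The speedup comes from Tensor Cores on supported GPUs.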

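14. Custom autograd.Function: when? (Hard)

Answer: When you need an op with an explicit forward and a hand-written backward, for example a custom gradient or an op no built-in covers. Subclass torch.autograd.Function with static forward/backward methods and call it via .apply. Rare in application code; prefer composing built-in ops when possible.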
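A sketch using a straight-through estimator as the example of a custom gradient. `StraightThroughRound` is a hypothetical name; the technique (round forward, identity backward) is one common reason to write a Function.

```python
import torch

class StraightThroughRound(torch.autograd.Function):  # hypothetical example op
    """Round in forward; pass the incoming gradient through unchanged in backward."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # torch.round has zero gradient almost everywhere; the STE replaces it with identity.
        return grad_output

x = torch.tensor([0.4, 1.6, -2.2], requires_grad=True)
y = StraightThroughRound.apply(x)  # call via .apply, never forward() directly
y.sum().backward()
print(y, x.grad)
```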

15. PyTorch vs TensorFlow (eager), in one line (Easy)

Answer: Both default to eager execution now (TensorFlow since 2.x); PyTorch has historically felt more Pythonic for research, while TF has strong production tooling (TF Serving, TFLite). In practice the two have converged.

Mention zero_grad() before backward(): it is a classic trick question.

Quick review checklist

Tensor, requires_grad, backward, zero_grad.
nn.Module, Sequential, device, eval/no_grad.
DataLoader; CE vs BCE logits; state_dict; AMP sketch.
