Loss Functions 20 Essential Q/A
DL Interview Prep

Deep Learning Loss Functions: 20 Interview Questions

Master MSE, MAE, Binary/Categorical Cross-Entropy, Hinge, Huber, Contrastive, Triplet, KL Divergence, CTC, and more: when to use each, gradient behavior, robustness – concise, interview-ready answers.

MSE · Cross-Entropy · MAE · Hinge · KL Div · Huber
1 What is a loss function in deep learning? ⚡ Easy
Answer: A loss function (cost/objective) quantifies the error between model predictions and true targets. Training minimizes this loss via gradient descent. Choice of loss depends on task: regression (L1, L2), classification (cross-entropy), ranking (hinge), etc.
ℒ(ŷ, y) : measure of "how wrong" the model is.
2 Compare MSE and MAE. When to use each? 📊 Medium
Answer: MSE = mean((y − ŷ)²), MAE = mean(|y − ŷ|). MSE penalizes large errors more (squaring), so it is sensitive to outliers; MAE is robust to them. Use MSE when outliers are rare or large errors should be emphasized; use MAE when robustness is needed. MSE gradient magnitude ∝ error; MAE gradient is constant (±1).
MSE: smooth, convex, but sensitive to outliers.
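A minimal NumPy sketch (illustrative, not from the original) showing how a single outlier dominates MSE while MAE grows only linearly:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

y     = np.array([1.0, 2.0, 3.0, 100.0])  # last target is an outlier
y_hat = np.array([1.1, 2.1, 2.9, 3.0])

# MSE is dominated by the single squared outlier error (~97^2);
# MAE counts it only once, linearly.
print(mse(y, y_hat))  # ~2352.26
print(mae(y, y_hat))  # ~24.33
```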
3 Why use cross-entropy for classification, not MSE? 🔥 Hard
Answer: Cross-entropy with softmax/sigmoid gives stronger gradients when prediction is wrong. MSE + sigmoid saturates quickly – vanishing gradient. CE is also probabilistic (minimizes KL divergence), directly optimizes log-likelihood. CE is convex in parameters for linear models.
Binary CE: -[y log(p) + (1-y) log(1-p)] vs MSE: (y-p)²
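The saturation argument can be checked numerically. For a sigmoid output p = σ(z), the gradient of binary CE w.r.t. the logit is p − y, while the gradient of MSE picks up an extra p(1 − p) factor that vanishes when the sigmoid saturates (a small illustrative sketch):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y, z = 1.0, -8.0            # true label 1, but the logit is very negative
p = sigmoid(z)              # p ~ 0.0003: confidently wrong

grad_ce  = p - y                      # dBCE/dz: stays near -1
grad_mse = 2 * (p - y) * p * (1 - p)  # dMSE/dz: crushed by p(1-p)

print(grad_ce)   # ~ -1: strong learning signal
print(grad_mse)  # ~ 0: vanishing gradient despite a wrong prediction
```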
4 Binary vs Categorical Cross-Entropy: difference? ⚡ Easy
Answer: Binary CE for 2 classes (single sigmoid output). Categorical CE for ≥3 classes (softmax output). For multi-label (multiple binary tasks), use binary CE per output.
5 What is Hinge loss? Where is it used? 📊 Medium
Answer: Hinge: max(0, 1 - y·ŷ) for y ∈ {-1,1}. Used in SVMs and max-margin classifiers. Encourages correct classification with a margin. Not differentiable at margin; subgradient used. Less common in deep nets but used in Siamese nets (contrastive hinge).
L = Σ max(0, 1 - y_i * f(x_i))
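The formula above in a runnable NumPy sketch (assuming labels in {−1, +1} and raw scores f(x)):

```python
import numpy as np

def hinge(y, scores):
    # Loss is zero once the margin y * f(x) >= 1 is satisfied;
    # otherwise it grows linearly with the margin violation.
    return np.mean(np.maximum(0.0, 1.0 - y * scores))

y      = np.array([ 1,   -1,   1,   -1])
scores = np.array([2.0, -1.5, 0.3,  0.5])  # last two violate the margin

print(hinge(y, scores))  # (0 + 0 + 0.7 + 1.5) / 4 = 0.55
```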
6 Explain Huber loss. When is it useful? 🔥 Hard
Answer: Huber loss = MSE for small error, MAE for large error (quadratic near zero, linear otherwise). Smooth, less sensitive to outliers than MSE, differentiable. Used in robust regression (e.g., object detection bounding boxes – Smooth L1 is similar).
# Smooth L1 = Huber with δ = 1
if |x| ≤ δ: 0.5·x²  else δ·(|x| − 0.5·δ)
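A vectorized NumPy sketch of the general Huber loss (δ = 1 recovers Smooth L1; illustrative only):

```python
import numpy as np

def huber(err, delta=1.0):
    # Quadratic inside |err| <= delta, linear outside;
    # the two pieces meet with matching value and slope at |err| = delta.
    quad = 0.5 * err ** 2
    lin  = delta * (np.abs(err) - 0.5 * delta)
    return np.where(np.abs(err) <= delta, quad, lin)

errs = np.array([0.5, 1.0, 5.0])
print(huber(errs))  # [0.125, 0.5, 4.5]: quadratic, boundary, linear regimes
```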
7 KL Divergence vs Cross-Entropy: relation? 🔥 Hard
Answer: Cross-Entropy = H(p,q) = H(p) + KL(p||q). Minimizing cross-entropy is equivalent to minimizing KL divergence if p is fixed (target distribution). In VAEs, we minimize KL(q(z|x) || p(z)) to regularize latent space.
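The identity H(p, q) = H(p) + KL(p‖q) can be verified directly on a toy distribution (illustrative sketch):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # target distribution (fixed)
q = np.array([0.5, 0.3, 0.2])   # model distribution

entropy = -np.sum(p * np.log(p))      # H(p)
cross_e = -np.sum(p * np.log(q))      # H(p, q)
kl      =  np.sum(p * np.log(p / q))  # KL(p || q) >= 0

# H(p, q) = H(p) + KL(p || q); since H(p) is constant,
# minimizing cross-entropy over q minimizes the KL term.
print(cross_e, entropy + kl)
```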
8 What are Contrastive and Triplet losses? 🔥 Hard
Answer: Contrastive: pulls positive pairs together, pushes negative apart (margin). Triplet: anchor, positive, negative; loss = max(0, d(a,p) - d(a,n) + margin). Used in face recognition (FaceNet), siamese networks, self-supervised learning (SimCLR).
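The triplet formula max(0, d(a,p) − d(a,n) + margin) as a minimal sketch on toy 2-D embeddings (the margin value is an arbitrary choice for illustration):

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    # a: anchor, p: positive, n: negative embeddings (Euclidean distance).
    d_ap = np.linalg.norm(a - p)
    d_an = np.linalg.norm(a - n)
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([1.0, 1.0])   # far from the anchor

print(triplet_loss(a, p, n))  # 0.0: negative already beyond the margin
```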
9 What is Focal Loss? Where is it used? 🔥 Hard
Answer: Focal loss = −α_t·(1 − p_t)^γ · log(p_t). Modifies cross-entropy to down-weight easy examples and focus on hard, misclassified ones. Addresses extreme class imbalance in object detection (RetinaNet). γ = 2 is common; α_t is an optional class-balancing weight.
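A quick numerical sketch of the down-weighting effect (unweighted variant, α_t = 1):

```python
import numpy as np

def focal_loss(p_t, gamma=2.0):
    # p_t: predicted probability of the true class.
    # gamma = 0 recovers plain cross-entropy.
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

easy = focal_loss(0.9)   # well-classified: (1-0.9)^2 shrinks the loss ~100x
hard = focal_loss(0.1)   # misclassified: keeps most of the CE penalty
print(easy, hard)
```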
10 What is CTC loss? Why is it useful? 🔥 Hard
Answer: Connectionist Temporal Classification (CTC) aligns input sequences to output sequences without pre-alignment. Used in speech recognition, OCR. It sums probabilities over all possible alignments via dynamic programming.
11 Heuristics: choose L1, L2, or Huber for regression? 📊 Medium
Answer: L2 (MSE): default, but outlier-sensitive. L1 (MAE): robust, but slower convergence. Huber: best of both – quadratic for small errors, linear for large. Smooth L1 used in detectors.
12 Why is cross-entropy always ≥ 0? 📊 Medium
Answer: Cross-entropy = -Σ p(x) log q(x). Each term p(x)·log q(x) ≤ 0, because p(x) ≥ 0 and log q(x) ≤ 0 (since q(x) ≤ 1); with the minus sign the sum is therefore non-negative. It reaches zero only when q assigns probability 1 to every outcome p puts mass on – e.g., one-hot targets predicted with full confidence.
13 Relation between perplexity and cross-entropy? 📊 Medium
Answer: Perplexity = 2^{H(p,q)} when cross-entropy H is measured in bits (base-2 log), or e^{H} in nats. It measures how "surprised" the model is by the data. Lower perplexity = better language model.
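A toy sketch: perplexity is the exponentiated per-token cross-entropy, i.e. the geometric mean of 1/p over the true tokens:

```python
import numpy as np

# Model probability assigned to each true token of a toy 3-token sequence:
probs = np.array([0.25, 0.5, 0.125])

ce_nats    = -np.mean(np.log(probs))  # per-token cross-entropy in nats
perplexity = np.exp(ce_nats)          # e^H in nats (equivalently 2^H in bits)

# Geometric mean of [1/0.25, 1/0.5, 1/0.125] = (4*2*8)^(1/3) = 4
print(perplexity)  # 4.0
```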
14 NLL vs Cross-Entropy – same? ⚡ Easy
Answer: For classification with one-hot targets, categorical cross-entropy = negative log-likelihood. NLL is just -log(p(y|x)). In PyTorch, `CrossEntropyLoss` = LogSoftmax + NLLLoss.
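The LogSoftmax + NLL decomposition that PyTorch's `CrossEntropyLoss` performs, mirrored in plain NumPy (illustrative sketch):

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()                        # shift for numerical stability
    return z - np.log(np.sum(np.exp(z)))   # log of the softmax probabilities

def cross_entropy(z, target):
    # NLL of the true class under softmax = categorical CE with a one-hot target.
    return -log_softmax(z)[target]

z = np.array([2.0, 0.5, -1.0])  # raw logits
print(cross_entropy(z, target=0))
```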
15 What is Dice loss? Where is it used? 🔥 Hard
Answer: Dice = 1 - (2|X∩Y|)/(|X|+|Y|). Differentiable approximation of IoU. Used in medical image segmentation, imbalanced data. Handles pixel-wise class imbalance well.
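A soft (differentiable) Dice loss sketch on probability maps, with an epsilon term (an implementation convention, not from the original) to avoid division by zero:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice: replaces set intersection with an elementwise product,
    # so predicted probabilities can be used directly.
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

pred   = np.array([0.9, 0.8, 0.1, 0.2])  # predicted foreground probabilities
target = np.array([1.0, 1.0, 0.0, 0.0])  # ground-truth mask

print(dice_loss(pred, target))  # ~0.15
```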
16 Why use log in cross-entropy loss? 📊 Medium
Answer: Log converts multiplicative probabilities to additive; numerically stable. Also, maximizing likelihood = minimizing negative log-likelihood. Log loss heavily penalizes very wrong confident predictions.
17 Compare gradients of MSE and MAE. 📊 Medium
Answer: ∂MSE/∂ŷ = 2(ŷ − y); ∂MAE/∂ŷ = sign(ŷ − y). MSE gradient scales with the error; MAE gradient magnitude is constant (±1). MSE converges faster but is outlier-sensitive.
18 Loss function for ordinal regression? 🔥 Hard
Answer: CORAL (Consistent Rank Logits), a cumulative-link approach: decompose a K-class ordinal target into K−1 binary "y > k" classifiers with shared weights and sum their binary CE losses. Alternatively, treat it as regression with rounding, or use MSE/MAE if the scale is meaningful.
19 What is energy-based loss? 🔥 Hard
Answer: Energy-based models (EBM) assign scalar energy to configurations. Loss designed to push down energy of correct answer, pull up incorrect. Example: contrastive loss, hinge loss for EBM.
20 Designing a custom loss: key requirements? 🔥 Hard
Answer: Must be differentiable (almost everywhere), should correlate with evaluation metric, numerically stable, efficient. Also consider convexity (not strictly required) and gradient behavior.
Example: custom IoU loss, focal loss, Huber.

Loss Functions – Interview Cheat Sheet

Regression
  • L2 (MSE) – sensitive to outliers
  • L1 (MAE) – robust, constant gradient
  • Huber – robust + smooth
  • Smooth L1 – Huber with δ = 1 (detectors)
Classification
  • CE – binary / categorical
  • Hinge – max-margin
  • Focal – class imbalance
Advanced
  • KL – VAEs, distribution matching
  • Contrastive – Siamese nets
  • Triplet – face recognition
  • CTC – sequence alignment
  • Dice – segmentation
Outlier robust
  • MAE, Huber, Smooth L1

Verdict: "Task dictates loss – regression, classification, ranking, alignment."

20 loss Q/A covered.