Interview Q&A · 40 Questions
GANs & Autoencoders — Interview Q&A
Generative adversarial networks and autoencoder architectures for generation and compression.
GANs: 20 Interview Questions
1
What is a Generative Adversarial Network (GAN)? Explain the core idea.
⚡ Easy
Answer: GANs consist of two networks: a generator (G) that maps random noise to fake data, and a discriminator (D) that tries to distinguish real samples from fake ones. They play a minimax game: G tries to fool D, and D tries not to be fooled. At the theoretical equilibrium, G's distribution matches the data distribution and the optimal D outputs 0.5 for every input.
min_G max_D V(D,G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
2
Describe the roles of the generator and discriminator in detail.
📊 Medium
Answer: The generator maps a latent vector z (random noise) to data space and tries to produce realistic samples. The discriminator is a binary classifier that outputs the probability that its input is real. The two are trained alternately: D is updated on batches of real and generated samples, then G is updated to maximize D's error on generated samples.
3
What is the difference between minimax loss and non-saturating loss in GANs?
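🔥 Hard
Answer: Minimax: G minimizes log(1 - D(G(z))). Early in training, D confidently rejects fakes (D(G(z)) ≈ 0). This loss is flat in that region, so G's gradient vanishes. Non-saturating: G instead maximizes log D(G(z)). This gives strong gradients in exactly that regime and keeps the same fixed point. Most practical GAN implementations use the non-saturating loss.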
L_G = -E_z[log(D(G(z)))] (non-saturating)
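Here is a quick numerical check of why the minimax loss saturates. Write D(G(z)) = sigmoid(a) for a discriminator logit a, then compare the gradient of each generator loss with respect to a. This is a NumPy sketch, and the logit values are arbitrary illustrations.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def grad_minimax(a):
    # G minimizes log(1 - D) with D = sigmoid(a): d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # G minimizes -log D with D = sigmoid(a): d/da [-log sigmoid(a)] = -(1 - sigmoid(a))
    return -(1.0 - sigmoid(a))

# Both gradients are negative, so a descent step raises the logit (fools D more);
# only their magnitudes differ. At a = -6, D(G(z)) is about 0.0025.
for a in [-6.0, 0.0, 6.0]:
    print(f"logit={a:+.0f}  D={sigmoid(a):.4f}  "
          f"minimax={grad_minimax(a):+.4f}  non-saturating={grad_non_saturating(a):+.4f}")
```

When D confidently rejects a fake (large negative logit), the minimax gradient is almost zero, while the non-saturating gradient stays close to -1.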
4
What is mode collapse in GANs? Why does it happen?
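🔥 Hard
Answer: Mode collapse occurs when the generator produces only a limited variety of samples, covering just a few modes of the data distribution. It happens when G finds a few outputs that fool the current D and over-optimizes them instead of exploring the full distribution. When D adapts, G often just jumps to another small set of modes.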
Solutions: WGAN, minibatch discrimination, unrolled GANs
Collapsed G: repetitive samples
5
What are the main contributions of DCGAN?
📊 Medium
Answer: DCGAN (Deep Convolutional GAN) introduced architecture guidelines that stabilized training:
1) Replace pooling with strided convolutions in D and fractionally strided (transposed) convolutions in G.
2) Use BatchNorm in both G and D, except the G output layer and the D input layer.
3) Remove fully connected hidden layers.
4) Use ReLU in G (tanh at the output) and LeakyReLU in D.
6
How does WGAN improve GAN training?
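🔥 Hard
Answer: WGAN replaces the Jensen-Shannon divergence (JSD) implicit in the original GAN loss with the Earth-Mover (Wasserstein-1) distance. This distance stays continuous and gives meaningful gradients even when the real and generated distributions barely overlap, which is exactly when a standard D saturates. D becomes a "critic" that outputs unbounded scores instead of probabilities, and it must be 1-Lipschitz. The original WGAN enforces this with weight clipping; WGAN-GP uses a gradient penalty instead. The result is much more stable training and reduced, though not eliminated, mode collapse.
W(P_r, P_g) = sup over 1-Lipschitz D of E_x[D(x)] - E_z[D(G(z))]; the critic approximates this supremum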
7
What is a conditional GAN? Where is it used?
📊 Medium
Answer: A cGAN feeds an additional condition y (class label, text, image) to both the generator and the discriminator, which enables controlled generation. Applications: Pix2Pix (image-to-image translation), text-to-image synthesis, semantic segmentation.
min_G max_D V(D,G) = E_x[log D(x|y)] + E_z[log(1 - D(G(z|y)|y))]
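A common way to condition both networks is to concatenate a one-hot label to their inputs. This is only one scheme; projection discriminators and conditional normalization are alternatives. The NumPy sketch below assumes a 100-d z, 10 classes, and flattened 28×28 inputs to D. All of these sizes are illustrative.

```python
import numpy as np

def one_hot(labels, num_classes):
    return np.eye(num_classes)[labels]

def generator_input(z, labels, num_classes):
    # Condition G by appending the one-hot label y to the noise vector z.
    return np.concatenate([z, one_hot(labels, num_classes)], axis=1)

def discriminator_input(x_flat, labels, num_classes):
    # D sees the same label next to the sample, so it judges "real AND consistent with y".
    return np.concatenate([x_flat, one_hot(labels, num_classes)], axis=1)

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 100))
y = np.array([0, 3, 3, 9])
g_in = generator_input(z, y, num_classes=10)
d_in = discriminator_input(rng.standard_normal((4, 784)), y, num_classes=10)
print(g_in.shape, d_in.shape)  # (4, 110) (4, 794)
```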
8
How does CycleGAN perform unpaired image translation?
🔥 Hard
Answer: CycleGAN uses two generators (G: X→Y, F: Y→X) and two discriminators (D_Y, D_X). The key idea is the cycle-consistency loss: translating X→Y→X (and Y→X→Y) should return the original image. This constrains the mapping without any paired data. An optional identity loss helps preserve color composition.
L_cyc = E_x[||F(G(x))-x||] + E_y[||G(F(y))-y||]
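The cycle loss can be checked with toy translators. In the sketch below, G and F are hypothetical invertible linear maps standing in for the CNN generators. The L1 norm is averaged per element, which only rescales the loss.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    # L1 cycle loss: X -> Y -> X and Y -> X -> Y should both return the input.
    forward = np.mean(np.abs(F(G(x)) - x))
    backward = np.mean(np.abs(G(F(y)) - y))
    return float(forward + backward)

# Hypothetical stand-ins for the generators (rows are samples).
A = np.array([[2.0, 1.0], [0.0, 1.0]])
G = lambda x: x @ A.T                        # X -> Y
F_good = lambda y: y @ np.linalg.inv(A).T    # exact inverse, Y -> X
F_bad = lambda y: y                          # ignores the mapping entirely

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 2))
y = rng.standard_normal((8, 2))
print(cycle_consistency_loss(x, y, G, F_good))  # ~0: the cycle closes
print(cycle_consistency_loss(x, y, G, F_bad))   # clearly positive
```

Note that cycle consistency alone does not make translations realistic. In the full objective, the adversarial losses from D_X and D_Y supply that.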
9
What is the latent space in GANs? Why is interpolation smooth?
📊 Medium
Answer: The latent space (z) is the low-dimensional input to the generator, typically sampled from a Gaussian. G is a continuous function that learns to map z to realistic images, so nearby latent vectors produce similar images. Interpolating between two z vectors therefore yields semantically smooth transitions. This suggests G has learned meaningful representations rather than memorizing training samples.
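Below is a sketch of building an interpolation path in a 512-d Gaussian latent space (the dimension is an assumption). It shows plain linear interpolation (lerp) and spherical interpolation (slerp). Slerp is a common practical choice, not something the answer above prescribes. The reason to prefer it: lerp midpoints have a smaller norm than typical Gaussian samples, whereas slerp follows the arc between the endpoints.

```python
import numpy as np

def lerp(z1, z2, t):
    return (1 - t) * z1 + t * z2

def slerp(z1, z2, t):
    # Spherical interpolation along the arc between z1 and z2.
    cos_omega = np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(z1, z2, t)  # (nearly) parallel vectors: fall back to lerp
    return (np.sin((1 - t) * omega) * z1 + np.sin(t * omega) * z2) / np.sin(omega)

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal(512), rng.standard_normal(512)
path = np.stack([slerp(z1, z2, t) for t in np.linspace(0.0, 1.0, 8)])  # feed each row to G
print(np.linalg.norm(z1), np.linalg.norm(lerp(z1, z2, 0.5)), np.linalg.norm(slerp(z1, z2, 0.5)))
```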
10
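What is unique about the StyleGAN architecture?
🔥 Hard
Answer: StyleGAN does not feed the latent vector into the first layer. The synthesis network instead starts from a learned constant tensor. A mapping network (an 8-layer MLP) transforms z into an intermediate latent w, and w controls the style at each resolution through AdaIN (adaptive instance normalization). Per-layer noise inputs add stochastic variation, such as the exact placement of hairs. Together these enable disentangled control of coarse and fine styles, for example through style mixing.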
11
How are GANs evaluated? Explain FID and Inception Score.
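🔥 Hard
Answer: Inception Score (IS) uses the class predictions p(y|x) of a pretrained Inception network. It is high when each image gets a confident prediction (quality) and the predicted labels vary across images (diversity): IS = exp(E_x[KL(p(y|x) || p(y))]).
Fréchet Inception Distance (FID) fits Gaussians to the Inception features of real and generated images and computes the Fréchet distance between them. For Gaussians, this equals the Wasserstein-2 distance: FID = ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^{1/2}). Lower FID is better. Because it compares against real data directly, it is more robust than IS.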
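Both metrics are short computations once the network outputs are available. The NumPy sketch below assumes you already have feature statistics (standard FID uses 2048-d Inception-v3 pooling features) and softmax class probabilities. It uses random arrays in their place, so the numbers are not real benchmark values. Instead of a general matrix square root, it uses the identity Tr((Σ1 Σ2)^{1/2}) = Tr((Σ1^{1/2} Σ2 Σ1^{1/2})^{1/2}), which only needs symmetric eigendecompositions.

```python
import numpy as np

def _sqrtm_psd(m):
    # Square root of a symmetric positive semi-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})
    s1_half = _sqrtm_psd(sigma1)
    cross = _sqrtm_psd(s1_half @ sigma2 @ s1_half)  # symmetric PSD, same trace as (S1 S2)^{1/2}
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))

def inception_score(probs, eps=1e-12):
    # probs: (N, K) class probabilities p(y|x); p(y) is their average over the N images.
    p_y = probs.mean(axis=0, keepdims=True)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

rng = np.random.default_rng(0)
real = rng.standard_normal((2000, 8))
fake = rng.standard_normal((2000, 8)) + 0.5           # hypothetical features, mean-shifted
stats = lambda f: (f.mean(axis=0), np.cov(f, rowvar=False))
print(frechet_distance(*stats(real), *stats(fake)))   # near 8 * 0.5**2 = 2, plus sampling noise
print(inception_score(np.eye(5)))                     # 5 confident, distinct classes -> IS of about 5
```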
12
Why do GANs suffer from vanishing gradients?
📊 Medium
Answer: When D becomes too strong (perfectly classifies), log(1-D(G(z))) saturates to 0, giving G almost no gradient. Solutions: non-saturating loss, WGAN (critic scores not probabilities), label smoothing, or making D weaker.
13
What is one-sided label smoothing? Why only for real labels?
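📊 Medium
Answer: Replace the real labels (1) with soft values such as 0.9. This keeps D from becoming overconfident and gives G smoother gradients. Only the real labels are smoothed. Suppose the fake targets were raised to some β > 0 (e.g., 0→0.1). In regions where p_data ≈ 0 but the generator places mass, the optimal D would still output about β. Those wrong samples would then get no signal pushing them toward the data.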
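The smoothing only changes the targets in the discriminator's binary cross-entropy. Here is a minimal NumPy sketch, assuming D outputs probabilities and using 0.9 as the smoothed real target:

```python
import numpy as np

def bce(probs, targets, eps=1e-12):
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs)))

def discriminator_loss(d_real, d_fake, real_target=0.9):
    # One-sided smoothing: real targets become 0.9, fake targets stay exactly 0.
    real_loss = bce(d_real, np.full_like(d_real, real_target))
    fake_loss = bce(d_fake, np.zeros_like(d_fake))
    return real_loss + fake_loss

# An overconfident D (0.999 on real data) is now penalized relative to D = 0.9.
print(discriminator_loss(np.array([0.999]), np.array([0.0])))
print(discriminator_loss(np.array([0.9]), np.array([0.0])))
```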
14
Compare GANs and VAEs.
📊 Medium
GANs: Adversarial training, sharp realistic images, no explicit likelihood, prone to mode collapse, harder to train.
VAEs: Maximize a variational lower bound (ELBO) on the likelihood, tend to cover all modes (but samples are often blurry), stable training, structured latent space.
15
Why is weight clipping problematic in WGAN? How is it fixed?
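🔥 Hard
Answer: Weight clipping confines every critic weight to a small box [-c, c]. This biases the critic toward overly simple functions (underusing its capacity) and, depending on c, leads to exploding or vanishing gradients. WGAN-GP replaces clipping with a gradient penalty, which acts as a soft Lipschitz constraint. It penalizes the critic whenever the norm of its gradient with respect to the input deviates from 1, evaluated at random interpolates between real and fake samples.
gp = lambda_gp * ((grad_norm - 1) ** 2).mean()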
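A full framework would compute the penalty with autograd. To keep this sketch self-contained, it instead uses a tiny hypothetical critic D(x) = v · tanh(Wx + b), whose input gradient can be written by hand. λ = 10 is assumed as the penalty weight. The sketch shows the three steps: interpolate, take the input-gradient norm, and penalize its deviation from 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer critic on 2-d inputs; a real critic is a deep network.
W = rng.standard_normal((16, 2))
b = rng.standard_normal(16)
v = rng.standard_normal(16)

def critic(x):
    return np.tanh(x @ W.T + b) @ v

def critic_input_grad(x):
    # dD/dx = W^T (v * (1 - tanh^2(Wx + b))), computed for every row of the batch.
    h = np.tanh(x @ W.T + b)
    return (v * (1.0 - h ** 2)) @ W

def gradient_penalty(real, fake, lambda_gp=10.0):
    # Random points on straight lines between paired real and fake samples.
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(critic_input_grad(x_hat), axis=1)
    return lambda_gp * np.mean((grad_norm - 1.0) ** 2)

def critic_loss(real, fake, lambda_gp=10.0):
    # The critic maximizes E[D(real)] - E[D(fake)], so it minimizes the negation plus the penalty.
    return np.mean(critic(fake)) - np.mean(critic(real)) + gradient_penalty(real, fake, lambda_gp)

real = rng.standard_normal((64, 2)) + 2.0
fake = rng.standard_normal((64, 2))
print(critic_loss(real, fake))
```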
16
What is spectral normalization in GANs?
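🔥 Hard
Answer: Spectral normalization divides each layer's weight matrix by its largest singular value (its spectral norm), usually estimated with one power-iteration step per update. Each layer then has spectral norm 1, which bounds the Lipschitz constant of the network when the activations are 1-Lipschitz. It was introduced in SNGAN for D and stabilizes training without heavy hyperparameter tuning. Later models such as SAGAN also apply it to G.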
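The power-iteration estimate behind spectral normalization fits in a few lines. In this NumPy sketch, u plays the role of the persistent vector that SNGAN-style training keeps between updates. The usage builds a matrix with known singular values so the estimate can be compared against the exact answer.

```python
import numpy as np

def spectral_normalize(W, u=None, n_iters=1, eps=1e-12):
    # Estimate sigma_max(W) by power iteration and return W / sigma.
    # During training, u is carried over between steps, so one iteration per step suffices.
    if u is None:
        u = np.random.default_rng(0).standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ W @ v
    return W / sigma, u, sigma

# Matrix with known singular values 3.0, 1.5, 1.0, 0.5.
rng = np.random.default_rng(1)
Q1, _ = np.linalg.qr(rng.standard_normal((6, 6)))
Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
W = Q1[:, :4] @ np.diag([3.0, 1.5, 1.0, 0.5]) @ Q2.T
W_sn, u, sigma = spectral_normalize(W, n_iters=50)
print(sigma, np.linalg.svd(W_sn, compute_uv=False)[0])  # about 3.0 and 1.0
```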
17
Why introduce attention in GANs?
📊 Medium
Answer: SAGAN uses self-attention to model long-range dependencies (global features) instead of only local convolutions. Improves image quality in complex scenes (e.g., ImageNet) by capturing relationships between distant regions.
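Here is a simplified single-head sketch of a SAGAN-style self-attention block in NumPy. The 1×1 convolutions f, g, and h are modeled as plain matrices acting on the channel dimension. The output projection that some versions add after attention is omitted. The learnable scale γ starts at 0, so the block initially passes features through unchanged.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wf, Wg, Wh, gamma):
    # x: (C, H, W) feature map. Wf, Wg: (C', C) and Wh: (C, C) act like 1x1 convolutions.
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                 # N = H*W spatial positions as columns
    f, g, h = Wf @ flat, Wg @ flat, Wh @ flat
    attn = softmax(f.T @ g, axis=0)            # attn[i, j]: weight of position i for output j
    o = h @ attn                               # every output position mixes values from ALL positions
    return (gamma * o + flat).reshape(C, H, W)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
Wf, Wg, Wh = rng.standard_normal((2, 8)), rng.standard_normal((2, 8)), rng.standard_normal((8, 8))
y = self_attention(x, Wf, Wg, Wh, gamma=0.5)
print(y.shape)  # (8, 4, 4)
```

Unlike a 3×3 convolution, each output position here depends on the entire feature map in a single layer.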
18
Explain the feature matching technique in GANs.
🔥 Hard
Answer: Instead of directly maximizing D's final output, G is trained to match the expected features (intermediate-layer activations of D) of real data. It minimizes the squared L2 distance between the mean features of a real batch and a generated batch. This keeps G from overtraining on the current D.
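The feature matching objective is a distance between batch means. In this NumPy sketch, a hypothetical ReLU layer stands in for an intermediate layer of D:

```python
import numpy as np

def feature_matching_loss(feat_real, feat_fake):
    # feat_*: (batch, features) activations from an intermediate layer of D.
    diff = feat_real.mean(axis=0) - feat_fake.mean(axis=0)
    return float(diff @ diff)  # squared L2 distance between mean feature vectors

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((8, 16))                # hypothetical layer of D
features = lambda x: np.maximum(x @ W_feat, 0.0)
real = rng.standard_normal((256, 8))
print(feature_matching_loss(features(real), features(real)))        # 0.0
print(feature_matching_loss(features(real), features(real + 1.0)))  # positive
```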
19
What is progressive growing in GANs?
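🔥 Hard
Answer: Training starts with low-resolution images (e.g., 4×4), and layers are added gradually to increase the resolution, with each new layer faded in smoothly. This stabilizes high-resolution GAN training (e.g., 1024×1024). G and D grow simultaneously. The technique was introduced in ProGAN and also used in the original StyleGAN; StyleGAN2 dropped it.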
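The fade-in step blends the upsampled output of the old, lower-resolution network with the output of the newly added layer. This NumPy sketch assumes nearest-neighbor upsampling by 2. The schedule for α (ramping from 0 to 1 over the transition) is left to the caller.

```python
import numpy as np

def upsample_nearest(img, factor=2):
    # (C, H, W) -> (C, H*factor, W*factor) by repeating pixels.
    return img.repeat(factor, axis=1).repeat(factor, axis=2)

def faded_output(low_res_rgb, high_res_rgb, alpha):
    # alpha = 0: only the old path; alpha = 1: only the new layer.
    return (1.0 - alpha) * upsample_nearest(low_res_rgb) + alpha * high_res_rgb

low = np.arange(4.0).reshape(1, 2, 2)   # output of the old 2x2 network
high = np.ones((1, 4, 4))               # output of the new 4x4 layer
for alpha in (0.0, 0.5, 1.0):
    print(alpha, faded_output(low, high, alpha)[0, 0])
```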
20
What is a Nash equilibrium in the context of GANs? Do we achieve it?
🔥 Hard
Answer: At a Nash equilibrium, neither player can improve unilaterally. For GANs, G's distribution equals the data distribution, and the optimal D cannot distinguish real from fake (it outputs 0.5 everywhere). In practice, GAN training oscillates and rarely converges to an exact equilibrium, so the goal is an approximate one. Techniques such as consensus optimization try to find stable points.
Autoencoders: 20 Interview Questions
21
What is an autoencoder? Basic architecture.
⚡ Easy
Answer: An autoencoder is an unsupervised neural network that learns to copy its input to output via a bottleneck (latent) layer. Architecture: Encoder compresses input to latent code; Decoder reconstructs from latent code. Trained with reconstruction loss (e.g., MSE).
z = f_enc(x); ŷ = f_dec(z); L = ||x - ŷ||²
22
What is an undercomplete vs. an overcomplete autoencoder?
📊 Medium
Answer: Undercomplete: bottleneck dimension less than input dimension. Forces compression, learns useful features. Overcomplete: bottleneck dimension larger than input. Risks learning identity function; requires regularization (sparse, denoising).
Undercomplete: compression, meaningful features
Overcomplete: needs regularization
23
Common reconstruction loss functions?
⚡ Easy
Answer: MSE for continuous values, binary cross-entropy for pixel values in [0,1] (treated as Bernoulli probabilities), and L1 loss for robustness to outliers.
24
Real-world applications of autoencoders?
📊 Medium
Answer: Dimensionality reduction (a nonlinear generalization of PCA), anomaly detection (via high reconstruction error), denoising, inpainting, compression, generative modeling (VAE), and feature extraction for pretraining.
25
How does a denoising autoencoder (DAE) work?
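📊 Medium
Answer: The input is corrupted (e.g., by adding noise or masking entries), and the model learns to reconstruct the clean original. Because copying the input no longer minimizes the loss, the model must learn robust features. A typical corruption is additive Gaussian noise: x̃ = x + ε, ε ~ N(0, σ²I).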
L = ||x - f_dec(f_enc(x̃))||²
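The two corruptions mentioned above (additive noise and masking) and the denoising loss can be sketched in NumPy as follows. The noise level and drop probability are illustrative assumptions. `reconstruct` stands for any trained encoder-decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_gaussian(x, sigma=0.3):
    return x + sigma * rng.standard_normal(x.shape)

def corrupt_mask(x, drop_prob=0.3):
    # Masking noise: zero out a random subset of the input entries.
    keep = rng.uniform(size=x.shape) >= drop_prob
    return x * keep

def dae_loss(x, reconstruct, corrupt):
    # The model sees the corrupted input but is scored against the CLEAN x.
    x_tilde = corrupt(x)
    return float(np.mean((x - reconstruct(x_tilde)) ** 2))

x = rng.standard_normal((1000, 10))
# The identity map, which a plain overcomplete AE could learn, now incurs loss of about sigma^2 = 0.09.
print(dae_loss(x, lambda t: t, corrupt_gaussian))
```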
26
What is a sparse autoencoder? How is sparsity enforced?
🔥 Hard
Answer: A sparse autoencoder constrains most latent units to be inactive for any given input. A sparsity penalty is added to the loss: the KL divergence between a target activation ρ (e.g., ρ = 0.05) and each unit's average activation ρ̂_j over the training batch. L1 regularization on the activations is a common alternative. Sparsity encourages specialized feature detectors.
L = reconstruction + β Σ_j KL(ρ || ρ̂_j)
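The KL sparsity term treats each unit's average activation as a Bernoulli parameter. The NumPy sketch below assumes sigmoid hidden activations in (0, 1). β = 3 is a hypothetical weight.

```python
import numpy as np

def kl_sparsity_penalty(hidden, rho=0.05, beta=3.0, eps=1e-8):
    # hidden: (batch, units) sigmoid activations.
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1 - eps)  # average activation of each unit j
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return float(beta * kl.sum())

print(kl_sparsity_penalty(np.full((32, 4), 0.05)))  # ~0: every unit already at the target
print(kl_sparsity_penalty(np.full((32, 4), 0.5)))   # large: units fire half the time
```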
27
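Explain the contractive autoencoder. How does it differ from a denoising autoencoder?
🔥 Hard
Answer: A CAE penalizes the squared Frobenius norm of the encoder's Jacobian ∂z/∂x. This makes the latent representation contract around the training data, so it is robust to small input changes. A DAE achieves robustness implicitly by corrupting the input, whereas a CAE regularizes the encoder's derivatives explicitly and analytically.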
L = reconstruction + λ ||∂f_enc(x)/∂x||²_F
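For a single sigmoid encoder layer h = sigmoid(Wx + b), the Jacobian penalty has a closed form, because ∂h_j/∂x_i = h_j(1 - h_j) W_ji. The sketch below assumes that one-layer encoder. Deeper encoders would need automatic differentiation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def contractive_penalty(x, W, b):
    # ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2, averaged over the batch.
    h = sigmoid(x @ W.T + b)                # (batch, hidden)
    row_norms = np.sum(W ** 2, axis=1)      # (hidden,): squared norm of each unit's weights
    return float(np.mean(np.sum((h * (1 - h)) ** 2 * row_norms, axis=1)))

rng = np.random.default_rng(0)
W, b = rng.standard_normal((4, 3)), rng.standard_normal(4)
x = rng.standard_normal((16, 3))
print(contractive_penalty(x, W, b))
```

The usage note: in training, this value is multiplied by λ and added to the reconstruction loss, as in the formula above.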
28
What is a variational autoencoder? Probabilistic perspective.
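🔥 Hard
Answer: A VAE is a generative model with latent variables. The encoder outputs the parameters (μ, σ) of a Gaussian approximate posterior q(z|x), and the decoder defines p(x|z) and generates data from a sampled z. Training maximizes the ELBO, a lower bound on log p(x): a reconstruction term minus the KL divergence to the prior p(z) = N(0, I). Decoding samples from the prior enables generation, and the smooth latent space enables interpolation.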
ELBO: E_q[log p(x|z)] - KL(q(z|x) || p(z))
29
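Why is the reparameterization trick needed in a VAE?
🔥 Hard
Answer: Gradients cannot flow through the random sampling step z ~ N(μ, σ²) directly. The trick moves the randomness into an external input: sample ε ~ N(0, I), then set z = μ + σ · ε. Now z is a deterministic, differentiable function of μ and σ, so backpropagation reaches the encoder.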
z = mu + sigma * torch.randn_like(sigma)
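Putting the ELBO and the reparameterization trick together, here is a NumPy sketch of the per-batch VAE loss. It assumes the encoder outputs μ and log σ² (a common parameterization for numerical stability). It also assumes a fixed-variance Gaussian decoder, so that -log p(x|z) reduces to squared error up to scale and constants. The `beta` argument gives the β-VAE weighting; β = 1 is the standard VAE.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I): the randomness lives only in eps.
    sigma = np.exp(0.5 * logvar)
    return mu + sigma * rng.standard_normal(mu.shape)

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions.
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1)

def negative_elbo(x, x_recon, mu, logvar, beta=1.0):
    recon = np.sum((x - x_recon) ** 2, axis=-1)  # -log p(x|z) up to scale and constants
    return float(np.mean(recon + beta * kl_to_standard_normal(mu, logvar)))

z = reparameterize(np.full(100_000, 2.0), np.full(100_000, np.log(0.25)))
print(z.mean(), z.std())  # about 2.0 and 0.5
```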
30
VAE vs standard autoencoder: key differences?
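📊 Medium
Answer: An AE learns a deterministic latent code, while a VAE learns a probabilistic latent distribution. Nothing encourages an AE's latent space to be continuous, so decoding arbitrary points can give meaningless outputs; a VAE enforces smoothness through the KL term. A VAE is generative, whereas a plain AE is mainly reconstructive.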
31
What is β-VAE?
🔥 Hard
Answer: A VAE variant that multiplies the KL term by β > 1. The stronger pull toward the factorized prior encourages more disentangled latent representations. Trade-off: reconstruction quality vs. independence of the latent factors.
L = reconstruction + β·KL
32
How to use autoencoders for anomaly detection?
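📊 Medium
Answer: Train on normal data only, so the model learns to reconstruct normal patterns well. Anomalies then yield high reconstruction error. Choose the error threshold on a validation set, e.g., a high percentile of the errors on held-out normal data. Used in fraud detection and industrial defect detection.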
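The thresholding procedure can be sketched end to end in NumPy. To stay self-contained, this sketch does not train a neural autoencoder. Its stand-in "trained model" is a projection onto the top principal direction of the normal data, which is what a linear autoencoder would learn. The 2-d data and the 99th-percentile cutoff are illustrative assumptions.

```python
import numpy as np

def reconstruction_errors(x, reconstruct):
    return np.mean((x - reconstruct(x)) ** 2, axis=1)

def fit_threshold(val_errors, percentile=99.0):
    # Threshold chosen on errors from held-out NORMAL data.
    return float(np.percentile(val_errors, percentile))

rng = np.random.default_rng(0)
normal = rng.standard_normal((1000, 1)) @ np.array([[1.0, 2.0]]) + 0.05 * rng.standard_normal((1000, 2))
train, val = normal[:800], normal[800:]

# Stand-in model: project onto the top principal direction of the training data.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
direction = vt[0]
reconstruct = lambda x: mean + np.outer((x - mean) @ direction, direction)

threshold = fit_threshold(reconstruction_errors(val, reconstruct))
test_points = np.array([[1.0, 2.0], [2.0, -1.0]])   # on the normal pattern vs. far off it
flags = reconstruction_errors(test_points, reconstruct) > threshold
print(threshold, flags)  # expected: first point not flagged, second flagged
```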
33
When should you use a convolutional autoencoder?
⚡ Easy
Answer: For image data. Encoder uses Conv+Pooling, decoder uses Transposed Conv/Upsampling. Preserves spatial structure.
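When designing a convolutional autoencoder, the decoder's transposed convolutions must undo the encoder's downsampling exactly. Below is a small shape calculator using the output-size formulas from PyTorch's Conv2d/ConvTranspose2d documentation with dilation 1. The kernel 3 / stride 2 / padding 1 / output_padding 1 settings are one illustrative choice for 28×28 inputs.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output length of a convolution (dilation 1).
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    # Output length of a transposed convolution (dilation 1).
    return (size - 1) * stride - 2 * padding + kernel + output_padding

size = 28
for _ in range(2):                       # encoder: 28 -> 14 -> 7
    size = conv_out(size, kernel=3, stride=2, padding=1)
    print("encoder:", size)
for _ in range(2):                       # decoder: 7 -> 14 -> 28
    size = conv_transpose_out(size, kernel=3, stride=2, padding=1, output_padding=1)
    print("decoder:", size)
```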
34
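What is a stacked autoencoder? What is greedy layer-wise pretraining?
🔥 Hard
Answer: Several autoencoders are stacked, and each one is trained separately (greedily) to reconstruct the output of the previous layer's encoder. The trained encoders then initialize a deep network, which is fine-tuned end to end. This approach was used for pretraining deep networks and is now mainly of historical interest.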
35
List ways to regularize autoencoders.
📊 Medium
Answer: Sparse penalty (KL, L1), denoising (corrupt input), contractive (Jacobian), variational (KL to prior), dropout.
36
Linear autoencoder with MSE – relation to PCA?
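🔥 Hard
Answer: A linear autoencoder (no nonlinearities) trained with MSE learns the same principal subspace as PCA. Its weights span the same subspace as the top-k principal components, but they are not necessarily orthogonal or aligned with the individual components.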
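The "same subspace, not necessarily orthogonal" point can be checked directly. This NumPy sketch does not train an autoencoder. Instead, it takes the PCA solution, encoder U_kᵀ and decoder U_k, and shows that for any invertible k×k matrix A, the pair encoder = A U_kᵀ and decoder = U_k A⁻¹ produces exactly the same reconstructions. That pair is therefore equally optimal for MSE, yet its weights are generally not orthonormal.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))
X -= X.mean(axis=0)                        # center the data, as PCA assumes
k = 2

# PCA: top-k principal directions from the SVD of the data matrix.
_, _, vt = np.linalg.svd(X, full_matrices=False)
U_k = vt[:k].T                             # (5, k), orthonormal columns

# An equally good linear AE: x_hat = D E x with D E = U_k U_k^T.
A = rng.standard_normal((k, k))            # any invertible matrix
E = A @ U_k.T                              # encoder weights (k, 5)
D = U_k @ np.linalg.inv(A)                 # decoder weights (5, k)

pca_recon = X @ U_k @ U_k.T
ae_recon = X @ E.T @ D.T
print(np.allclose(pca_recon, ae_recon))    # True: same subspace, same reconstructions
print(np.allclose(E @ E.T, np.eye(k)))     # encoder rows are not orthonormal for a generic A
```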
37
What is posterior collapse in VAE?
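🔥 Hard
Answer: The decoder ignores the latent z, and q(z|x) matches the prior p(z) for every x, so the KL term is zero and z carries no information. This happens with strong decoders (e.g., autoregressive ones) that can model the data without z. Solutions: KL annealing, free bits, down-weighting the KL term (β < 1), or reducing decoder capacity.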
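The first two remedies are small changes to how the KL term enters the loss. In the NumPy sketch below, the 10,000-step warm-up and the 0.5-nat floor are hypothetical values. Free bits is applied per latent dimension after averaging over the batch.

```python
import numpy as np

def kl_weight(step, warmup_steps=10_000):
    # Linear KL annealing: the weight rises from 0 to 1, so the decoder learns to use z
    # before the KL term pulls q(z|x) toward the prior.
    return min(1.0, step / warmup_steps)

def free_bits_kl(kl_per_dim, free_bits=0.5):
    # kl_per_dim: (batch, latent_dim). Each dimension is charged at least `free_bits` nats,
    # so pushing its KL below that floor no longer reduces the loss.
    return float(np.sum(np.maximum(kl_per_dim.mean(axis=0), free_bits)))

print([kl_weight(s) for s in (0, 5_000, 20_000)])   # [0.0, 0.5, 1.0]
print(free_bits_kl(np.zeros((4, 3))))               # 1.5: collapsed dims still cost 0.5 each
```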
38
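What is an adversarial autoencoder?
🔥 Hard
Answer: An AAE uses adversarial training to match the aggregated posterior of the latent code to a prior. A discriminator distinguishes samples drawn from the prior from encoder outputs, and the encoder learns to fool it. Because the prior only needs to be sampled, not evaluated analytically, this allows more flexible priors.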
39
How can you visualize the features an autoencoder has learned?
📊 Medium
Answer: For images, visualize decoder weights or generate from latent traversal. For latent space, t-SNE/UMAP on z. For VAE, interpolate between z1 and z2.
40
Can autoencoders handle missing data?
🔥 Hard
Answer: Yes. Training a denoising autoencoder with random masking teaches the model to impute missing values. Partial autoencoders are another option, and VAEs can model the conditional distribution of the missing features given the observed ones.
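Two pieces make this work in practice: a loss that ignores missing entries, and an imputation loop that fills the gaps with the model's reconstructions. In this NumPy sketch, a projection onto the line x1 = x2 is a hypothetical stand-in for a trained autoencoder. Starting the gaps at zero and the number of iterations are also assumptions.

```python
import numpy as np

def masked_mse(x, x_recon, observed):
    # observed: boolean mask, True where x is known. Missing entries contribute no loss.
    diff = np.where(observed, x - x_recon, 0.0)
    return float(np.sum(diff ** 2) / max(observed.sum(), 1))

def impute(x, observed, reconstruct, n_iters=10):
    # Start with zeros in the gaps, then repeatedly overwrite ONLY the missing entries
    # with the model's reconstruction; observed values are always kept.
    filled = np.where(observed, x, 0.0)
    for _ in range(n_iters):
        filled = np.where(observed, x, reconstruct(filled))
    return filled

P = np.full((2, 2), 0.5)                   # projection onto the direction (1, 1)
reconstruct = lambda z: z @ P              # hypothetical "trained autoencoder"
x = np.array([[1.0, np.nan]])              # second feature is missing
observed = ~np.isnan(x)
print(impute(x, observed, reconstruct, n_iters=50))  # missing value converges toward 1.0
```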