CV MCQ: Chapter 16
Generative Vision Models

Autoencoders, GANs, and diffusion models for image generation and reconstruction.

Autoencoders MCQ

What is an autoencoder?

An autoencoder maps input x to a latent code z via an encoder and reconstructs x̂ with a decoder. Training minimizes reconstruction error (often L2 or L1). A narrow bottleneck forces compression; denoising autoencoders learn robust features by reconstructing clean data from corrupted inputs.

Information bottleneck

If reconstruction stays accurate with only a few latent dimensions, the latent must have captured the salient factors of variation in the data.

Key ideas

Encoder

CNN or MLP downsampling x → z.

Decoder

Upsampling or transposed conv z → x̂ matching input shape.

Reconstruction loss

Pixel-wise MSE for images, or BCE when pixel values are scaled to [0, 1].

Denoising AE

Train on noisy input, target clean output.
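
As a rough illustration of the reconstruction loss and denoising ideas above, the sketch below computes pixel-wise MSE and BCE on a flattened toy "image" and builds a (noisy input, clean target) pair. The helper names (mse_loss, bce_loss, corrupt) and the Gaussian-noise corruption are assumptions made for this example, not a prescribed recipe.

```python
import math
import random

def mse_loss(x_hat, x):
    """Mean squared error averaged over pixels."""
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

def bce_loss(x_hat, x, eps=1e-7):
    """Binary cross-entropy averaged over pixels; expects values in [0, 1]."""
    total = 0.0
    for p, t in zip(x_hat, x):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(x)

def corrupt(x, sigma, rng):
    """Add Gaussian noise and clip back to [0, 1] (one possible corruption)."""
    return [min(max(v + rng.gauss(0.0, sigma), 0.0), 1.0) for v in x]

rng = random.Random(0)
clean = [0.0, 0.25, 0.5, 1.0]     # a flattened toy "image"
noisy = corrupt(clean, 0.1, rng)  # what a denoising AE would receive as input

# A denoising AE would compute x_hat = decoder(encoder(noisy)) and score it
# against the clean target, e.g. mse_loss(x_hat, clean).
```

The key point is that the loss target is always the clean signal, even though the network only sees the corrupted one.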

Forward pass

x → encoder → z → decoder → x̂; backprop through reconstruction loss
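
The forward pass above can be sketched with a tiny linear autoencoder in plain Python. This is a minimal sketch under simplifying assumptions (no biases, no nonlinearities, a single example, hand-written gradients); the names matvec, init_matrix, forward, and sgd_step are hypothetical helpers, and a real model would use a deep learning framework with automatic differentiation.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights (the scale is an arbitrary choice)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def forward(x, W_enc, W_dec):
    """x -> z -> x_hat, plus the MSE reconstruction loss."""
    z = matvec(W_enc, x)        # encoder: input_dim -> latent_dim
    x_hat = matvec(W_dec, z)    # decoder: latent_dim -> input_dim
    loss = sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)
    return z, x_hat, loss

def sgd_step(x, W_enc, W_dec, lr=0.05):
    """One manual backprop + gradient step; updates both matrices in place."""
    z, x_hat, loss = forward(x, W_enc, W_dec)
    n = len(x)
    g_out = [2.0 * (a - b) / n for a, b in zip(x_hat, x)]  # dL/dx_hat
    # dL/dz must use the decoder weights before they are updated.
    g_z = [sum(W_dec[i][j] * g_out[i] for i in range(n)) for j in range(len(z))]
    for i in range(n):
        for j in range(len(z)):
            W_dec[i][j] -= lr * g_out[i] * z[j]
    for j in range(len(z)):
        for k in range(n):
            W_enc[j][k] -= lr * g_z[j] * x[k]
    return loss  # loss measured before the update

rng = random.Random(0)
input_dim, latent_dim = 8, 2
W_enc = init_matrix(latent_dim, input_dim, rng)
W_dec = init_matrix(input_dim, latent_dim, rng)
x = [rng.random() for _ in range(input_dim)]

loss_before = sgd_step(x, W_enc, W_dec)
_, _, loss_after = forward(x, W_enc, W_dec)
```

With latent_dim smaller than input_dim, the two-element z is the bottleneck that every reconstruction has to pass through.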

Pro tip: VAEs add a probabilistic latent and KL term; vanilla AEs have deterministic codes.
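
To make the VAE remark concrete, here is a small sketch of the two pieces a VAE adds on top of a vanilla AE, assuming a diagonal Gaussian posterior and a standard normal prior (the usual textbook setup). The function names are illustrative only.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

rng = random.Random(0)
mu, log_var = [0.5, -0.2], [0.0, -1.0]   # stand-ins for encoder outputs
z = reparameterize(mu, log_var, rng)     # stochastic latent code
kl = kl_to_standard_normal(mu, log_var)  # added to the reconstruction loss
```

The KL term is zero exactly when the encoder outputs mu = 0 and log_var = 0, so it pulls each latent code toward the prior.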

GANs Introduction MCQ

Generative adversarial networks

GANs train a generator G(z) to fool a discriminator D(x) that classifies samples as real or fake. In the idealized analysis, the minimax game has a Nash equilibrium at which generated samples match the data distribution and D can do no better than guessing. In practice, training requires balancing learning rates, architectures, and regularizers; mode collapse remains a classic failure mode.

The two-player game

D maximizes correct classification; G minimizes log(1−D(G(z))) or related objectives so fakes look real.
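
The two objectives can be written down directly from discriminator outputs. The sketch below assumes D returns probabilities in (0, 1) and uses batch means; the function names and the clamping constant are choices made for this example.

```python
import math

def _clamp(p, eps=1e-7):
    """Keep probabilities away from 0 and 1 so the logs stay finite."""
    return min(max(p, eps), 1.0 - eps)

def discriminator_loss(d_real, d_fake):
    """Negative of mean log D(x) + mean log(1 - D(G(z))); D minimizes this."""
    real_term = sum(math.log(_clamp(p)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - _clamp(p)) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss_minimax(d_fake):
    """Original minimax generator loss: mean log(1 - D(G(z))); G minimizes this."""
    return sum(math.log(1.0 - _clamp(p)) for p in d_fake) / len(d_fake)

# D is confident and correct: low discriminator loss.
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])
# D outputs 1/2 everywhere: the loss equals log 4.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

The generator's loss falls as D(G(z)) rises, which is the sense in which G is rewarded when its fakes look real.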

Key ideas

Generator

Maps noise z to samples in data space.

Discriminator

Scores real data high and fakes low.

Non-saturating G loss

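Common reformulation in which G maximizes log D(G(z)) instead of minimizing log(1−D(G(z))), so G gets stronger gradients early in training, when D easily rejects fakes.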

Mode collapse

G outputs limited variety, covering only a few modes of the data; samples show repeated patterns.
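
A quick way to see why the non-saturating generator loss helps is to compare gradients with respect to the discriminator's logit a, where D = sigmoid(a). This is a sketch under that parameterization; the function names are illustrative.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the original minimax G loss."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the non-saturating G loss."""
    return sigmoid(a) - 1.0

# Early in training D rejects fakes confidently, so the logit is very negative.
a = -6.0
sat = abs(grad_saturating(a))          # close to 0: almost no learning signal
non_sat = abs(grad_non_saturating(a))  # close to 1: strong learning signal
```

Both losses push the logit in the same direction; they differ only in how large the gradient is when D is winning.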

Training loop

Sample z → G(z) → update D on real/fake → update G to fool D
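
The loop above can be sketched on a one-dimensional toy problem with hand-derived gradients. Everything here is an assumption chosen for brevity: the data are N(3, 1), the generator is G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), and G uses the non-saturating loss. It shows the alternating update order and is not expected to converge reliably.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=200, batch=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
    w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]  # toy data
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]    # sample z
        fake = [a * z + b for z in zs]                      # G(z)

        # Update D on real/fake: descend on -[log D(x) + log(1 - D(G(z)))].
        gw = gc = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            gw += (d - 1.0) * x
            gc += d - 1.0
        for x in fake:
            d = sigmoid(w * x + c)
            gw += d * x
            gc += d
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Update G to fool D: descend on -log D(G(z)), with D held fixed.
        ga = gb = 0.0
        for z in zs:
            d = sigmoid(w * (a * z + b) + c)
            ga += (d - 1.0) * w * z
            gb += (d - 1.0) * w
        a -= lr * ga / batch
        b -= lr * gb / batch
    return a, b, w, c

a, b, w, c = train_toy_gan()
```

In a framework-based implementation the same structure appears as two optimizers, one per network, stepped in alternation.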

Pro tip: WGAN-GP, spectral normalization, and progressive growing were historically important fixes for many GAN training-stability issues.

Diffusion Models MCQ

Diffusion for generation

Diffusion models define a forward Markov process that adds Gaussian noise until data becomes pure noise, then learn a reverse process (neural denoiser) to sample new data. DDPM and score-based formulations connect denoising to learning gradients of log density. Modern text-to-image systems build on these ideas with large U-Nets and conditioning.
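
The forward process can be sketched in a few lines. In the DDPM parameterization each step is q(x_t | x_{t-1}) = N(sqrt(1 − β_t) x_{t-1}, β_t I), which gives the closed form x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε with ᾱ_t the running product of (1 − β_s). The linear schedule endpoints below (1e-4 to 0.02 over T = 1000) follow the values commonly quoted for DDPM and are assumptions here, as are the function names.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced beta_1..beta_T (endpoints are assumed defaults)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is a 0-based timestep index."""
    ab = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * e for v, e in zip(x0, noise)]
    return x_t, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [0.2, -0.5, 1.0]                                   # toy "image"
x_early, _ = q_sample(x0, 10, alpha_bars, rng)          # still close to x0
x_late, eps = q_sample(x0, 999, alpha_bars, rng)        # almost pure noise
```

During training, the denoiser is typically asked to predict the returned noise from x_t and t.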

Why denoise step-by-step

Iterative refinement from noise breaks generation into many small denoising steps, which tends to model complex high-dimensional distributions more stably than single-shot generators such as GANs.

Key ideas

Forward process

q(x_t|x_{t-1}) adds noise per timestep.

Reverse model

pθ(x_{t-1}|x_t) learns to remove noise.

Schedule

β_t controls how much noise per step across T.

Conditioning

Text or class embeddings are fed to the denoiser; classifier-free guidance strengthens their effect by combining conditional and unconditional predictions.
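
Classifier-free guidance comes down to a one-line combination of two noise predictions from the same denoiser, one with the condition and one without. The sketch below uses the convention where a scale of 1 recovers the purely conditional prediction; some papers write the same idea with a shifted scale, so treat the exact parameterization as an assumption.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Extrapolate from the unconditional toward the conditional prediction."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0]   # stand-in for denoiser(x_t, t, no condition)
eps_cond = [1.0, 3.0]     # stand-in for denoiser(x_t, t, text embedding)

plain = classifier_free_guidance(eps_uncond, eps_cond, 1.0)   # conditional only
guided = classifier_free_guidance(eps_uncond, eps_cond, 2.0)  # pushed further
```

Scales above 1 trade sample diversity for closer adherence to the condition.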

Sampling

Start from Gaussian noise → T reverse steps with learned denoiser → image
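
The sampling procedure above can be sketched as DDPM-style ancestral sampling on scalars. Since no trained network is available here, toy_denoiser is a stand-in: it is the exact noise predictor for the special case where the data are N(0, 1), which makes the loop checkable. The schedule values, the variance choice σ_t² = β_t, and all names are assumptions for this example.

```python
import math
import random

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear betas and their cumulative alpha_bar products (assumed defaults)."""
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars

def toy_denoiser(x_t, t, alpha_bars):
    """Stand-in for a trained network: for scalar N(0, 1) data,
    E[eps | x_t] = sqrt(1 - alpha_bar_t) * x_t."""
    return math.sqrt(1.0 - alpha_bars[t]) * x_t

def ddpm_sample(T, rng, denoiser=toy_denoiser):
    betas, alpha_bars = make_schedule(T)
    x = rng.gauss(0.0, 1.0)                      # start from Gaussian noise x_T
    for t in reversed(range(T)):                 # T reverse steps
        eps = denoiser(x, t, alpha_bars)         # predicted noise
        alpha_t = 1.0 - betas[t]
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alpha_t)
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        x = mean + math.sqrt(betas[t]) * noise
    return x

sample = ddpm_sample(T=1000, rng=random.Random(0))
```

For images, x would be a tensor and the denoiser a U-Net, but the loop structure is the same.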

Pro tip: DDIM and consistency models reduce step count for faster sampling.
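
As a sketch of how DDIM cuts the step count: each update first estimates x_0 from the current noise prediction, then jumps directly to an earlier timestep, so a short subsequence of timesteps can be used. The version below is the deterministic case (often written η = 0) for scalars; ddim_step and strided_timesteps are hypothetical helper names, and even spacing is only one possible choice of subsequence.

```python
import math

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update from timestep t to an earlier timestep."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Re-noise that estimate to the earlier noise level using the same eps.
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1.0 - alpha_bar_prev) * eps

def strided_timesteps(T, num_steps):
    """Pick num_steps roughly evenly spaced 0-based timesteps, highest first."""
    stride = T // num_steps
    return list(range(T - 1, -1, -stride))[:num_steps]

steps = strided_timesteps(1000, 50)   # 50 denoiser calls instead of 1000
x_prev = ddim_step(x_t=1.0, eps=0.5, alpha_bar_t=0.25, alpha_bar_prev=0.64)
```

Setting alpha_bar_prev to 1 on the last step returns the x_0 estimate itself.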