Generative Vision Models MCQ
Autoencoders, GANs, and diffusion models for image generation and reconstruction.
Autoencoders MCQ
What is an autoencoder?
An autoencoder maps input x to a latent code z via an encoder and reconstructs x̂ with a decoder. Training minimizes reconstruction error (often L2 or L1). A narrow bottleneck forces compression; denoising autoencoders learn robust features by reconstructing clean data from corrupted inputs.
Information bottleneck
If reconstruction stays accurate with only a few latent dimensions, the latent code must capture the salient factors of variation in the data.
Key ideas
Encoder
CNN or MLP downsampling x → z.
Decoder
Upsampling or transposed convolutions map z → x̂, matching the input shape.
Reconstruction loss
Pixel-wise MSE, or BCE when pixel values are scaled to [0, 1].
Denoising AE
Train on noisy input, target clean output.
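The two reconstruction losses and the denoising setup can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the noise level, and the random 8×8 "images" are all assumptions made for the example.

```python
import numpy as np

def mse_loss(x, x_hat):
    """Mean squared error averaged over all pixels."""
    return float(np.mean((x - x_hat) ** 2))

def bce_loss(x, x_hat, eps=1e-7):
    """Pixel-wise binary cross-entropy; assumes x and x_hat lie in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))))

def make_denoising_pair(x, noise_std=0.1, rng=None):
    """Return (noisy input, clean target) for a denoising autoencoder."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np.clip(x + noise_std * rng.normal(size=x.shape), 0.0, 1.0)
    return noisy, x

rng = np.random.default_rng(0)
x = rng.uniform(size=(4, 8, 8))  # hypothetical batch of 4 tiny grayscale images
noisy, target = make_denoising_pair(x, noise_std=0.2, rng=rng)
print("MSE(clean, noisy):", mse_loss(target, noisy))
print("BCE(clean, noisy):", bce_loss(target, noisy))
```

In a denoising autoencoder the network would receive `noisy` as input while the loss is computed against `target`, the clean image.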
Forward pass
x → encoder → z → decoder → x̂; backprop through reconstruction loss
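The pipeline above can be made concrete with a toy linear autoencoder trained by hand-derived gradients. This is only a sketch under stated assumptions: the data is synthetic (8-D points lying on a 2-D subspace), the encoder and decoder are single weight matrices with hypothetical names `W_enc` and `W_dec`, and the learning rate and step count are arbitrary choices. A real image autoencoder would use nonlinear layers and an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 samples in 8-D that lie on a 2-D subspace.
n, d, k = 256, 8, 2
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))

# Linear encoder and decoder weights (hypothetical names).
W_enc = 0.1 * rng.normal(size=(d, k))
W_dec = 0.1 * rng.normal(size=(k, d))

def forward(X):
    Z = X @ W_enc        # x -> z (bottleneck of width k)
    X_hat = Z @ W_dec    # z -> x_hat, same shape as X
    return Z, X_hat

lr = 0.05
losses = []
for step in range(1000):
    Z, X_hat = forward(X)
    err = X_hat - X
    losses.append(float(np.mean(err ** 2)))  # MSE reconstruction loss

    # Backprop through the MSE loss, derived by hand.
    grad_out = 2.0 * err / err.size
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    W_dec -= lr * grad_W_dec
    W_enc -= lr * grad_W_enc

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Because the data has only two underlying degrees of freedom, a bottleneck of width 2 is in principle enough to reconstruct it, which is the compression argument made above.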
GANs Introduction MCQ
Generative adversarial networks
GANs train a generator G(z) to fool a discriminator D(x) that classifies samples as real or fake. In the idealized setting, the minimax objective has a Nash equilibrium at which the generator's samples match the data distribution and D outputs 1/2 everywhere. In practice, training requires balancing learning rates, architectures, and regularizers, and mode collapse remains a classic failure mode.
The two-player game
D maximizes correct classification; G minimizes log(1−D(G(z))) or related objectives so fakes look real.
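Written out, the game described above is usually stated as the following value function. The symbols p_data (data distribution), p_z (noise prior), and p_g (distribution of generated samples) are notation introduced here for illustration and do not appear in the text above.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]

% For a fixed generator, the optimal discriminator is
D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```

When p_g equals p_data, the optimal discriminator is 1/2 everywhere, which is the idealized equilibrium.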
Key ideas
Generator
Maps noise z to samples in data space.
Discriminator
Scores real data high and fakes low.
Non-saturating G loss
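Common reformulation in which G maximizes log D(G(z)) instead of minimizing log(1−D(G(z))), so G receives stronger gradients early in training, when D easily rejects fakes.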
Mode collapse
G produces only a few distinct outputs and misses modes of the data distribution; samples show repeated patterns.
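The benefit of the non-saturating loss can be checked numerically. The sketch below assumes D ends in a sigmoid, so D(G(z)) = sigmoid(a) for some logit a, and compares the gradient of each generator loss with respect to that logit. The function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)), the original minimax generator loss."""
    return -sigmoid(a)

def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)), the non-saturating generator loss."""
    return -(1.0 - sigmoid(a))

for a in (-6.0, -2.0, 0.0, 2.0):
    d = sigmoid(a)
    print(f"D(G(z))={d:.4f}  saturating={saturating_grad(a):+.4f}  "
          f"non-saturating={non_saturating_grad(a):+.4f}")
```

When D confidently rejects a fake (very negative logit), the saturating gradient shrinks toward zero while the non-saturating gradient stays close to −1, so G still gets a useful learning signal.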
Training loop
Sample z → G(z) → update D on real/fake → update G to fool D
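The loop above can be sketched on a one-dimensional toy problem with hand-derived gradients. Everything specific here is an assumption for illustration: the real data is N(3, 0.5²), the generator is the affine map G(z) = a·z + b, the discriminator is a logistic classifier D(x) = sigmoid(w·x + c), and the generator uses the non-saturating loss. Such a simple discriminator can only compare means, and GAN training is not guaranteed to converge, so treat this as a demonstration of the update order rather than a working generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
lr, batch = 0.05, 64

for step in range(2000):
    # Sample z -> G(z), plus a batch of real data.
    z = rng.normal(size=batch)
    x_fake = a * z + b
    x_real = 3.0 + 0.5 * rng.normal(size=batch)

    # Update D on real/fake: minimize -log D(x_real) - log(1 - D(x_fake)).
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
    grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Update G to fool D: minimize the non-saturating loss -log D(G(z)).
    z = rng.normal(size=batch)
    d_fake = sigmoid(w * (a * z + b) + c)
    grad_a = np.mean((d_fake - 1.0) * w * z)
    grad_b = np.mean((d_fake - 1.0) * w)
    a -= lr * grad_a
    b -= lr * grad_b

print(f"generator offset b = {b:.2f}, scale a = {a:.2f} (real mean is 3.0)")
```

Tracking `a` over training is a simple way to watch for collapse in this toy: if the scale shrinks toward zero, the generator is emitting nearly identical samples.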
Diffusion Models MCQ
Diffusion for generation
Diffusion models define a forward Markov process that gradually adds Gaussian noise until the data becomes pure noise, then learn a reverse process (a neural denoiser) to sample new data. DDPM and score-based formulations connect denoising to learning the gradient of the log density (the score). Modern text-to-image systems build on these ideas with large U-Nets and conditioning.
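For reference, the DDPM formulation of these ideas is commonly written as follows. The symbols α_t, ᾱ_t, and ε_θ are standard DDPM notation introduced here as an assumption; only β_t, q, and pθ are named elsewhere in this section.

```latex
% Forward step and its closed form
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)

q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right),
\qquad \alpha_t = 1 - \beta_t, \quad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s

% Simplified noise-prediction training loss
L_{\text{simple}} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0, I),\ t}
  \left[ \left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t\right) \right\rVert^2 \right]

% Link to the score of the noised conditional
\nabla_{x_t} \log q(x_t \mid x_0) = -\frac{\epsilon}{\sqrt{1 - \bar{\alpha}_t}}
```

The last line is the connection between the two views: predicting the noise ε is equivalent, up to a known scale factor, to estimating the score.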
Why denoise step-by-step
Each reverse step removes only a small amount of noise, so iterative refinement can model complex high-dimensional distributions with more stable training than single-shot generators.
Key ideas
Forward process
q(x_t|x_{t-1}) adds noise per timestep.
Reverse model
pθ(x_{t-1}|x_t) learns to remove noise.
Schedule
β_t sets how much noise is added at step t, for t = 1, …, T.
Conditioning
Text or class embeddings are fed to the denoiser; classifier-free guidance strengthens their effect by mixing conditional and unconditional predictions.
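The schedule, the forward process, and the guidance mix can be sketched together in NumPy. The linear schedule endpoints (1e-4 to 0.02 over T = 1000), the function names, and the guidance convention (scale = 1 gives the purely conditional prediction) are assumptions for this example; other schedules and guidance parameterizations are in common use.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (assumed values)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative product of (1 - beta_s)

def q_sample(x0, t, rng):
    """Draw x_t from q(x_t | x_0) in closed form; t is a 0-based step index."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance mix of two noise predictions.

    scale = 0 gives the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 pushes further toward the condition.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(2, 4, 4))  # hypothetical tiny "images"
for t in (0, T // 2, T - 1):
    x_t, _ = q_sample(x0, t, rng)
    print(f"t={t:4d}  signal weight={np.sqrt(alpha_bars[t]):.4f}  "
          f"noise weight={np.sqrt(1.0 - alpha_bars[t]):.4f}")
```

The printed weights show how the schedule moves from almost pure signal at the first step to almost pure noise at the last one.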
Sampling
Start from Gaussian noise → T reverse steps with learned denoiser → image
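The sampling procedure can be sketched as a DDPM-style ancestral sampler. To keep the example self-contained, the learned denoiser is replaced by a hypothetical `eps_oracle`: for a toy "dataset" in which every point equals the constant `MU`, the exact noise prediction is known in closed form. The schedule values, the choice of β_t as the reverse-step variance, and all names are assumptions; a real system would call a trained network in place of the oracle.

```python
import numpy as np

T = 200
betas = np.linspace(1e-4, 0.05, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

MU = 2.0  # toy "dataset": every data point equals MU

def eps_oracle(x_t, t):
    """Exact noise predictor for the point-mass toy data (stand-in for a trained network)."""
    return (x_t - np.sqrt(alpha_bars[t]) * MU) / np.sqrt(1.0 - alpha_bars[t])

def ddpm_sample(n, rng):
    x = rng.normal(size=n)                  # start from Gaussian noise
    for t in range(T - 1, -1, -1):          # T reverse steps
        eps_hat = eps_oracle(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.normal(size=n) if t > 0 else 0.0   # no noise on the final step
        x = mean + np.sqrt(betas[t]) * noise
    return x

samples = ddpm_sample(5, np.random.default_rng(0))
print(samples)
```

With this oracle, the final noise-free step undoes the first forward step exactly, so every sample lands on `MU` up to floating-point error. With a learned denoiser the samples would instead spread over the data distribution.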