Computer Vision Interview
20 essential Q&A
Updated 2026
Diffusion
Diffusion Models: 20 Essential Q&A
Gradually destroy data with noise, then learn to reverse the process—state-of-the-art image and video generation.
~12 min read
20 questions
Advanced
forward process · U-Net · guidance · latent
Quick Navigation
1
What is a diffusion model?
⚡ easy
Answer: Generative model that learns to reverse a gradual noising process—start from Gaussian noise and denoise into a sample.
2
Forward process?
📊 medium
Answer: Fixed Markov chain adding Gaussian noise over T steps until data ≈ pure noise—q(x_t|x_{t−1}) with known variances.
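The closed form of the forward process lets you jump straight to any timestep: x_t = √ᾱ_t·x_0 + √(1−ᾱ_t)·ε. A minimal numpy sketch (the helper name `q_sample` and the toy array sizes are ours, not from any library):

```python
import numpy as np

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form forward process: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

# Linear beta schedule over T steps; alpha_bar_t = prod_{s<=t} (1 - beta_s).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))               # toy "image"
x_noisy, eps = q_sample(x0, T - 1, alpha_bar, rng)
# At t = T-1, alpha_bar is tiny, so x_T is close to pure Gaussian noise.
```

Because ᾱ_T ≈ 0 under this schedule, the chain really does end near an isotropic Gaussian, which is what makes sampling from pure noise valid.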
3
Reverse process?
📊 medium
Answer: Learn p_θ(x_{t−1}|x_t) approximating true posterior—typically predict noise ε or x_0 with a neural net.
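When the network predicts ε, the reverse-step mean follows in closed form: μ_θ(x_t, t) = (x_t − β_t/√(1−ᾱ_t)·ε_θ)/√α_t. A hedged numpy sketch (the helper name `ddpm_mean` is ours):

```python
import numpy as np

def ddpm_mean(x_t, eps_pred, beta_t, abar_t):
    """Mean of p_theta(x_{t-1} | x_t) when the network predicts the noise eps."""
    alpha_t = 1.0 - beta_t
    return (x_t - beta_t / np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(alpha_t)

# A full ancestral step adds Gaussian noise with the posterior variance on top
# of this mean (no noise at the final step t = 0).
mu = ddpm_mean(x_t=1.0, eps_pred=0.0, beta_t=0.1, abar_t=0.5)
```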
4
Training objective (DDPM)?
🔥 hard
Answer: Simplified ε-prediction MSE: the network predicts the noise added at each step t—a reweighted form of the variational lower bound.
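One training step of the simplified objective, sketched in numpy with a stand-in "network" (a real model would be a U-Net; `eps_mse_loss` is our name):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def eps_mse_loss(model, x0, rng):
    """Simplified DDPM objective: E_{t, eps} || eps - eps_theta(x_t, t) ||^2."""
    t = rng.integers(0, T)                         # uniform random timestep
    eps = rng.standard_normal(x0.shape)            # the noise is the target
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - model(x_t, t)) ** 2)

# Stand-in network that predicts zero noise, just to exercise the loss.
loss = eps_mse_loss(lambda x_t, t: np.zeros_like(x_t), rng.standard_normal((4, 4)), rng)
```

Note there is no KL term left in the loop: sampling a random t and regressing the noise is the whole training step.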
5
Noise schedule β_t?
📊 medium
Answer: How fast variance grows with t—linear, cosine, etc.; affects training stability and sample quality.
# ε-prediction: target noise ε; pred = unet(x_t, t)
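The two most common schedules, side by side in numpy (the cosine form follows Nichol & Dhariwal's parameterization; variable names are ours):

```python
import numpy as np

T = 1000
# Linear schedule (original DDPM): betas grow linearly with t.
betas_lin = np.linspace(1e-4, 0.02, T)
abar_lin = np.cumprod(1.0 - betas_lin)

# Cosine schedule: define alpha_bar directly, then derive betas from its ratios.
s = 0.008
steps = np.arange(T + 1) / T
f = np.cos((steps + s) / (1 + s) * np.pi / 2) ** 2
abar_cos = f[1:] / f[0]
betas_cos = np.clip(1.0 - abar_cos / np.concatenate(([1.0], abar_cos[:-1])), 0, 0.999)
# The cosine schedule keeps much more signal at mid-timesteps than linear does,
# which is one reason it tends to improve sample quality.
```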
6
Why a U-Net?
📊 medium
Answer: Multi-scale spatial denoising with skip connections—preserves detail while aggregating context; time t injected via embeddings.
7
Sampling cost?
📊 medium
Answer: Sequential in time—hundreds to thousands of denoising steps make sampling slow; fast samplers (DDIM) and distillation cut the step count.
8
DDIM?
🔥 hard
Answer: Non-Markovian, deterministic sampler that reuses a DDPM-trained network—far fewer steps with a modest quality tradeoff.
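One deterministic DDIM update (η = 0) goes through the predicted clean image: first recover x̂_0 from the noise estimate, then re-noise to the earlier timestep. A numpy sketch with our own helper name:

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0): jump via the predicted x_0."""
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps_pred) / np.sqrt(abar_t)
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred

# Sanity check: with a perfect eps prediction, the step lands exactly on the
# noised version of the true x_0 at the earlier timestep.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
abar_t, abar_prev = 0.5, 0.8
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
x_prev = ddim_step(x_t, eps, abar_t, abar_prev)
```

Because the update is deterministic, `abar_prev` can come from a much coarser timestep grid—this is why DDIM works with far fewer steps.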
9
Classifier guidance?
🔥 hard
Answer: Use gradients of a classifier p(y|x_t), trained on noisy inputs, to steer sampling—sharp results but requires that extra classifier.
10
Classifier-free guidance?
📊 medium
Answer: Train conditional and unconditional model together; interpolate scores at sample time—no separate classifier, widely used in SD.
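The guidance step itself is a one-line extrapolation between the two noise predictions; 7.5 is a typical Stable Diffusion default scale (function name ours):

```python
import numpy as np

def cfg_eps(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# scale = 1 recovers the plain conditional prediction; scale > 1 pushes further
# toward the condition, trading diversity for prompt adherence.
e_u = np.array([0.0, 0.0])
e_c = np.array([1.0, -1.0])
guided = cfg_eps(e_u, e_c, 7.5)
```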
11
Latent diffusion?
🔥 hard
Answer: Run diffusion in VAE latent space (lower res)—much cheaper; decode with VAE decoder (Stable Diffusion).
12
Stable Diffusion pieces?
📊 medium
Answer: CLIP text encoder, U-Net denoiser in latent space, VAE—plus schedulers and safety tooling around the stack.
13
vs GANs?
📊 medium
Answer: Diffusion: stable training, great diversity, slower sampling. GAN: fast one-shot but trickier mode coverage.
14
Video diffusion?
📊 medium
Answer: Add temporal attention layers or 3D convs so frames share information across time—data and compute heavy.
15
Inpainting?
⚡ easy
Answer: Condition on known regions by concatenating mask/channel inputs to U-Net—fill missing areas consistently.
16
Text conditioning?
📊 medium
Answer: Cross-attention from text tokens to spatial features (like transformers)—T5/CLIP embeddings as context.
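The mechanism in miniature: queries come from spatial features, keys and values from text tokens, so every spatial location mixes in text information. A self-contained numpy sketch with random stand-in projection weights (all names ours):

```python
import numpy as np

def cross_attention(spatial, text, Wq, Wk, Wv):
    """Queries from image features, keys/values from text tokens."""
    q = spatial @ Wq                      # (HW, d)
    k = text @ Wk                         # (n_tok, d)
    v = text @ Wv                         # (n_tok, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v                       # (HW, d): each location reads the text

rng = np.random.default_rng(0)
HW, n_tok, d = 16, 5, 8
out = cross_attention(rng.standard_normal((HW, d)), rng.standard_normal((n_tok, d)),
                      *(rng.standard_normal((d, d)) for _ in range(3)))
```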
17
SNR weighting?
🔥 hard
Answer: Different timesteps contribute unequally to loss—reweighting (v-prediction, Min-SNR) improves quality.
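For ε-prediction, Min-SNR-γ caps the implicit per-timestep weight at γ, down-weighting the easy high-SNR (low-noise) steps. A sketch under the linear schedule (γ = 5 as in the Min-SNR paper; variable names ours):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)
snr = abar / (1.0 - abar)             # signal-to-noise ratio at each timestep

# Min-SNR-gamma: cap the effective weight at gamma, relative to the eps loss.
gamma = 5.0
w = np.minimum(snr, gamma) / snr      # multiply the per-timestep eps-MSE by w
# Early (high-SNR) timesteps get tiny weights; noisy late timesteps keep w = 1.
```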
18
Flow matching?
🔥 hard
Answer: Learn an ODE velocity field transporting noise to data—competitive with diffusion on speed and quality in recent work.
19
Compute / data?
⚡ easy
Answer: Large image-text pairs for T2I; training is GPU-heavy; inference optimizes with TensorRT, FlashAttention, distilled samplers.
20
Evaluation?
📊 medium
Answer: FID, CLIP score for text alignment, human preference studies—no single metric captures all.
Diffusion Cheat Sheet
Train
- Predict ε
Sample
- Reverse steps
- CFG
LDM
- VAE latent
💡 Pro tip: Forward fixed, reverse learned; mention CFG and latent diffusion.
Full tutorial track
Go deeper with the matching tutorial chapter and code examples.