Convolutional Neural Networks — 15 Interview Questions
Local receptive fields, parameter sharing, 1×1 convs, depthwise separable convs, and why CNNs beat flat MLPs on images.
Topics: convolution, pooling, receptive fields, channels.
1. What is convolution in a CNN? (Easy)
Answer: Slide a small learned filter over the input (with optional padding/stride), computing dot products at each position—local connectivity + weight sharing across space.
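A minimal sketch of the sliding dot product, assuming PyTorch is available (all names here are illustrative); the explicit loop reproduces what the built-in conv computes:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)          # (batch, channels, H, W)
w = torch.randn(1, 1, 3, 3)          # one 3x3 filter, shared across all positions

# Built-in convolution (technically cross-correlation, as in most DL frameworks).
y = F.conv2d(x, w, stride=1, padding=1)

# The same result computed explicitly: a dot product at every spatial position.
xp = F.pad(x, (1, 1, 1, 1))
manual = torch.zeros_like(y)
for i in range(8):
    for j in range(8):
        patch = xp[0, 0, i:i + 3, j:j + 3]
        manual[0, 0, i, j] = (patch * w[0, 0]).sum()

print(torch.allclose(y, manual, atol=1e-5))  # True
```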
2. Stride and output spatial size (1D intuition). (Medium)
Answer: Stride s subsamples the outputs. Common formula (1D): out = floor((n + 2p − k)/s) + 1 for input length n, padding p, kernel k; the same formula applies independently per spatial axis in 2D (e.g. NCHW layouts).
out = ⌊(n + 2p − k) / s⌋ + 1
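A quick sanity check of the formula against an actual layer, assuming PyTorch (sizes chosen arbitrarily):

```python
import torch
import torch.nn as nn

def conv_out_len(n, k, s=1, p=0):
    # out = floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

n, k, s, p = 17, 5, 2, 1
layer = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=k, stride=s, padding=p)
y = layer(torch.randn(1, 1, n))

print(conv_out_len(n, k, s, p), y.shape[-1])  # both 8
```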
3. Parameter count for a conv layer. (Medium)
Answer: Per filter: k_h × k_w × C_in weights + bias (if used). With C_out filters: multiply by C_out. Far fewer than fully connecting all pixels to all hidden units.
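A check of the count, assuming PyTorch; the layer sizes are made up for illustration:

```python
import torch.nn as nn

c_in, c_out, k = 64, 128, 3
conv = nn.Conv2d(c_in, c_out, kernel_size=k, bias=True)

expected = c_out * (k * k * c_in + 1)               # +1 per filter for the bias
actual = sum(p.numel() for p in conv.parameters())
print(expected, actual)                              # 73856 73856
```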
4. What does a 1×1 convolution do? (Medium)
Answer: Mixes channels at each spatial location without blending neighbors; changes depth (bottleneck/expansion) and adds cheap extra nonlinearity when stacked (Inception, ResNet bottlenecks).
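A small illustration, assuming PyTorch; the 256-to-64 bottleneck is just an example:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 14, 14)              # bottleneck-style input
reduce = nn.Conv2d(256, 64, kernel_size=1)   # compress channels 256 -> 64, per pixel
print(reduce(x).shape)                       # torch.Size([1, 64, 14, 14]): spatial size unchanged
```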
5. Max pooling vs average pooling. (Easy)
Answer: Max: strongest local activation—sharp features. Avg: smoother downsampling—used in some network heads (e.g. Global Average Pooling). Both reduce spatial size.
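A tiny comparison on a hand-made 2×2 input, assuming PyTorch:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2.], [3., 4.]]]])   # shape (1, 1, 2, 2)
print(nn.MaxPool2d(2)(x))                    # [[[[4.]]]]  strongest local activation
print(nn.AvgPool2d(2)(x))                    # [[[[2.5]]]] smooth average
```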
6. Global average pooling (GAP). (Medium)
Answer: Average each channel over full H×W → vector length C—replaces large FC layers, reduces parameters and overfitting (ResNet-style classifiers).
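A sketch of GAP replacing a large FC layer, assuming PyTorch; the 512×7×7 feature map size is illustrative:

```python
import torch
import torch.nn as nn

feats = torch.randn(8, 512, 7, 7)            # backbone output
gap = nn.AdaptiveAvgPool2d(1)                # average each channel over the full HxW
vec = gap(feats).flatten(1)                  # (8, 512): ready for a small classifier head
print(vec.shape)
```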
7. Transposed convolution (deconv) in one line. (Hard)
Answer: Upsampling with a learnable kernel; used in segmentation/decoders; can create checkerboard artifacts if kernel size and stride are chosen carelessly.
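A minimal upsampling sketch, assuming PyTorch; matching kernel size and stride is one common way to avoid overlap artifacts:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)   # kernel = stride: no overlapping taps
x = torch.randn(1, 64, 16, 16)
print(up(x).shape)                                          # torch.Size([1, 32, 32, 32])
```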
8. Dilated (atrous) convolution: why? (Hard)
Answer: Spaces kernel taps with holes to increase receptive field without more parameters or losing resolution—common in segmentation (DeepLab).
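A sketch showing that dilation widens the footprint at constant parameter count, assuming PyTorch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)
plain   = nn.Conv2d(16, 16, kernel_size=3, padding=1)              # covers 3x3 input pixels
dilated = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)  # covers 5x5, same weights count
print(plain(x).shape, dilated(x).shape)   # both (1, 16, 32, 32): resolution preserved
print(sum(p.numel() for p in plain.parameters()) ==
      sum(p.numel() for p in dilated.parameters()))                # True
```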
9. Depthwise separable convolution. (Medium)
Answer: Depthwise conv per channel + pointwise 1×1 to mix channels; far fewer parameters and FLOPs than a standard conv (MobileNet family).
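A parameter-count comparison, assuming PyTorch; channel sizes are illustrative:

```python
import torch.nn as nn

c_in, c_out, k = 128, 256, 3
standard = nn.Conv2d(c_in, c_out, k, padding=1, bias=False)
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in, bias=False),  # depthwise: one filter per channel
    nn.Conv2d(c_in, c_out, 1, bias=False),                         # pointwise 1x1: mix channels
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))   # 294912 vs 33920 (~8.7x fewer)
```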
10. Translation equivariance: what does a CNN get? (Medium)
Answer: Shift input → feature maps shift correspondingly (before pooling). Equivariant to translation; pooling adds approximate invariance locally.
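A sketch of the equivariance check, assuming PyTorch; circular padding makes the property exact, while zero padding is only approximate near the borders:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode='circular')
x = torch.randn(1, 1, 16, 16)

# Shifting the input and then convolving equals convolving and then shifting the output.
shift_then_conv = conv(torch.roll(x, shifts=(2, 3), dims=(2, 3)))
conv_then_shift = torch.roll(conv(x), shifts=(2, 3), dims=(2, 3))
print(torch.allclose(shift_then_conv, conv_then_shift, atol=1e-5))  # True
```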
11. CNN vs MLP on images: the interview answer. (Easy)
Answer: CNN exploits locality and sharing—fewer parameters, better sample efficiency; MLP ignores spatial structure and scales poorly with resolution.
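A concrete parameter comparison, assuming PyTorch; the layer sizes are illustrative, not from the source:

```python
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # 3*3*3*64 + 64   = 1,792 params
mlp  = nn.Linear(3 * 224 * 224, 64)                 # 150,528*64 + 64 = 9,633,856 params

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv), count(mlp))                      # locality + sharing vs a dense weight per pixel
```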
12. Receptive field: define it. (Easy)
Answer: The region of input pixels that can affect one output activation; it grows with depth, kernel size, and dilation, and grows faster after striding/pooling because later layers take larger jumps across the input.
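A small receptive-field calculator, assuming the standard recurrence r ← r + d·(k − 1)·j and j ← j·s; the helper name is my own:

```python
def receptive_field(layers):
    r, j = 1, 1                      # RF and "jump" (input pixels between adjacent outputs)
    for k, s, d in layers:           # (kernel, stride, dilation) per layer
        r += d * (k - 1) * j
        j *= s
    return r

# conv3/s1 -> conv3/s1 -> pool2/s2 -> conv3/s1
print(receptive_field([(3, 1, 1), (3, 1, 1), (2, 2, 1), (3, 1, 1)]))  # 10
```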
13. What do deeper channels often represent? (Medium)
Answer: Hierarchical features: edges/textures low → parts → object-level abstractions (interpretation; not guaranteed per filter).
14. Data augmentation for CNNs: examples. (Easy)
Answer: Random crop/flip, color jitter, Cutout/RandAugment—reduces overfitting by simulating label-preserving transforms.
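One possible pipeline, assuming torchvision is available (transform names can vary across versions):

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),                                   # random crop + resize
    transforms.RandomHorizontalFlip(),                                   # label-preserving flip
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])
```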
15. Name one classic and one modern CNN family. (Easy)
Answer: Classic: VGG / ResNet. Modern: EfficientNet, ConvNeXt, or a conv/ViT hybrid; shows you know the field has moved beyond plain conv stacks.
Be ready to sketch a conv → BN → ReLU → pool block.
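A sketch of that block, assuming PyTorch; channel counts are placeholders:

```python
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),  # bias folded into BN
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),                                               # halve H and W
    )

block = conv_block(3, 64)   # e.g. the first stage of a small image classifier
```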
Quick review checklist
- Conv, stride, padding; params per layer; 1×1 and depthwise sep.
- Pooling vs strided conv; GAP; receptive field.
- Equivariance; CNN vs MLP; augmentation.