Advanced CNN Architectures MCQ
ResNet skip connections, MobileNet depthwise separable convolutions, and EfficientNet scaling.
ResNet MCQ
Residual learning
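Instead of asking a stack of layers to learn a target mapping H(x) directly, ResNet has each residual block learn the residual F(x) = H(x) − x and output F(x)+x, using an identity shortcut when shapes match. Identity shortcuts give gradients a direct path backward and ease optimization, which makes much deeper networks trainable for ImageNet classification and as detection backbones.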
Why F(x)+x
If the desired mapping is close to identity, learning small perturbations F is easier than learning a full mapping from scratch.
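A minimal sketch of this idea in plain Python, under simplifying assumptions: small fully connected layers stand in for the conv layers, and the ReLU that ResNet applies after the addition is left out. The names `linear`, `relu`, and `residual_block` are illustrative, not taken from any library.

```python
def linear(weights, bias, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, w1, b1, w2, b2):
    """Return F(x) + x, where F is linear -> ReLU -> linear."""
    f = linear(w2, b2, relu(linear(w1, b1, x)))
    return [fi + xi for fi, xi in zip(f, x)]


x = [1.0, -2.0, 3.0]
zeros_w = [[0.0] * 3 for _ in range(3)]
zeros_b = [0.0] * 3

# With all of F's parameters at zero, the block is exactly the identity.
identity_out = residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b)

# A small change to F (here only the last bias) gives a small change around x.
small_b = [0.1, 0.0, -0.1]
perturbed_out = residual_block(x, zeros_w, zeros_b, zeros_w, small_b)
```

The block starts out as the identity and only has to learn the deviation from it, which is the sense in which small perturbations are easier to learn than a full mapping.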
Key ideas
Residual block
Two or three conv layers plus a shortcut summing with the input.
Projection shortcut
1×1 conv on x when channel or spatial sizes change.
Bottleneck
Reduces cost: 1×1 reduces channels → 3×3 at the reduced width → 1×1 restores channels.
Batch norm
Stabilizes training of very deep stacks (used in original ResNet).
Deep stack
stem → stages of residual blocks → global pool → FC / detection head
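To see why the bottleneck design is cheaper, the following sketch counts conv weights in a basic block (two 3×3 convs) and in a bottleneck block (1×1 reduce, 3×3, 1×1 restore). The widths 256 and 64 are illustrative choices, and bias and batch-norm parameters are ignored; the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights in a k x k convolution, ignoring bias and batch-norm terms."""
    return c_in * c_out * k * k


def basic_block_params(channels):
    """Two 3x3 convs at constant width."""
    return 2 * conv_params(channels, channels, 3)


def bottleneck_params(channels, reduced):
    """1x1 reduce -> 3x3 at reduced width -> 1x1 restore."""
    return (conv_params(channels, reduced, 1)
            + conv_params(reduced, reduced, 3)
            + conv_params(reduced, channels, 1))


basic = basic_block_params(256)          # 1,179,648 weights
bottleneck = bottleneck_params(256, 64)  # 69,632 weights
```

At the same input and output width, the bottleneck block here needs far fewer weights because the expensive 3×3 conv runs on only 64 channels.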
MobileNet MCQ
Efficient CNNs
MobileNet factorizes a standard convolution into a depthwise spatial filter per input channel followed by a pointwise 1×1 that mixes channels. With 3×3 kernels this cuts computation by roughly 8 to 9 times, and parameters shrink similarly. Width and input resolution multipliers offer accuracy–latency tradeoffs.
Depthwise separable
Cost per output position is roughly channels_in × k² + channels_in × channels_out, vs channels_in × channels_out × k² for a standard k×k conv.
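The comparison can be checked numerically. This sketch counts multiply-accumulates (MACs) for both layer types, assuming stride 1 and an output feature map of h × w; the layer sizes are illustrative and the function names are hypothetical.

```python
def standard_conv_macs(c_in, c_out, k, h, w):
    """MACs for a standard k x k conv producing an h x w output map."""
    return c_in * c_out * k * k * h * w


def depthwise_separable_macs(c_in, c_out, k, h, w):
    """MACs for a depthwise k x k conv followed by a pointwise 1x1 conv."""
    depthwise = c_in * k * k * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # 1x1 conv mixing channels
    return depthwise + pointwise


std = standard_conv_macs(64, 128, 3, 56, 56)
sep = depthwise_separable_macs(64, 128, 3, 56, 56)

# The ratio simplifies to 1/c_out + 1/k**2, about 0.12 for these sizes.
ratio = sep / std
```

For 3×3 kernels the 1/k² term dominates, so the saving approaches a factor of 9 as the number of output channels grows.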
Key ideas
Depthwise
Each channel convolved independently with its own spatial kernel.
Pointwise
1×1 conv combines channels: a linear mix at each pixel.
Width multiplier α
Uniformly thins channel counts across the network.
Resolution
A resolution multiplier ρ shrinks the input; compute falls quadratically, by about ρ², because both height and width scale.
MobileNet block
depthwise 3×3 → BN → ReLU → pointwise 1×1 → BN → ReLU
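The effect of the two multipliers on one depthwise separable layer can be sketched as follows. The layer sizes and the function name are illustrative, and truncating scaled sizes with `int` is an assumption of this sketch rather than a rule from the source.

```python
def scaled_separable_macs(c_in, c_out, k, size, alpha=1.0, rho=1.0):
    """MACs for a depthwise separable layer on a square feature map.

    alpha thins both channel counts; rho shrinks the spatial side length.
    """
    m = int(alpha * c_in)
    n = int(alpha * c_out)
    s = int(rho * size)
    return m * k * k * s * s + m * n * s * s


full = scaled_separable_macs(64, 128, 3, 56)
thin = scaled_separable_macs(64, 128, 3, 56, alpha=0.5)   # half the channels
small = scaled_separable_macs(64, 128, 3, 56, rho=0.5)    # half the side length
```

Halving the resolution cuts the cost to exactly a quarter in this example. Halving the width cuts it to a little more than a quarter, because the pointwise term scales with α² while the depthwise term scales only with α.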