CNNs for Vision MCQ (15 questions)
Time: ~25 mins · Intermediate


Local filters, stacked hierarchies, and why convolutions beat dense layers on images.

Easy: 5 Q Medium: 6 Q Hard: 4 Q
Conv: local filters
Pool: downsample
Stride: step size
Receptive field: context

Convolutional networks for images

CNNs apply learned filters locally across the spatial grid, sharing parameters across locations (translation equivariance). Stacked conv layers build hierarchical features; pooling and stride reduce resolution; normalization and skip connections appear in deeper designs used for detection and segmentation.

Parameter sharing

One conv kernel is reused at every spatial position—far fewer parameters than a fully connected layer on the full image.
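
To make the savings concrete, here is a minimal sketch of the parameter counts. The specific sizes (a 32×32 RGB input, 16 output channels, a 3×3 kernel) are illustrative assumptions, not from the text:

```python
# Parameter count: one 3x3 conv layer vs. a dense layer producing the
# same-sized output on a 32x32 RGB image (illustrative sizes).
in_ch, out_ch, k = 3, 16, 3
H = W = 32

conv_params = out_ch * in_ch * k * k + out_ch        # shared weights + biases
dense_params = (in_ch * H * W) * (out_ch * H * W)    # one weight per input-output pair

print(conv_params)   # 448
print(dense_params)  # 50331648 -- about 100,000x more
```

The conv kernel's size is independent of the image resolution; the dense layer's parameter count grows with the square of the pixel count.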

Key ideas

Convolution

Sliding inner product: output channels mix local neighborhoods of input channels.
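
A minimal single-channel sketch of that sliding inner product (technically cross-correlation, as deep learning frameworks implement it), with no padding and stride 1:

```python
import numpy as np

def conv2d(x, w):
    """'Valid' 2D convolution: each output pixel is the inner product
    of the kernel with one local patch of the input."""
    H, W = x.shape
    k = w.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i, j] = np.sum(x[i:i+k, j:j+k] * w)  # same kernel at every location
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((2, 2)) / 4          # 2x2 averaging kernel
y = conv2d(x, w)
print(y.shape)  # (3, 3): a 2x2 kernel on a 4x4 input gives a 3x3 output
print(y[0, 0])  # 2.5, the mean of the top-left patch [0, 1, 4, 5]
```

In a real layer the kernel also spans input channels and there is one such kernel per output channel, which is how output channels mix local neighborhoods of input channels.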

Pooling

Max or average pool reduces spatial size and adds local translation tolerance.
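
As a sketch, 2×2 max pooling with stride 2 keeps only the strongest response in each window, so a small shift of a feature within a window leaves the output unchanged:

```python
import numpy as np

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling: halves each spatial dimension."""
    H, W = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 0.],
              [0., 9., 0., 1.]])
p = maxpool2x2(x)
print(p)  # [[4. 8.]
          #  [9. 1.]]
```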

Stride & padding

Stride > 1 downsamples; padding preserves spatial size or aligns dimensions.
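
The standard output-size formula ties these together; a quick sketch (the sizes here are illustrative):

```python
def conv_out_size(n, k, stride=1, pad=0):
    """out = floor((n + 2*pad - k) / stride) + 1"""
    return (n + 2 * pad - k) // stride + 1

print(conv_out_size(32, 3, stride=1, pad=1))  # 32: "same" padding preserves size
print(conv_out_size(32, 3, stride=2, pad=1))  # 16: stride 2 halves the resolution
```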

Receptive field

Region in the input that can influence one output neuron—grows with depth.
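
The growth with depth can be computed with a simple recurrence (a sketch, assuming no dilation): each layer adds (k − 1) × jump to the receptive field, where jump is the product of all earlier strides.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, first layer first."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) input steps
        jump *= s              # striding makes later layers' steps coarser
    return rf

# Three 3x3 stride-1 convs: receptive field grows 3 -> 5 -> 7.
print(receptive_field([(3, 1)] * 3))              # 7
# A stride-2 layer in the middle makes the last conv gain context faster.
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8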

Typical CNN stack

Conv → activation → pool → … → global pool / FC → task head

Pro tip: 1×1 convolutions mix channels without changing spatial size—used heavily in Inception-style and bottleneck blocks.
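
Since a 1×1 convolution touches no spatial neighborhood, it reduces to a per-pixel linear map across channels. A minimal sketch with illustrative channel counts (a 64 → 16 bottleneck):

```python
import numpy as np

def conv1x1(x, w):
    """x: (C_in, H, W) feature map, w: (C_out, C_in) channel-mixing matrix.
    Contracts over the channel axis only; H and W are untouched."""
    return np.tensordot(w, x, axes=([1], [0]))

x = np.random.randn(64, 8, 8)   # 64 channels, 8x8 feature map
w = np.random.randn(16, 64)     # bottleneck: 64 -> 16 channels
y = conv1x1(x, w)
print(y.shape)  # (16, 8, 8): fewer channels, same spatial size
```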