CNNs for Vision MCQ
Local filters, stacked hierarchies, and why convolutions beat dense layers on images.
Conv: local filters
Pool: downsample
Stride: step size
Receptive field: context
Convolutional networks for images
CNNs apply learned filters locally across the spatial grid, sharing parameters across locations (translation equivariance). Stacked conv layers build hierarchical features; pooling and stride reduce resolution; normalization and skip connections appear in deeper designs used for detection and segmentation.
Parameter sharing
One conv kernel is reused at every spatial position—far fewer parameters than a fully connected layer on the full image.
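A quick back-of-the-envelope comparison makes the savings concrete. The sizes below (3×3 kernel, 64 channels, 224×224 feature map) are illustrative assumptions, not from the text above:

```python
# Parameter count of one 3x3 conv layer (64 in-channels, 64 out-channels)
# vs. a dense layer fully connecting two 224x224x64 feature maps.
# Sizes are assumed for illustration.
conv_params = 3 * 3 * 64 * 64 + 64        # kernel weights + one bias per output channel
dense_params = (224 * 224 * 64) ** 2      # weight matrix alone, no biases

print(conv_params)                        # 36928
print(dense_params // conv_params)        # dense layer is ~10^8 times larger
```

The conv layer's cost is independent of image size; the dense layer's cost grows with the square of the pixel count.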
Key ideas
Convolution
Sliding inner product: output channels mix local neighborhoods of input channels.
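A minimal single-channel sketch of that sliding inner product (strictly, cross-correlation, which is what deep-learning frameworks compute), written out with explicit loops for clarity:

```python
import numpy as np

def conv2d(x, k):
    """'Valid' 2D cross-correlation: one input channel, one output channel."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # inner product of the kernel with one local neighborhood
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16.0).reshape(4, 4)   # toy 4x4 input
k = np.ones((3, 3)) / 9.0           # box (mean) filter
print(conv2d(x, k))                 # 2x2 output of local means: [[5. 6.] [9. 10.]]
```

With multiple channels, the same pattern holds: each output channel sums this inner product over all input channels.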
Pooling
Max or average pool reduces spatial size and adds local translation tolerance.
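A 2×2/stride-2 max pool can be sketched with a reshape trick (assumes even spatial dimensions):

```python
import numpy as np

def maxpool2x2(x):
    """2x2 max pooling with stride 2; spatial dims must be even."""
    h, w = x.shape
    # Split each axis into (blocks, 2) and take the max inside each 2x2 block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 8, 1, 0],
              [7, 6, 2, 3]])
print(maxpool2x2(x))   # [[4 8]
                       #  [9 3]]
```

Because only the maximum within each window survives, shifting the input by one pixel inside a window often leaves the output unchanged, which is the local translation tolerance mentioned above.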
Stride & padding
Stride > 1 downsamples; padding preserves spatial size or aligns dimensions.
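Both effects follow from the standard output-size formula, out = floor((n + 2p − k) / s) + 1, for input size n, kernel k, stride s, padding p:

```python
def conv_out_size(n, k, s=1, p=0):
    """Spatial output size: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

print(conv_out_size(224, 3, s=1, p=1))  # 224: "same" padding preserves size
print(conv_out_size(224, 3, s=2, p=1))  # 112: stride 2 halves resolution
```

The same formula applies to pooling layers, with k and s set to the pool window and its stride.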
Receptive field
Region in the input that can influence one output neuron—grows with depth.
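The growth with depth can be computed with the usual recurrence: each layer adds (k − 1) input pixels per tap, scaled by the product of all earlier strides. A small sketch:

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) pairs; returns the receptive-field
    size, in input pixels, of one neuron after the last layer."""
    rf, jump = 1, 1          # jump = product of strides so far
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Three stacked 3x3 stride-1 convs see a 7x7 input region,
# matching one 7x7 conv but with fewer parameters.
print(receptive_field([(3, 1)] * 3))      # 7
# After a stride-2 layer, each later kernel tap covers 2 input pixels.
print(receptive_field([(3, 2), (3, 1)]))  # 7
```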
Typical CNN stack
Conv → activation → pool → … → global pool / FC → task head
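Tracing shapes through such a stack shows how resolution falls before the head. The layer sizes below (32×32 input, two conv+pool blocks) are assumed for illustration:

```python
def trace_shapes(n, stages):
    """Spatial size after each stage; stages are (kernel, stride, padding)."""
    shapes = [n]
    for k, s, p in stages:
        n = (n + 2 * p - k) // s + 1
        shapes.append(n)
    return shapes

# Two blocks of conv(3x3, pad 1) then maxpool(2x2, stride 2) on a 32x32 input.
stages = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]
print(trace_shapes(32, stages))   # [32, 32, 16, 16, 8]
```

Global pooling then collapses the remaining 8×8 grid to one value per channel, giving the task head a fixed-size vector regardless of input resolution.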