CNN Basics for Vision MCQ

Convolutional neural networks for images and the AlexNet breakthrough on ImageNet.

CNNs for Vision MCQ

Convolutional networks for images

CNNs apply learned filters locally across the spatial grid and reuse the same parameters at every location, which makes convolution translation equivariant: shifting the input shifts the feature map correspondingly. Stacked conv layers build hierarchical features; pooling and stride reduce resolution; normalization and skip connections appear in the deeper designs used for detection and segmentation.

Parameter sharing

One conv kernel is reused at every spatial position—far fewer parameters than a fully connected layer on the full image.
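To make the saving concrete, here is a minimal sketch that counts parameters for one conv layer and one fully connected layer on the same input. The image size (224×224 RGB), the 3×3 kernel, and the 64 output channels/units are illustrative assumptions, not values from the text.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases for a k x k convolution layer."""
    return out_ch * (in_ch * k * k + 1)


def fc_params(in_features, out_features):
    """Weights plus biases for a fully connected layer."""
    return out_features * (in_features + 1)


# Assumed sizes for illustration: 224 x 224 RGB image, 64 outputs.
h, w, c = 224, 224, 3
conv = conv_params(in_ch=c, out_ch=64, k=3)
fc = fc_params(in_features=h * w * c, out_features=64)

print(conv)  # 1792
print(fc)    # 9633856
```

The conv layer's count does not depend on the image height or width at all, because the same kernel is applied at every position.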

Key ideas

Convolution

Sliding inner product: output channels mix local neighborhoods of input channels.
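The sliding inner product can be sketched in plain Python. This is a deliberately slow, loop-based illustration (no padding, stride 1) using nested lists; the function name `conv2d` and the tensor layout are assumptions made for the example. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d(x, kernels, bias):
    """Valid, stride-1 cross-correlation.

    x:       [C_in][H][W]
    kernels: [C_out][C_in][K][K]
    bias:    [C_out]
    returns: [C_out][H-K+1][W-K+1]
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0][0])
    out = []
    for o, kern in enumerate(kernels):
        plane = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                # Inner product of the kernel with one local window,
                # summed over every input channel.
                acc = bias[o]
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            acc += kern[c][di][dj] * x[c][i + di][j + dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# One 3x3 input channel and one 2x2 kernel that sums its window.
x = [[[1, 2, 3],
      [4, 5, 6],
      [7, 8, 9]]]
kernels = [[[[1, 1],
             [1, 1]]]]
y = conv2d(x, kernels, bias=[0])
print(y)  # [[[12, 16], [24, 28]]]
```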

Pooling

Max or average pool reduces spatial size and adds local translation tolerance.
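A small sketch of max pooling on a single channel shows both effects. The helper name `max_pool2d` and the toy inputs are assumptions for illustration. Note that the tolerance is only local: a shift that moves a feature into a different pooling window does change the output.

```python
def max_pool2d(x, size=2, stride=2):
    """Max pooling over a single-channel [H][W] grid."""
    h, w = len(x), len(x[0])
    return [
        [max(x[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, w - size + 1, stride)]
        for i in range(0, h - size + 1, stride)
    ]


# The same feature (value 9) at two nearby positions inside one window.
a = [[0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
b = [[0, 0, 0, 0],
     [9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

pooled_a = max_pool2d(a)
pooled_b = max_pool2d(b)
print(pooled_a)  # [[9, 0], [0, 0]]
print(pooled_b)  # [[9, 0], [0, 0]]
```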

Stride & padding

Stride > 1 downsamples; padding preserves spatial size or aligns dimensions.
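The effect of stride and padding on spatial size follows the standard formula floor((n + 2p - k) / s) + 1 for input size n, kernel size k, padding p, and stride s. A minimal sketch, with a hypothetical helper name and example sizes chosen for illustration:

```python
def conv_out_size(n, k, stride=1, padding=0):
    """Output length along one spatial axis of a conv or pool layer."""
    return (n + 2 * padding - k) // stride + 1


same = conv_out_size(32, 3, stride=1, padding=1)     # padding keeps the size
halved = conv_out_size(32, 3, stride=2, padding=1)   # stride 2 downsamples
shrunk = conv_out_size(32, 5, stride=1, padding=0)   # no padding shrinks

print(same, halved, shrunk)  # 32 16 28
```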

Receptive field

Region in the input that can influence one output neuron—grows with depth.
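The growth with depth can be computed layer by layer: each layer with kernel size k widens the receptive field by (k - 1) times the product of all earlier strides. The sketch below assumes a simple chain of layers described as (kernel, stride) pairs; the function name and the example stacks are illustrative.

```python
def receptive_field(layers):
    """Receptive field (along one axis) after a chain of layers.

    layers: iterable of (kernel_size, stride) pairs, in order.
    """
    r, jump = 1, 1  # jump = input-pixel distance between adjacent outputs
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r


one_conv = receptive_field([(3, 1)])
two_convs = receptive_field([(3, 1), (3, 1)])
conv_pool_conv = receptive_field([(3, 1), (2, 2), (3, 1)])

print(one_conv, two_convs, conv_pool_conv)  # 3 5 8
```

Downsampling layers matter here: after the stride-2 pool, the next 3×3 conv adds 4 input pixels to the receptive field instead of 2.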

Typical CNN stack

Conv → activation → pool → … → global pool / FC → task head

Pro tip: 1×1 convolutions mix channels without changing spatial size—used heavily in Inception-style and bottleneck blocks.
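A 1×1 convolution is equivalent to applying the same small matrix to the channel vector at every pixel. The following sketch (bias omitted, nested lists, hypothetical function name `conv1x1`) reduces two channels to one while leaving the 2×2 spatial grid untouched.

```python
def conv1x1(x, weight):
    """x: [C_in][H][W], weight: [C_out][C_in] -> [C_out][H][W]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][c] * x[c][i][j] for c in range(c_in))
          for j in range(w)]
         for i in range(h)]
        for o in range(len(weight))
    ]


x = [[[1, 2], [3, 4]],
     [[10, 20], [30, 40]]]
weight = [[1, 1]]  # sum the two input channels into one output channel
y = conv1x1(x, weight)
print(y)  # [[[11, 22], [33, 44]]]
```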

AlexNet MCQ

AlexNet in context

AlexNet (Krizhevsky et al., 2012) won the ILSVRC 2012 ImageNet classification challenge with a large GPU-trained CNN. It popularized ReLU activations, dropout regularization, overlapping max pooling, data augmentation, and multi-GPU model parallelism for vision. Deeper stacks of conv layers followed (VGG, ResNet, …).

Why it mattered

It showed that deep CNNs scaled with data and compute could dominate hand-crafted features on a hard benchmark.

Key ideas

Architecture

Five conv layers (with LRN and pool stages) then three FC layers.

ReLU

Faster training than saturating sigmoids/tanh; helps deep nets converge.
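One way to see the difference is to compare local gradients. The sigmoid's derivative peaks at 0.25 and shrinks toward zero for large inputs, while ReLU passes a gradient of 1 for any positive input. This is only a sketch of that single effect, not a full account of why ReLU trains faster; the function names are illustrative.

```python
import math


def sigmoid_grad(z):
    """Derivative of the logistic sigmoid at z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU at z (taken as 0 at z == 0)."""
    return 1.0 if z > 0 else 0.0


# Print both gradients for increasingly large positive pre-activations.
for z in (0.5, 5.0, 10.0):
    print(z, relu_grad(z), sigmoid_grad(z))
```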

Dropout

Randomly drops activations in FC layers to reduce co-adaptation / overfitting.
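A minimal sketch of dropout on a flat list of activations follows. It uses the "inverted" variant common in current libraries, which rescales the kept activations during training; the original AlexNet paper instead halved the outputs at test time with a drop probability of 0.5. The function name and signature are assumptions for illustration.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p.

    Kept values are scaled by 1 / (1 - p) so the expected value is
    unchanged, which means no rescaling is needed at inference time.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


random.seed(0)
print(dropout([1.0] * 8, p=0.5))                  # mix of 0.0 and 2.0
print(dropout([1.0] * 8, p=0.5, training=False))  # unchanged
```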

Scale

Trained on two GPUs, with the conv layers split between them; this let the network be wider than a single GPU's memory would allow.

Rough data flow

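227×227 input → conv/pool stages → 4096-4096-1000 FC → softmax

(The original paper states a 224×224 input, but 227×227 is the size that makes the first-layer arithmetic work out to a 55×55 feature map.)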
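The spatial sizes along this flow can be traced with the usual output-size formula. The layer hyperparameters below are the commonly cited single-tower AlexNet values (kernel, stride, padding, channel counts) and are listed here as an assumption, since the text above does not give them. LRN and ReLU are omitted because they do not change shapes.

```python
def out_size(n, k, stride=1, padding=0):
    """Output length along one spatial axis."""
    return (n + 2 * padding - k) // stride + 1


# (name, out_channels or None for pooling, kernel, stride, padding)
LAYERS = [
    ("conv1", 96, 11, 4, 0),
    ("pool1", None, 3, 2, 0),   # 3x3 window, stride 2: overlapping pooling
    ("conv2", 256, 5, 1, 2),
    ("pool2", None, 3, 2, 0),
    ("conv3", 384, 3, 1, 1),
    ("conv4", 384, 3, 1, 1),
    ("conv5", 256, 3, 1, 1),
    ("pool5", None, 3, 2, 0),
]


def trace(size=227, channels=3):
    """Return (name, channels, spatial_size) after each layer."""
    shapes = []
    for name, out_ch, k, s, p in LAYERS:
        size = out_size(size, k, s, p)
        channels = out_ch or channels  # pooling keeps the channel count
        shapes.append((name, channels, size))
    return shapes


shapes = trace()
for name, c, n in shapes:
    print(f"{name}: {c} x {n} x {n}")

_, c, n = shapes[-1]
flat = c * n * n
print(flat)  # 9216 features feed the first 4096-unit FC layer
```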

Pro tip: Local Response Normalization (LRN) was used in AlexNet but later often replaced by batch norm in modern designs.
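For reference, LRN divides each activation by a term built from the squared activations of neighboring channels at the same spatial position. The sketch below applies it to the channel vector at a single pixel, using the constants reported in the AlexNet paper (n = 5, k = 2, α = 1e-4, β = 0.75); the function name and list-based layout are assumptions for illustration.

```python
def lrn(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Local response normalization across channels at one position.

    a: list of activations, one per channel.
    Each output is a[i] / (k + alpha * sum of a[j]**2 over the
    n channels centered on i, clipped at the ends) ** beta.
    """
    num = len(a)
    out = []
    for i in range(num):
        lo = max(0, i - n // 2)
        hi = min(num - 1, i + n // 2)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sq_sum) ** beta)
    return out


normalized = lrn([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(normalized)
```

Unlike batch norm, this has no learned parameters and uses no batch statistics; it only dampens a channel when its neighbors are strongly active.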
