Activation functions deep dive · 15 questions · 25 min

Activation Functions MCQ · test your knowledge

From ReLU to Swish – 15 questions covering non‑linearity, vanishing gradient, output ranges, and modern variants.

Difficulty: 5 easy · 6 medium · 4 hard

Activation functions: the heart of neural networks

Activation functions introduce non‑linearity, allowing neural networks to approximate complex functions. This MCQ test covers classical and modern activation functions, their derivatives, output ranges, and practical considerations like vanishing gradient and dying ReLU.

Why non‑linearity?

Without non‑linear activations, stacked linear layers collapse into a single linear transformation, so depth adds no representational power. Non‑linear activations are what let deep networks learn complex, hierarchical features.
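A quick numeric sketch of this collapse (weights are random and purely illustrative): two stacked linear layers W2(W1·x + b1) + b2 compute exactly the same function as a single linear layer with weight W2·W1.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked linear layers with no activation in between
deep = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
shallow = W @ x + b

assert np.allclose(deep, shallow)  # the two layers collapse into one
```

Inserting any non-linearity between the two layers breaks this equivalence, which is exactly why activations matter.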

Activation glossary – key concepts

ReLU (Rectified Linear Unit)

f(x)=max(0,x) – the most popular hidden‑layer activation. Computationally efficient and produces sparse activations, but units stuck in the negative region receive zero gradient and stop updating ("dying ReLU").

Sigmoid

σ(x)=1/(1+e^(−x)) – outputs in (0, 1). Used for binary classification outputs, but saturates for large |x| and kills gradients.
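To see saturation concretely: the derivative σ'(x)=σ(x)(1−σ(x)) peaks at 0.25 at x=0 and collapses toward zero in the tails, which is the root of the vanishing‑gradient problem. A minimal sketch:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # derivative: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid_grad(0.0))   # 0.25 — the maximum possible gradient
print(sigmoid_grad(10.0))  # ~4.5e-05 — almost no gradient flows back
```

Multiplying several such sub-0.25 factors across layers during backpropagation is what shrinks the gradient exponentially with depth.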

Tanh

tanh(x) – zero‑centered, range (-1,1). Often preferred over sigmoid in hidden layers, but still saturates.
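The connection to sigmoid is direct: tanh is a rescaled sigmoid, tanh(x) = 2σ(2x) − 1, which is what shifts the output range from (0, 1) to (−1, 1). A quick check:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-3, 3, 7)
# tanh is a shifted, scaled sigmoid: same shape, zero-centered range
assert np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)
```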

Softmax

Multi‑class output activation; converts logits to probabilities summing to 1.
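A small sketch of that conversion (logit values are illustrative): exponentiate, then normalise, so the outputs are non-negative and sum to 1 while preserving the ranking of the logits.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.sum())     # ~1.0 — a valid probability distribution
print(probs.argmax())  # 0 — the largest logit keeps the largest probability
```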

Leaky ReLU / PReLU

Allow a small negative slope (e.g., 0.01) so negative inputs still pass gradient, avoiding dying ReLU. Parametric ReLU (PReLU) learns the slope instead of fixing it.
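A minimal sketch of Leaky ReLU (the default slope of 0.01 matches the common convention):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # small slope alpha keeps a nonzero gradient for x < 0
    return np.where(x > 0, x, alpha * x)

assert np.allclose(leaky_relu(np.array([-2.0, 3.0])), [-0.02, 3.0])
```

PReLU is the same function with alpha treated as a learnable parameter (typically one per channel) rather than a fixed constant.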

ELU / SELU

Exponential Linear Unit – linear for positive inputs, with a smooth exponential negative part α(e^x − 1) for x < 0, which can speed up learning. SELU is a scaled variant (with fixed λ and α) that, under the right conditions, self‑normalises activations.
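A minimal ELU sketch; SELU multiplies the same shape by a fixed scale λ ≈ 1.0507 with α ≈ 1.6733:

```python
import numpy as np

def elu(x, alpha=1.0):
    # linear for x > 0, smooth exponential approach to -alpha for x < 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

assert float(elu(0.0)) == 0.0
assert np.isclose(elu(-1.0), np.exp(-1) - 1)  # ~ -0.632, bounded below by -alpha
```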

Swish / SiLU

swish(x)=x·σ(x) – found via automated search over candidate activations (with β=1 it is identical to SiLU); often outperforms ReLU in deeper models.
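A minimal sketch of the general form x·σ(βx); β=1 gives SiLU:

```python
import numpy as np

def swish(x, beta=1.0):
    # x * sigmoid(beta * x); smooth, non-monotonic near zero
    return x / (1 + np.exp(-beta * x))

assert np.isclose(swish(0.0), 0.0)
assert swish(10.0) > 9.99  # approaches the identity for large positive x
```

Unlike ReLU, swish is smooth everywhere and allows a small negative dip before approaching zero, which is often credited for its edge in deep models.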

# Common activation implementations (NumPy style)
import numpy as np

def relu(x): return np.maximum(0, x)
def sigmoid(x): return 1/(1+np.exp(-x))
def tanh(x): return np.tanh(x)
def softmax(x): e = np.exp(x - x.max()); return e/e.sum()  # max shift for numerical stability
Interview tip: Be ready to compare activation functions: ReLU vs Leaky ReLU, why sigmoid causes vanishing gradient, and when to use softmax. This MCQ covers these distinctions.

Common activation interview questions

  • Why is ReLU non‑linear if it looks like two linear pieces?
  • What is the "dying ReLU" problem and how can you fix it?
  • Why does sigmoid saturate and kill gradients?
  • Explain the output range of tanh and why zero‑centered activations help.
  • When would you use softmax versus sigmoid?
  • What are the advantages of Swish over ReLU?