What Are Neural Networks? — 15 Interview Questions
Fifteen interview-style questions on definitions, architecture, training, and theory—each card uses a colored border so you can scan topics quickly before a screen or onsite.
Border colors are for visual grouping only; the difficulty level is shown on each question.
1
What is a neural network in machine learning?
Easy
Answer: A neural network is a parametric model: layers of connected units (neurons) that transform inputs into outputs. Weights and biases are tuned on data (usually with gradients) so the network approximates a target mapping—for classification, regression, or sequences.
2
What is a neuron? What is a layer?
Easy
Answer: A neuron typically computes a weighted sum of inputs plus a bias, then applies an activation. A layer is a group of neurons at the same depth; input, hidden, and output layers stack to form the network.
z = w·x + b → a = activation(z)
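That one-line formula can be written as a minimal NumPy sketch; the weights and inputs below are illustrative values, and ReLU stands in for any activation:

```python
import numpy as np

def neuron(x, w, b):
    """One neuron: weighted sum of inputs plus bias, then an activation (ReLU here)."""
    z = np.dot(w, x) + b          # z = w·x + b
    return np.maximum(0.0, z)     # a = activation(z)

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1
a = neuron(x, w, b)   # z = 0.5 - 0.5 + 0.1 = 0.1, ReLU leaves it unchanged
```

A layer is just many such neurons computed at once, which is why layer code is usually a matrix multiply rather than a loop.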
3
What are parameters vs hyperparameters?
Easy
Answer: Parameters (weights, biases) are learned during training. Hyperparameters are set before or outside the main gradient loop: learning rate, batch size, number of layers/units, dropout rate, epochs, choice of optimizer, etc.
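The split shows up directly in training code. In this toy linear-regression sketch (illustrative values), only `w` and `b` are updated by gradients; `learning_rate` and `n_steps` are chosen up front and never touched by the loop:

```python
import numpy as np

# Hyperparameters: set before training, never updated by gradients
learning_rate = 0.1
n_steps = 500

# Parameters: learned during training (model is y ≈ w*x + b)
w, b = 0.0, 0.0

# Toy data for the target y = 2x + 1
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs + 1

for _ in range(n_steps):
    pred = w * xs + b
    grad_w = 2 * np.mean((pred - ys) * xs)   # d(MSE)/dw
    grad_b = 2 * np.mean(pred - ys)          # d(MSE)/db
    w -= learning_rate * grad_w              # only parameters change;
    b -= learning_rate * grad_b              # hyperparameters stay fixed
```

After training, `w` and `b` are close to the true 2 and 1, which is exactly what "learned during training" means.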
4
Explain forward propagation in one sentence, then expand.
Medium
Answer: Forward propagation passes input through each layer to produce a prediction and (typically) a loss. Expand: each layer applies affine transforms and activations; the final head matches the task (e.g. softmax logits for multi-class). No weight updates happen during pure inference forward passes.
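A minimal two-layer forward pass with a softmax head, in NumPy (shapes and random weights are illustrative); note that nothing in it updates a weight:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    """Forward pass: affine -> ReLU -> affine -> softmax (multi-class head)."""
    h = np.maximum(0.0, W1 @ x + b1)      # hidden layer activations
    logits = W2 @ h + b2                  # output pre-activations
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

# Illustrative shapes: 4 inputs -> 5 hidden units -> 3 classes
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
probs = forward(rng.normal(size=4), W1, b1, W2, b2)   # sums to 1
```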
5
Why do we need non-linear activation functions?
Easy
Answer: Composition of linear maps is still linear. Non-linear activations let deep stacks express rich functions; this underpins universal approximation for sufficiently wide/deep networks with non-linearities.
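The collapse of stacked linear maps is easy to demonstrate numerically (random matrices, illustrative sizes): two activation-free layers equal one precomputed layer, while inserting a ReLU breaks that equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))   # "layer 1" weights, no activation
B = rng.normal(size=(2, 3))   # "layer 2" weights, no activation
x = rng.normal(size=4)

# Stacking linear layers without an activation is still one linear map:
two_layers = B @ (A @ x)
one_layer = (B @ A) @ x       # the equivalent single layer

# With a ReLU in between, no single matrix reproduces the mapping;
# that non-linearity is what gives depth its expressive power.
with_relu = B @ np.maximum(0.0, A @ x)
```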
6
What is the difference between training and inference?
Easy
Answer: Training uses data to compute loss, backpropagate gradients, and update weights (may include dropout, augmentation). Inference uses fixed weights for prediction only—often with eval mode for layers like batch norm and no gradient computation.
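Dropout is a good concrete example of a layer that behaves differently in the two modes. A sketch of inverted dropout (the `training` flag mirrors framework train/eval modes; values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def dropout(a, p, training):
    """Inverted dropout: randomly zero units while training, identity at inference."""
    if not training:
        return a                       # inference: deterministic pass-through
    mask = rng.random(a.shape) >= p    # keep each unit with probability 1 - p
    return a * mask / (1.0 - p)        # rescale so expected values match

a = np.ones(8)
train_out = dropout(a, p=0.5, training=True)    # some units zeroed, rest scaled
infer_out = dropout(a, p=0.5, training=False)   # unchanged
```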
7
How does a neural network differ from classical ML models?
Medium
Answer: Classical models (linear/logistic regression, shallow trees, SVMs with fixed kernels) often use hand-crafted features and limited composition. Deep NNs learn hierarchical representations from raw or weakly processed inputs; capacity scales with depth/width and data, at higher compute cost and risk of overfitting.
8
What roles do the loss function and optimizer play?
Medium
Answer: The loss scores how wrong predictions are (e.g. cross-entropy for classification). The optimizer uses gradients of that loss to update parameters (SGD, Adam, etc.). Together they define what "better" means and how the network moves toward it.
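One SGD step on a single softmax example makes the division of labor concrete (logits are illustrative; the gradient uses the standard softmax + cross-entropy identity):

```python
import numpy as np

def cross_entropy(probs, y):
    """Loss: negative log-likelihood of the true class."""
    return -np.log(probs[y])

def sgd_step(param, grad, lr=0.1):
    """Optimizer: move parameters against the gradient of the loss."""
    return param - lr * grad

# One example with true class 0
logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()
loss = cross_entropy(probs, y=0)

# For softmax + cross-entropy: d(loss)/d(logits) = probs - one_hot(y)
grad = probs.copy()
grad[0] -= 1.0
new_logits = sgd_step(logits, grad)

# The step lowered the loss on this example
new_probs = np.exp(new_logits) / np.exp(new_logits).sum()
improved = cross_entropy(new_probs, 0) < loss
```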
9
What is "deep learning" vs a shallow neural network?
Easy
Answer: Shallow usually means few hidden layers (or none). Deep learning refers to models with many stacked layers that learn hierarchical features. Depth increases expressive power and data/compute needs; it is not a precise numeric cutoff in interviews—emphasize "multiple hidden layers."
10
What is a tensor in neural network code?
Easy
Answer: A tensor is a multi-dimensional array holding numerical data—scalars (0D), vectors (1D), matrices (2D), or higher for batches and channels (e.g. N×C×H×W images). Frameworks store activations, weights, and gradients as tensors on CPU or GPU.
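The ranks map directly onto array shapes; a quick NumPy sketch (sizes illustrative):

```python
import numpy as np

scalar = np.array(3.14)               # 0-D tensor
vector = np.array([1.0, 2.0, 3.0])    # 1-D tensor
matrix = np.zeros((2, 3))             # 2-D tensor
images = np.zeros((8, 3, 32, 32))     # 4-D: a batch in N×C×H×W layout
ndims = (scalar.ndim, vector.ndim, matrix.ndim, images.ndim)   # (0, 1, 2, 4)
```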
11
What are batch size and an epoch?
Easy
Answer: An epoch is one full pass over the training dataset. Batch size is how many examples you use per forward/backward step before updating weights. Smaller batches add noise but fit in memory; larger batches give smoother gradients but need more VRAM.
Interview tip: Mention trade-offs: generalization vs stability, and that "iteration/step" ≠ epoch.
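The step-vs-epoch arithmetic is worth being able to do on the spot (numbers illustrative):

```python
import math

dataset_size = 10_000
batch_size = 32

# One epoch = one full pass over the data.
# One step/iteration = one weight update on a single batch.
steps_per_epoch = math.ceil(dataset_size / batch_size)   # 313
total_steps = 5 * steps_per_epoch                        # 5 epochs = 1565 steps
```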
12
Explain overfitting in one paragraph.
Easy
Answer: Overfitting is when the model memorizes training noise and fails on new data: low training error, high validation/test error. Mitigations include more data, regularization (L2, dropout), simpler architecture, early stopping, and better validation discipline.
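Early stopping is the easiest mitigation to sketch: watch validation loss and stop when it stops improving. This helper and its loss curve are illustrative, not a library API:

```python
def early_stopping(val_losses, patience=2):
    """Return the epoch to stop at: validation loss hasn't improved for `patience` epochs."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0   # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch         # stop: we are likely overfitting
    return len(val_losses) - 1

# Validation loss falls, then turns back up while training loss keeps falling
stop_epoch = early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8])   # stops at epoch 4
```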
13
What is the role of the output layer?
Medium
Answer: The output layer produces task-specific predictions: linear units for regression, sigmoid for binary probability, softmax for multi-class probabilities. Its size matches the number of outputs (classes or targets); the loss is chosen to match that head.
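The three standard heads side by side, applied to the same illustrative logits:

```python
import numpy as np

z = np.array([2.0, -1.0, 0.5])   # raw outputs (logits) from the last layer

# Regression head: identity — logits are the predictions
regression = z

# Binary head: sigmoid squashes one logit into a probability in (0, 1)
p_binary = 1.0 / (1.0 + np.exp(-z[0]))

# Multi-class head: softmax turns the whole vector into probabilities
p_multi = np.exp(z) / np.exp(z).sum()   # sums to 1, pairs with cross-entropy
```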
14
Dense (fully connected) layers vs local connectivity—intuition?
Medium
Answer: Dense layers connect every input to every output unit—flexible but parameter-heavy. Local connectivity (as in CNNs) ties each unit to a small spatial neighborhood, sharing weights across positions—far fewer parameters and inductive bias for grids like images.
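The parameter gap is easy to quantify. For a 32×32 single-channel input (illustrative sizes), compare a dense layer against a conv layer with the same number of output channels:

```python
inputs = 32 * 32                 # 1024 pixels, flattened

# Dense: every input connects to every unit
dense_units = 256
dense_params = inputs * dense_units + dense_units        # weights + biases = 262,400

# Convolutional: each unit sees a 3×3 neighborhood, and the same
# kernel weights are shared across every spatial position
conv_filters = 256
conv_params = 3 * 3 * 1 * conv_filters + conv_filters    # kernels + biases = 2,560
```

Roughly a 100× reduction, which is the practical payoff of weight sharing on grid-structured data.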
15
State the universal approximation theorem in interview form.
Hard
Answer: A feedforward network with at least one hidden layer with a non-linear activation and sufficient width can approximate a broad class of continuous functions on compact domains—given enough units. It is an existence result (not a guarantee of easy training or finite data).
Wide enough hidden layer + non-linearity → dense in continuous functions (under standard assumptions)
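A small exact case gives the flavor of the theorem: a width-2 ReLU layer represents |x| perfectly, since |x| = relu(x) + relu(-x), and piling up such pieces is how wide networks approximate continuous functions. This demo only checks that identity, not the theorem itself:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# |x| = relu(x) + relu(-x): a hidden layer of two ReLU units, output weights 1 and 1
xs = np.linspace(-3, 3, 7)
approx = relu(xs) + relu(-xs)   # matches np.abs(xs) exactly
```

Remember the caveat from the answer: this is an existence result; it says nothing about whether gradient descent finds such weights or how much data that takes.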
Quick review checklist
- Define NN, neuron, layer, and forward pass in your own words.
- Contrast parameters vs hyperparameters with two examples each.
- Explain why non-linearities matter and how training differs from inference.
- Name output heads (regression / sigmoid / softmax) and tie each to a loss.
- Give a one-sentence universal approximation statement without overstating it.