What Are Neural Networks? — 15 Interview Questions
Fifteen interview-style questions on definitions, architecture, training, and theory—each card uses a colored border so you can scan topics quickly before a screen or onsite.
Border colors are for visual grouping only; the difficulty level is shown on each question.
1
What is a neural network in machine learning?
Easy
Answer: A neural network is a parametric model: layers of connected units (neurons) that transform inputs into outputs. Weights and biases are tuned on data (usually with gradients) so the network approximates a target mapping—for classification, regression, or sequences.
2
What is a neuron? What is a layer?
Easy
Answer: A neuron typically computes a weighted sum of inputs plus a bias, then applies an activation. A layer is a group of neurons at the same depth; input, hidden, and output layers stack to form the network.
z = w·x + b → a = activation(z)
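That one-line formula can be written as a minimal NumPy sketch; the weights and inputs below are illustrative values, and ReLU stands in for any activation:

```python
import numpy as np

def neuron(x, w, b):
    """One neuron: weighted sum of inputs plus bias, then an activation (ReLU here)."""
    z = np.dot(w, x) + b          # z = w·x + b
    return np.maximum(0.0, z)     # a = activation(z)

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1
a = neuron(x, w, b)   # z = 0.5 - 0.5 + 0.1 = 0.1, ReLU leaves it unchanged
```

A layer is just many such neurons computed at once, which is why layer code is usually a matrix multiply rather than a loop.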
3
What are parameters vs hyperparameters?
Easy
Answer: Parameters (weights, biases) are learned during training. Hyperparameters are set before or outside the main gradient loop: learning rate, batch size, number of layers/units, dropout rate, epochs, choice of optimizer, etc.
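The split shows up directly in training code. In this toy linear-regression sketch (illustrative values), only `w` and `b` are updated by gradients; `learning_rate` and `n_steps` are chosen up front and never touched by the loop:

```python
import numpy as np

# Hyperparameters: set before training, never updated by gradients
learning_rate = 0.1
n_steps = 500

# Parameters: learned during training (model is y ≈ w*x + b)
w, b = 0.0, 0.0

# Toy data for the target y = 2x + 1
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2 * xs + 1

for _ in range(n_steps):
    pred = w * xs + b
    grad_w = 2 * np.mean((pred - ys) * xs)   # d(MSE)/dw
    grad_b = 2 * np.mean(pred - ys)          # d(MSE)/db
    w -= learning_rate * grad_w              # only parameters change;
    b -= learning_rate * grad_b              # hyperparameters stay fixed
```

After training, `w` and `b` are close to the true 2 and 1, which is exactly what "learned during training" means.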
4
Explain forward propagation in one sentence, then expand.
Medium
Answer: Forward propagation passes input through each layer to produce a prediction and (typically) a loss. Expand: each layer applies affine transforms and activations; the final head matches the task (e.g. softmax logits for multi-class). No weight updates happen during pure inference forward passes.
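A minimal two-layer forward pass with a softmax head, in NumPy (shapes and random weights are illustrative); note that nothing in it updates a weight:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, W1, b1, W2, b2):
    """Forward pass: affine -> ReLU -> affine -> softmax (multi-class head)."""
    h = np.maximum(0.0, W1 @ x + b1)      # hidden layer activations
    logits = W2 @ h + b2                  # output pre-activations
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

# Illustrative shapes: 4 inputs -> 5 hidden units -> 3 classes
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
probs = forward(rng.normal(size=4), W1, b1, W2, b2)   # sums to 1
```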
5
Why do we need non-linear activation functions?
Easy
Answer: Composition of linear maps is still linear. Non-linear activations let deep stacks express rich functions; this underpins universal approximation for sufficiently wide/deep networks with non-linearities.
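The collapse of stacked linear maps is easy to demonstrate numerically (random matrices, illustrative sizes): two activation-free layers equal one precomputed layer, while inserting a ReLU breaks that equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))   # "layer 1" weights, no activation
B = rng.normal(size=(2, 3))   # "layer 2" weights, no activation
x = rng.normal(size=4)

# Stacking linear layers without an activation is still one linear map:
two_layers = B @ (A @ x)
one_layer = (B @ A) @ x       # the equivalent single layer

# With a ReLU in between, no single matrix reproduces the mapping;
# that non-linearity is what gives depth its expressive power.
with_relu = B @ np.maximum(0.0, A @ x)
```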
6
What is the difference between training and inference?
Easy
Answer: Training uses data to compute loss, backpropagate gradients, and update weights (may include dropout, augmentation). Inference uses fixed weights for prediction only—often with eval mode for layers like batch norm and no gradient computation.
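Dropout is a good concrete example of a layer that behaves differently in the two modes. A sketch of inverted dropout (the `training` flag mirrors framework train/eval modes; values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def dropout(a, p, training):
    """Inverted dropout: randomly zero units while training, identity at inference."""
    if not training:
        return a                       # inference: deterministic pass-through
    mask = rng.random(a.shape) >= p    # keep each unit with probability 1 - p
    return a * mask / (1.0 - p)        # rescale so expected values match

a = np.ones(8)
train_out = dropout(a, p=0.5, training=True)    # some units zeroed, rest scaled
infer_out = dropout(a, p=0.5, training=False)   # unchanged
```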
7
How does a neural network differ from classical ML models?
Medium
Answer: Classical models (linear/logistic regression, shallow trees, SVMs with fixed kernels) often use hand-crafted features and limited composition. Deep NNs learn hierarchical representations from raw or weakly processed inputs; capacity scales with depth/width and data, at higher compute cost and risk of overfitting.
8
What roles do the loss function and optimizer play?
Medium
Answer: The loss scores how wrong predictions are (e.g. cross-entropy for classification). The optimizer uses gradients of that loss to update parameters (SGD, Adam, etc.). Together they define what "better" means and how the network moves toward it.
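One SGD step on a single softmax example makes the division of labor concrete (logits are illustrative; the gradient uses the standard softmax + cross-entropy identity):

```python
import numpy as np

def cross_entropy(probs, y):
    """Loss: negative log-likelihood of the true class."""
    return -np.log(probs[y])

def sgd_step(param, grad, lr=0.1):
    """Optimizer: move parameters against the gradient of the loss."""
    return param - lr * grad

# One example with true class 0
logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.exp(logits).sum()
loss = cross_entropy(probs, y=0)

# For softmax + cross-entropy: d(loss)/d(logits) = probs - one_hot(y)
grad = probs.copy()
grad[0] -= 1.0
new_logits = sgd_step(logits, grad)

# The step lowered the loss on this example
new_probs = np.exp(new_logits) / np.exp(new_logits).sum()
improved = cross_entropy(new_probs, 0) < loss
```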
9
What is "deep learning" vs a shallow neural network?
Easy
Answer: Shallow usually means few hidden layers (or none). Deep learning refers to models with many stacked layers that learn hierarchical features. Depth increases expressive power and data/compute needs; it is not a precise numeric cutoff in interviews—emphasize "multiple hidden layers."
10
What is a tensor in neural network code?
Easy
Answer: A tensor is a multi-dimensional array holding numerical data—scalars (0D), vectors (1D), matrices (2D), or higher for batches and channels (e.g. N×C×H×W images). Frameworks store activations, weights, and gradients as tensors on CPU or GPU.
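The ranks map directly onto array shapes; a quick NumPy sketch (sizes illustrative):

```python
import numpy as np

scalar = np.array(3.14)               # 0-D tensor
vector = np.array([1.0, 2.0, 3.0])    # 1-D tensor
matrix = np.zeros((2, 3))             # 2-D tensor
images = np.zeros((8, 3, 32, 32))     # 4-D: a batch in N×C×H×W layout
ndims = (scalar.ndim, vector.ndim, matrix.ndim, images.ndim)   # (0, 1, 2, 4)
```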
11
What are batch size and an epoch?
Easy
Answer: An epoch is one full pass over the training dataset. Batch size is how many examples you use per forward/backward step before updating weights. Smaller batches add noise but fit in memory; larger batches give smoother gradients but need more VRAM.
Interview tip: Mention trade-offs: generalization vs stability, and that "iteration/step" ≠ epoch.
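The step-vs-epoch arithmetic is worth being able to do on the spot (numbers illustrative):

```python
import math

dataset_size = 10_000
batch_size = 32

# One epoch = one full pass over the data.
# One step/iteration = one weight update on a single batch.
steps_per_epoch = math.ceil(dataset_size / batch_size)   # 313
total_steps = 5 * steps_per_epoch                        # 5 epochs = 1565 steps
```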
12
Explain overfitting in one paragraph.
Easy
Answer: Overfitting is when the model memorizes training noise and fails on new data: low training error, high validation/test error. Mitigations include more data, regularization (L2, dropout), simpler architecture, early stopping, and better validation discipline.
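Early stopping is the easiest mitigation to sketch: watch validation loss and stop when it stops improving. This helper and its loss curve are illustrative, not a library API:

```python
def early_stopping(val_losses, patience=2):
    """Return the epoch to stop at: validation loss hasn't improved for `patience` epochs."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0   # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch         # stop: we are likely overfitting
    return len(val_losses) - 1

# Validation loss falls, then turns back up while training loss keeps falling
stop_epoch = early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8])   # stops at epoch 4
```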
13
What is the role of the output layer?
Medium
Answer: The output layer produces task-specific predictions: linear units for regression, sigmoid for binary probability, softmax for multi-class probabilities. Its size matches the number of outputs (classes or targets); the loss is chosen to match that head.
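The three standard heads side by side, applied to the same illustrative logits:

```python
import numpy as np

z = np.array([2.0, -1.0, 0.5])   # raw outputs (logits) from the last layer

# Regression head: identity — logits are the predictions
regression = z

# Binary head: sigmoid squashes one logit into a probability in (0, 1)
p_binary = 1.0 / (1.0 + np.exp(-z[0]))

# Multi-class head: softmax turns the whole vector into probabilities
p_multi = np.exp(z) / np.exp(z).sum()   # sums to 1, pairs with cross-entropy
```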
14
Dense (fully connected) layers vs local connectivity—intuition?
Medium
Answer: Dense layers connect every input to every output unit—flexible but parameter-heavy. Local connectivity (as in CNNs) ties each unit to a small spatial neighborhood, sharing weights across positions—far fewer parameters and inductive bias for grids like images.
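The parameter gap is easy to quantify. For a 32×32 single-channel input (illustrative sizes), compare a dense layer against a conv layer with the same number of output channels:

```python
inputs = 32 * 32                 # 1024 pixels, flattened

# Dense: every input connects to every unit
dense_units = 256
dense_params = inputs * dense_units + dense_units        # weights + biases = 262,400

# Convolutional: each unit sees a 3×3 neighborhood, and the same
# kernel weights are shared across every spatial position
conv_filters = 256
conv_params = 3 * 3 * 1 * conv_filters + conv_filters    # kernels + biases = 2,560
```

Roughly a 100× reduction, which is the practical payoff of weight sharing on grid-structured data.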
15
State the universal approximation theorem in interview form.
Hard
Answer: A feedforward network with at least one hidden layer with a non-linear activation and sufficient width can approximate a broad class of continuous functions on compact domains—given enough units. It is an existence result (not a guarantee of easy training or finite data).
Wide enough hidden layer + non-linearity → dense in continuous functions (under standard assumptions)
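A small exact case gives the flavor of the theorem: a width-2 ReLU layer represents |x| perfectly, since |x| = relu(x) + relu(-x), and piling up such pieces is how wide networks approximate continuous functions. This demo only checks that identity, not the theorem itself:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# |x| = relu(x) + relu(-x): a hidden layer of two ReLU units, output weights 1 and 1
xs = np.linspace(-3, 3, 7)
approx = relu(xs) + relu(-xs)   # matches np.abs(xs) exactly
```

Remember the caveat from the answer: this is an existence result; it says nothing about whether gradient descent finds such weights or how much data that takes.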
Quick review checklist
- Define NN, neuron, layer, and forward pass in your own words.
- Contrast parameters vs hyperparameters with two examples each.
- Explain why non-linearities matter and how training differs from inference.
- Name output heads (regression / sigmoid / softmax) and tie each to a loss.
- Give a one-sentence universal approximation statement without overstating it.