Perceptron — 15 Interview Questions
Linear classifiers, threshold decisions, the classic learning rule, separability, XOR, and convergence.
Topics: Linear boundary · Learning rule · XOR / limits · Convergence
1
What is the Rosenblatt perceptron?
Easy
Answer: An early binary linear classifier: it forms a weighted sum of inputs plus a bias, then applies a threshold (step) to decide between two classes. It is historically important as a simple trainable “neuron” and the starting point for multi-layer networks.
2
Write the perceptron decision rule with labels in {−1, +1}.
Easy
Answer: Compute the pre-activation (margin)
s = w·x + b. Predict ŷ = sign(s) (with a convention for s = 0, e.g. treat as +1 or define a tie rule). Training adjusts w, b only when ŷ ≠ y.
ŷ = sign(w·x + b) with y ∈ {−1, +1}
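Code sketch (illustrative NumPy, names made up for this page; the tie at s = 0 is resolved to +1 here):

    import numpy as np

    def perceptron_predict(w, b, x):
        # Pre-activation: s = w·x + b
        s = np.dot(w, x) + b
        # Threshold: +1 if s >= 0, else -1 (ties at 0 go to +1 by convention)
        return 1 if s >= 0 else -1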
3
What does “linearly separable” mean?
Easy
Answer: Two classes are linearly separable if there exists a hyperplane
w·x + b = 0 that puts all examples of one class strictly on one side and the other class on the other. The perceptron can learn such a separator when data are separable.
4
Why can a single perceptron not represent XOR?
Medium
Answer: XOR in 2D is not linearly separable: no single line separates (0,0)/(1,1) from (0,1)/(1,0). A perceptron is exactly one linear decision boundary, so it cannot fit XOR without adding features or hidden layers (e.g. MLP).
Interview tip: Mention Minsky/Papert context briefly—motivates multi-layer networks.
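Code sketch (a numerical illustration, not a proof): brute-forcing a coarse grid of candidate lines shows that none classifies all four XOR points correctly; the best any line achieves is 3 of 4.

    import numpy as np
    from itertools import product

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([-1, 1, 1, -1])               # XOR with labels in {-1, +1}

    best = 0
    grid = np.linspace(-2, 2, 21)              # coarse grid over w1, w2, b
    for w1, w2, b in product(grid, repeat=3):
        pred = np.where(X @ np.array([w1, w2]) + b >= 0, 1, -1)
        best = max(best, int((pred == y).sum()))

    print(best)                                # prints 3: no line gets all four right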
5
State the perceptron learning rule for misclassified points.
Medium
Answer: For labels
y ∈ {−1, +1}, when (x, y) is misclassified, update w ← w + η y x and b ← b + η y (learning rate η > 0; often η = 1 in the classic algorithm). Correct points receive no update.
w := w + η y x , b := b + η y (on mistake only)
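Code sketch of the full training loop (illustrative NumPy with η = 1 and a fixed number of passes; a teaching sketch, not a production implementation):

    import numpy as np

    def train_perceptron(X, y, epochs=100, eta=1.0):
        """Classic perceptron: update w, b only on misclassified points (y in {-1, +1})."""
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on the boundary)
                    w += eta * yi * xi
                    b += eta * yi
                    mistakes += 1
            if mistakes == 0:                       # converged on separable data
                break
        return w, b

On a separable toy set (e.g. AND with labels in {−1, +1}), the inner loop typically stops making mistakes after a few passes and the function returns early.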
6
When does the perceptron algorithm converge?
Hard
Answer: If the data are linearly separable, the perceptron rule converges in a finite number of mistakes (Novikoff-style bounds). If the data are not separable, updates can cycle indefinitely, so you need the pocket algorithm, averaged weights, or a different model/loss.
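Key bound (Novikoff, stated informally, with the bias folded into w via a constant input): if every example satisfies ‖xᵢ‖ ≤ R and some unit-norm separator w* achieves margin yᵢ (w*·xᵢ) ≥ γ > 0 for all i, then the number of updates is bounded.

mistakes ≤ (R / γ)²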
7
Why is the bias term important?
Easy
Answer: Without b, every separating hyperplane must pass through the origin in feature space. The bias shifts the decision boundary so it can separate offset clouds of points. Often implemented as an extra input fixed at 1 with a weight
w₀.
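Code sketch of that trick (illustrative values): append a constant 1 to each input so the bias becomes just another weight.

    import numpy as np

    X = np.array([[0.5, 1.2], [2.0, -0.3]])             # original inputs
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend the constant-1 input
    w_aug = np.array([0.1, -0.4, 0.7])                  # w_aug[0] plays the role of w0 (the bias)
    scores = X_aug @ w_aug                               # equals X @ w_aug[1:] + w_aug[0]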
8
Step activation vs sigmoid for a “perceptron”—what changes?
Medium
Answer: The step gives a hard decision and zero gradient almost everywhere—classic perceptron uses discrete updates, not backprop through the step. Sigmoid is smooth, yields probabilities, and supports gradient-based training (logistic regression / neural nets with continuous loss).
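Code sketch of the gradient difference (illustrative only): the step's derivative is zero everywhere except at the jump, while the sigmoid has a usable gradient σ(s)(1 − σ(s)).

    import numpy as np

    def step(s):
        return np.where(s >= 0, 1.0, 0.0)         # hard decision, zero gradient almost everywhere

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))           # smooth, output in (0, 1)

    s = np.array([-2.0, 0.5, 3.0])
    print(step(s))                                 # [0. 1. 1.]
    print(sigmoid(s) * (1 - sigmoid(s)))           # non-zero gradients usable by gradient descent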
9
How does a perceptron relate to logistic regression?
Medium
Answer: Both use a linear score
w·x + b. Logistic regression outputs sigmoid(score) as probability and minimizes log loss with gradients. The perceptron uses a hard threshold and mistake-driven updates; same geometry, different output and training objective.
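Code sketch of the contrast (illustrative, with labels mapped to {0, 1} for the logistic case): the perceptron updates only on mistakes, while logistic regression takes a gradient step proportional to the probability error on every example.

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    # Perceptron (y in {-1, +1}): update only if the point is misclassified
    def perceptron_update(w, b, x, y, eta=1.0):
        if y * (np.dot(w, x) + b) <= 0:
            w = w + eta * y * x
            b = b + eta * y
        return w, b

    # Logistic regression (y in {0, 1}): gradient step on the log loss for every point
    def logistic_update(w, b, x, y, eta=0.1):
        p = sigmoid(np.dot(w, x) + b)
        w = w - eta * (p - y) * x
        b = b - eta * (p - y)
        return w, b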
10
What is the margin of a correctly classified point?
Medium
Answer: With
y ∈ {−1, +1}, the (signed) margin is often written y (w·x + b). It is positive when the prediction is correct; larger values mean the point is farther from the decision boundary. Convergence proofs bound the number of mistakes using margin and norm of a separating vector.
11
Does feature scaling matter for the perceptron?
Easy
Answer: The decision boundary is still linear, but scale differences across features can slow convergence or make updates dominated by large-magnitude inputs. Standardizing or scaling features often helps iterative algorithms behave more evenly in practice.
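Code sketch of plain standardization (library helpers such as scikit-learn's StandardScaler do the same zero-mean, unit-variance transform):

    import numpy as np

    def standardize(X):
        mu = X.mean(axis=0)
        sigma = X.std(axis=0)
        sigma[sigma == 0] = 1.0        # avoid division by zero for constant features
        return (X - mu) / sigma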
12
How can perceptrons be used for multi-class problems?
Medium
Answer: Common reductions: one-vs-rest (one perceptron per class vs all others) or one-vs-one (pairwise classifiers). At prediction time, combine votes or scores. This is not softmax—mention softmax as the smooth multi-class alternative in neural nets.
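Code sketch of the one-vs-rest reduction (illustrative; it takes any binary trainer that returns (w, b), e.g. the loop from Q5):

    import numpy as np

    def train_one_vs_rest(X, y, classes, train_binary):
        """train_binary(X, y_pm) -> (w, b), with y_pm in {-1, +1}."""
        models = {}
        for c in classes:
            y_pm = np.where(y == c, 1, -1)        # this class vs. all the others
            models[c] = train_binary(X, y_pm)
        return models

    def predict_one_vs_rest(models, x):
        # Pick the class whose linear score w·x + b is largest
        return max(models, key=lambda c: np.dot(models[c][0], x) + models[c][1])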
13
What is the “pocket” algorithm idea?
Hard
Answer: On noisy or non-separable data, standard perceptron updates may not stabilize. The pocket variant keeps the weight vector that achieved the lowest training error so far (“in your pocket”) while continuing updates, returning the best snapshot instead of the last iterate.
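Code sketch of the pocket variant (illustrative; training error is checked once per pass here for brevity, whereas the classic version checks after every update):

    import numpy as np

    def pocket_perceptron(X, y, epochs=100, eta=1.0):
        w, b = np.zeros(X.shape[1]), 0.0
        best_w, best_b, best_errors = w.copy(), b, np.inf
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (np.dot(w, xi) + b) <= 0:      # usual mistake-driven update
                    w += eta * yi * xi
                    b += eta * yi
            preds = np.where(X @ w + b >= 0, 1, -1)
            errors = int(np.sum(preds != y))           # training errors of current w, b
            if errors < best_errors:                   # put the best snapshot "in the pocket"
                best_errors, best_w, best_b = errors, w.copy(), b
        return best_w, best_b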
14
Perceptron vs linear SVM—one-minute comparison.
Hard
Answer: Both learn linear separators. SVM maximizes margin (often with slack for soft-margin) and yields a unique solution under convex optimization. Perceptron finds any separating hyperplane if one exists; many solutions possible. SVM generalizes better with kernels; perceptron is simple and historically foundational.
15
How does stacking perceptrons lead to multi-layer networks?
Medium
Answer: One perceptron = one linear boundary. Hidden layers of non-linear units compose boundaries: early layers can fold or combine half-spaces so later layers separate XOR-like patterns. That is the core idea of an MLP: depth + non-linearity overcomes single-layer limits.
MLP: h = σ(W₁x + b₁) → ŷ = σ(W₂h + b₂) (non-linear σ)
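Code sketch with hand-set weights (threshold units; the values are chosen by hand for illustration, not learned): the hidden layer computes OR and AND of the inputs, and the output fires for OR-and-not-AND, which is exactly XOR.

    import numpy as np

    def step(s):
        return (s >= 0).astype(int)

    def xor_mlp(x1, x2):
        x = np.array([x1, x2])
        h1 = step(np.dot([1, 1], x) - 0.5)      # OR gate:  fires if x1 + x2 >= 0.5
        h2 = step(np.dot([1, 1], x) - 1.5)      # AND gate: fires if x1 + x2 >= 1.5
        return step(1 * h1 - 1 * h2 - 0.5)      # OR and not AND  ->  XOR

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, xor_mlp(a, b))               # 0, 1, 1, 0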
Quick review checklist
- Write the decision rule and the mistake-driven update with y ∈ {−1, +1}.
- Explain linear separability and draw XOR as the canonical counterexample.
- State finite convergence for separable data; say what breaks on noisy data.
- Contrast step vs sigmoid and perceptron vs logistic regression.
- Close with how hidden layers fix what one perceptron cannot do.