ML Basics Q&A
20 Core Questions
Interview Prep
Machine Learning Basics: Interview Q&A
Short, focused questions and answers to revise the most important machine learning fundamentals before interviews or exams.
1
What is machine learning and how is it different from traditional programming?
⚡ Beginner
Answer: In traditional programming we explicitly write rules that map inputs to outputs. In machine learning we provide data and (often) labels, and the algorithm learns the mapping or rules automatically by optimizing an objective function. The resulting model can generalize to new, unseen data instead of handling only cases we coded by hand.
Traditional: Rules + Data → Answers
Machine Learning: Data + Answers → Rules (Model)
2
What are the main types of machine learning?
⚡ Beginner
Answer: The three core types are:
- Supervised learning: Learn from labeled data (inputs + known outputs). Used for regression and classification.
- Unsupervised learning: Learn structure from unlabeled data. Used for clustering, dimensionality reduction, pattern discovery.
- Reinforcement learning: An agent interacts with an environment, receiving rewards or penalties and learning a policy to maximize long‑term reward.
3
What is the bias–variance tradeoff?
📊 Intermediate
Answer: The bias–variance tradeoff describes how model complexity affects two sources of error:
- Bias: Error from overly simple assumptions (e.g., assuming a linear relationship). High bias → underfitting.
- Variance: Error from being too sensitive to training data noise. High variance → overfitting.
Bias ↓ Variance ↑ (as complexity increases)
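The tradeoff can be made concrete with a small sketch: fitting polynomials of increasing degree to noisy synthetic data and comparing training vs validation error. The data, degrees, and noise level below are illustrative assumptions, not a prescribed recipe.

```python
# Illustrative: polynomial degree as a dial for model complexity.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 40))[:, None]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)
X_train, y_train = X[::2], y[::2]    # half for training
X_val, y_val = X[1::2], y[1::2]      # half for validation

results = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    results[degree] = (train_mse, val_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  val MSE={val_mse:.3f}")
```

Degree 1 shows high bias (poor fit everywhere); a very high degree shows high variance (near-zero training error, much worse validation error).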
4
What is overfitting and how can you detect it?
⚡ Beginner
Answer: Overfitting happens when a model learns patterns specific to the training data (including noise) that do not generalize to new data. You typically detect it by:
- Very low training error but significantly higher validation/test error.
- Training performance that keeps improving as model complexity grows, while validation performance starts to degrade.
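The gap is easy to see in practice. As a sketch (dataset and model choice are illustrative), an unrestricted decision tree memorizes the training set, so its training accuracy far exceeds its validation accuracy:

```python
# Illustrative: detect overfitting via the train/validation accuracy gap.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
train_acc = tree.score(X_tr, y_tr)
val_acc = tree.score(X_val, y_val)
print(f"train accuracy: {train_acc:.3f}, validation accuracy: {val_acc:.3f}")
# A large gap (train near 1.0, validation noticeably lower) signals overfitting.
```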
5
How can you reduce overfitting in machine learning models?
📊 Intermediate
Answer: Common techniques to reduce overfitting are:
- Collect more data (if possible).
- Regularization (L1, L2, dropout for neural networks).
- Reduce model complexity (depth limits, fewer features, pruning trees).
- Cross-validation to tune hyperparameters robustly.
- Early stopping when validation loss stops improving.
- Ensemble methods (bagging, boosting) to smooth predictions.
# Example: L2 regularization with Ridge (alpha controls penalty strength)
from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)  # larger alpha -> stronger shrinkage of coefficients
6
What is cross-validation and why is it used?
⚡ Beginner
Answer: Cross-validation is a resampling technique used to estimate a model’s generalization performance. In k‑fold cross‑validation, data is split into k folds; each fold is used once as validation while the remaining k−1 folds form the training set. It:
- Provides a more stable performance estimate than a single train/test split.
- Helps detect overfitting and compare models/hyperparameters.
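A minimal sketch of k-fold cross-validation with scikit-learn (the dataset and model are illustrative):

```python
# Illustrative: 5-fold cross-validation for a stable performance estimate.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)                        # one accuracy score per fold
print(scores.mean(), scores.std())  # average estimate plus its spread
```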
7
What is the difference between regression and classification?
⚡ Beginner
Answer:
- Regression: Predicts a continuous numeric value (e.g., price, temperature, demand).
- Classification: Predicts a discrete class label (e.g., spam/ham, fraud/not fraud, species type).
Regression → numeric output
Classification → category label
8
Which metrics are commonly used for classification vs regression?
📊 Intermediate
Answer:
- Classification: Accuracy, precision, recall, F1‑score, ROC‑AUC, log‑loss.
- Regression: MAE (mean absolute error), MSE (mean squared error), RMSE, R² and adjusted R².
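A quick sketch of computing several of these metrics with scikit-learn (the toy labels and predictions are invented for illustration):

```python
# Illustrative: common classification and regression metrics.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_absolute_error, mean_squared_error)

# Classification
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))                    # fraction correct
print(precision_score(y_true, y_pred),                   # of predicted 1s, how many were 1
      recall_score(y_true, y_pred),                      # of actual 1s, how many were found
      f1_score(y_true, y_pred))                          # harmonic mean of the two

# Regression
y_true_r = [3.0, 5.0, 2.5]
y_pred_r = [2.5, 5.0, 3.0]
print(mean_absolute_error(y_true_r, y_pred_r))           # MAE
print(mean_squared_error(y_true_r, y_pred_r) ** 0.5)     # RMSE
```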
9
What is feature engineering and why is it important?
📊 Intermediate
Answer: Feature engineering is the process of transforming raw data into informative input features for a model. It includes:
- Handling missing values and outliers.
- Encoding categorical variables.
- Creating interaction terms or domain‑specific features.
- Scaling/normalizing numeric features.
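Several of these steps can be chained in one preprocessor. The toy columns below are invented; the point is the pattern of imputing, scaling, and encoding in a single object:

```python
# Illustrative: impute + scale numerics, one-hot encode categoricals.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, None, 40, 31],          # numeric, with a missing value
    "city": ["NY", "SF", "NY", "LA"],   # categorical
})

prep = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])
X = prep.fit_transform(df)
print(X.shape)  # 4 rows; 1 scaled numeric column + 3 one-hot city columns
```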
10
Briefly describe a typical end‑to‑end machine learning workflow.
📊 Intermediate
Answer: A common ML workflow is:
- Problem understanding: Define objective and success metrics.
- Data collection: Gather and integrate relevant data sources.
- EDA & cleaning: Explore, visualize, handle missing values and outliers.
- Feature engineering: Encode, scale and create meaningful features.
- Train/validation/test split: Keep a hold‑out set for final evaluation.
- Model selection & tuning: Try baseline models, tune hyperparameters (e.g., via cross‑validation).
- Evaluation: Compare models with appropriate metrics and business constraints.
- Deployment & monitoring: Put the model in production, monitor performance and retrain when data drifts.
11
What is a training set, validation set and test set?
⚡ Beginner
Answer: The training set is used to fit model parameters, the validation set is used to tune hyperparameters and compare models, and the test set is a final untouched set used once to estimate real‑world performance.
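One common way to carve out the three sets is two successive splits; the 60/20/20 proportions below are just one reasonable choice:

```python
# Illustrative: 60/20/20 train/validation/test split via two splits.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# First peel off the final test set, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest,
                                                  test_size=0.25,  # 0.25 * 0.8 = 0.2
                                                  random_state=0)
print(len(X_train), len(X_val), len(X_test))
```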
12
Why do we normalize or standardize features?
⚡ Beginner
Answer: Scaling puts features on a similar range so that distance‑based algorithms and gradient‑based optimizers behave well and no single large‑scale feature dominates the learning process.
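As a sketch with toy numbers, standardization rescales each column to zero mean and unit variance so a feature in the hundreds no longer dwarfs one in single digits:

```python
# Illustrative: standardizing two features on very different scales.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```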
13
What is a hyperparameter?
⚡ Beginner
Answer: A hyperparameter is a setting chosen before training (e.g., learning rate, tree depth, C in SVM) that controls how the algorithm learns but is not learned directly from the data.
14
What is a baseline model and why is it useful?
⚡ Beginner
Answer: A baseline is a simple reference model (e.g., always predict mean, majority class, or a simple linear model) used to see whether more complex models truly add value.
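scikit-learn ships a ready-made baseline in `DummyClassifier`; a sketch (dataset and comparison model are illustrative):

```python
# Illustrative: majority-class baseline vs an actual model.
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("baseline accuracy:", baseline.score(X_te, y_te))
print("model accuracy:   ", model.score(X_te, y_te))
```

If the "real" model barely beats the baseline, the added complexity is not paying for itself.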
15
What is data leakage in machine learning?
📊 Intermediate
Answer: Data leakage occurs when information from outside the training data or from the future (e.g., test labels, post‑event data) inadvertently leaks into the training process, leading to overly optimistic metrics that will not hold in production.
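A classic, subtle leakage source is fitting a scaler on the full dataset before splitting, so validation-fold statistics influence preprocessing. A sketch of the leak-free pattern (dataset is synthetic and illustrative): wrap preprocessing and model in a pipeline so the scaler is refit inside each training fold only.

```python
# Illustrative: pipeline keeps preprocessing inside each CV training fold,
# so no statistics from the held-out fold leak into the scaler.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```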
16
What is a confusion matrix in simple terms?
⚡ Beginner
Answer: A confusion matrix is a small table that counts how many examples of each true class were predicted as each possible class (true positives, false positives, true negatives, false negatives).
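A sketch on toy labels (rows are true classes, columns are predicted classes in scikit-learn's convention):

```python
# Illustrative: 2x2 confusion matrix for a binary problem.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]   row 0: TN=2, FP=1
#  [1 2]]  row 1: FN=1, TP=2
```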
17
What is the difference between batch, mini‑batch and online learning?
📊 Intermediate
Answer: Batch learning uses the whole dataset at once, mini‑batch learning updates on small batches of examples, and online (stochastic) learning updates the model for each single example as it arrives.
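The incremental styles can be sketched with `SGDClassifier.partial_fit`, which updates the model one chunk at a time as if the data were streaming in (dataset and batch size are illustrative):

```python
# Illustrative: mini-batch / online updates via partial_fit.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=600, random_state=0)
clf = SGDClassifier(random_state=0)

classes = np.unique(y)  # all class labels must be declared on the first call
for start in range(0, len(X), 100):      # feed 100 examples at a time
    batch = slice(start, start + 100)
    clf.partial_fit(X[batch], y[batch], classes=classes)

print(clf.score(X, y))
```

With a batch size of 1 this becomes online learning; with the whole dataset in one call it approaches batch learning.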
18
What is gradient descent in one sentence?
📊 Intermediate
Answer: Gradient descent is an optimization algorithm that moves parameters in the opposite direction of the loss gradient to iteratively reduce the loss function.
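That one sentence can be shown in a few lines. A minimal sketch minimizing f(w) = (w − 3)², whose gradient is f′(w) = 2(w − 3); the learning rate and iteration count are illustrative choices:

```python
# Illustrative: gradient descent on f(w) = (w - 3)^2, minimum at w = 3.
w = 0.0
lr = 0.1
for _ in range(100):
    grad = 2 * (w - 3)   # gradient of the loss at the current w
    w -= lr * grad       # step opposite the gradient direction
print(w)  # converges toward w = 3
```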
19
When would you prefer a simple model over a complex one?
📊 Intermediate
Answer: You often prefer a simple model when you have limited data, strong interpretability requirements, or when it already meets performance goals—simpler models are usually easier to debug and more robust.
20
Why is it important to monitor models after deployment?
📊 Intermediate
Answer: Real‑world data and business conditions change over time; monitoring performance, inputs and errors lets you detect drift, bugs and fairness issues early and retrain or update the model when needed.
Quick Recap: ML Basics
You Should Be Able To:
- Explain supervised vs unsupervised vs RL
- Describe bias–variance and overfitting
- Choose appropriate metrics for a task
Next Steps
- Dive into regression and classification details
- Implement a simple ML project end‑to‑end