Mixed ML Q&A - Set 1
20 Core Questions
Interview Prep
Mixed Machine Learning Concepts: Q&A (Set 1)
Short mixed-topic questions across the ML workflow: data, modeling, evaluation and deployment.
Data
Models
Metrics
Deployment
1
What is the difference between training, validation and test sets?
⚡ Beginner
Answer: The training set fits model parameters; the validation set tunes hyperparameters and compares candidate models; the test set is held out for a single, final estimate of generalization performance.
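A minimal sketch of a 60/20/20 three-way split using scikit-learn on synthetic data (the dataset and split ratios are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 100 samples, 4 features (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# First carve out the test set, then split the remainder into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# 0.25 of the remaining 80% is 20% overall: a 60/20/20 split.
print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```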
2
What is regularization and why is it important?
⚡ Beginner
Answer: Regularization adds a penalty on model complexity (e.g., large weights) to reduce overfitting and improve generalization.
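One way to see the penalty at work: ridge regression (L2) shrinks the coefficient vector relative to plain least squares. A sketch on synthetic data, with `alpha` as the assumed penalty strength:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
# Few samples, many features: a recipe for overfitting plain least squares.
X = rng.normal(size=(20, 10))
y = X[:, 0] + 0.1 * rng.normal(size=20)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls the L2 penalty strength

# The L2 penalty shrinks the weights toward zero.
print(np.linalg.norm(ols.coef_) > np.linalg.norm(ridge.coef_))  # True
```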
3
What is cross-validation and when should you use it?
📊 Intermediate
Answer: Cross-validation splits data into multiple train/validation folds to get more robust performance estimates, especially with limited data.
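A minimal 5-fold cross-validation sketch using scikit-learn's built-in iris dataset (the model and fold count are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold serves once as the validation set, yielding 5 estimates.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy across folds
```

Reporting the mean (and spread) of the fold scores is more robust than a single train/validation split when data is limited.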
4
How do precision and recall relate to business trade-offs?
📊 Intermediate
Answer: High precision means few false positives; high recall means few false negatives. Which you prefer depends on which error is more costly.
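A small worked example with hypothetical fraud labels (the labels and predictions are made up to show the counting):

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical fraud labels: 1 = fraud. The model flags some transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # 2 true positives, 1 false positive, 1 false negative

print(precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.667
print(recall_score(y_true, y_pred))     # 2 / (2 + 1) ≈ 0.667
```

Here precision is the share of flagged transactions that were actually fraud; recall is the share of actual fraud the model caught.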
5
What is the ROC curve and AUC in simple terms?
📊 Intermediate
Answer: The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies; AUC summarizes the curve as a single score equal to the probability that a random positive is ranked above a random negative.
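A tiny sketch of AUC as ranking quality, using toy labels and scores (illustrative values only):

```python
from sklearn.metrics import roc_auc_score

# Scores should rank positives above negatives; AUC measures how often they do.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Of the 4 positive/negative pairs, 3 are ranked correctly: AUC = 0.75.
print(roc_auc_score(y_true, y_score))  # 0.75
```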
6
What is feature leakage (target leakage) and why is it dangerous?
🔥 Advanced
Answer: Leakage happens when features contain information not available at prediction time, causing unrealistically good metrics that fail in production.
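A deliberately extreme sketch of target leakage on synthetic data: the "leaked" column is the label itself, standing in for any feature recorded after the outcome. Cross-validation then reports a suspiciously perfect score that would never hold in production:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=200) > 0).astype(int)

# A leaked column: information recorded *after* the outcome (here, the label itself).
X_leaky = np.column_stack([X, y])

honest = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

print(honest, leaky)  # the leaky score is implausibly close to 1.0
```

A validation score that looks too good to be true is often exactly this: check every feature's availability at prediction time.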
7
When would you prefer a simple linear model over a complex non-linear model?
📊 Intermediate
Answer: When you need interpretability, robustness, or fast training, or when data is limited and the relationship is roughly linear.
8
What is early stopping and how does it help?
📊 Intermediate
Answer: Early stopping stops training when validation performance stops improving, preventing overfitting in iterative models.
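A sketch of early stopping with scikit-learn's gradient boosting, which can hold out an internal validation fraction and stop adding trees when its score plateaus (dataset and patience settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Hold out 10% internally; stop when the validation score stops improving.
model = GradientBoostingClassifier(
    n_estimators=500,        # upper bound on boosting rounds
    validation_fraction=0.1,
    n_iter_no_change=5,      # patience: 5 rounds without improvement
    random_state=0,
).fit(X, y)

# Far fewer trees were actually built than the 500 allowed.
print(model.n_estimators_)
```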
9
Why is scaling features important for some algorithms but not others?
📊 Intermediate
Answer: Distance- and gradient-based algorithms (e.g., k-NN, SVM, logistic regression) are sensitive to scale; tree-based models are mostly scale-invariant.
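A synthetic demonstration of scale sensitivity for k-NN: one informative feature on a small scale, one noise feature on a huge scale. Without scaling, distances are dominated by the noise axis (the data and scales are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
informative = rng.normal(size=200)          # small scale, determines the label
noise = rng.normal(scale=1000.0, size=200)  # huge scale, pure noise
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)

raw = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y, cv=5).mean()

print(raw, scaled)  # scaling rescues k-NN from the dominant noise feature
```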
10
What is ensemble learning and why does it work?
🔥 Advanced
Answer: Ensembles combine multiple models to reduce variance, bias or both; diverse models’ errors tend to cancel out.
11
What is the key difference between bagging and boosting?
🔥 Advanced
Answer: Bagging trains models independently on resampled data (variance reduction); boosting trains models sequentially focusing on errors (bias reduction).
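The contrast can be sketched with scikit-learn on a synthetic dataset: bagging averages independent deep trees, while AdaBoost (one boosting variant) builds shallow learners sequentially, reweighting the examples the previous ones got wrong:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Bagging: 50 independent deep trees on bootstrap resamples (variance reduction).
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0).fit(X, y)

# Boosting: 50 sequential shallow learners, each focusing on earlier errors (bias reduction).
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bagging.score(X, y), boosting.score(X, y))
```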
12
What is calibration of predicted probabilities and why is it important?
🔥 Advanced
Answer: Calibration means predicted probabilities match observed frequencies; it’s crucial when decisions depend on absolute risk levels.
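One common recipe is to wrap a classifier in scikit-learn's `CalibratedClassifierCV`, which maps raw decision scores to probabilities via a held-out fit (dataset and `method` choice are illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba; the wrapper fits a sigmoid (Platt scaling)
# on cross-validated decision scores to produce calibrated probabilities.
clf = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
print(proba.min(), proba.max())  # valid probabilities in [0, 1]
```

Whether the result is well calibrated should still be checked, e.g. with a reliability curve of predicted probability vs observed frequency.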
13
What does “data drift” mean in deployed ML systems?
📊 Intermediate
Answer: Data drift occurs when the input distribution changes over time, which can degrade model performance.
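One simple drift check is a two-sample statistical test comparing a reference window (training-time data) against a live window; a sketch using SciPy's Kolmogorov-Smirnov test on synthetic data with an injected mean shift:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Reference window (training-time data) vs a shifted live window.
reference = rng.normal(loc=0.0, size=1000)
live = rng.normal(loc=0.5, size=1000)  # the feature's mean has drifted by 0.5

# Two-sample Kolmogorov-Smirnov test: a small p-value flags a distribution change.
stat, p_value = ks_2samp(reference, live)
print(p_value < 0.01)  # True: drift detected
```

In practice such a test would run per feature on a schedule, with thresholds tuned to avoid alert fatigue.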
14
Name three things you would monitor for a production ML model.
📊 Intermediate
Answer: Examples: input feature distributions, prediction distributions, business KPIs, and model performance against ground truth when labels become available.
15
What is feature engineering and why is it powerful?
⚡ Beginner
Answer: Feature engineering creates informative inputs from raw data, often impacting performance more than model choice.
16
When would you use stratified sampling for train/test split?
⚡ Beginner
Answer: When you have class imbalance and want train/test sets to preserve class proportions.
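A sketch with a deliberately imbalanced toy dataset (90 negatives, 10 positives): passing `stratify=y` preserves the 10% positive rate in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced labels: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Both splits keep the 10% positive rate.
print(y_tr.mean(), y_te.mean())  # 0.1 0.1
```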
17
What’s the difference between parameter tuning and feature selection?
🔥 Advanced
Answer: Parameter tuning adjusts model hyperparameters; feature selection chooses a subset of input features to use.
18
How would you explain “overfitting” to a non-technical stakeholder?
⚡ Beginner
Answer: The model is memorizing the training data’s noise instead of learning patterns that generalize to new cases.
19
Why is reproducibility important in ML experiments?
⚡ Beginner
Answer: Reproducibility lets you trust, debug and compare results; without it, improvements may just be random luck.
20
What is the key message to remember from this mixed Q&A set?
⚡ Beginner
Answer: Successful ML is not just about models—it’s about good data, sound evaluation, thoughtful deployment and monitoring.
Quick Recap: Mixed ML Concepts 1
Keeping a holistic view of the ML lifecycle—from data to metrics to production—is what separates strong practitioners from model-only specialists.