
Mixed Machine Learning Concepts: Q&A (Set 1)

Short mixed-topic questions across the ML workflow: data, modeling, evaluation and deployment.

1 What is the difference between training, validation and test sets? ⚡ Beginner
Answer: The training set is used to fit model parameters. The validation set is used to tune hyperparameters and compare models. The test set is held out until the very end and used once to estimate final performance on unseen data.
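A minimal sketch of a shuffled three-way split in plain Python. The helper name `train_val_test_split` and the 70/15/15 proportions are illustrative assumptions, not a fixed rule.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Hypothetical helper: shuffle once, then carve off test and validation slices."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is used for fitting
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Look at the test slice only once, after all tuning on the validation slice is finished. Otherwise its estimate becomes optimistic.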
2 What is regularization and why is it important? ⚡ Beginner
Answer: Regularization adds a penalty on model complexity to the training loss (e.g., an L2 penalty on large weights). This reduces overfitting and improves generalization.
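As an illustration, the sketch below computes an L2-penalized (ridge-style) squared-error loss for a linear model. The function name and the `lam` parameter for penalty strength are assumptions for this example. An L1 penalty would use absolute values instead of squares.

```python
def ridge_loss(weights, xs, ys, lam):
    """Mean squared error of a linear model (no bias term) plus an L2 penalty lam * sum(w**2)."""
    preds = [sum(w * x for w, x in zip(weights, row)) for row in xs]
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
    penalty = lam * sum(w ** 2 for w in weights)  # grows with weight magnitude
    return mse + penalty


xs, ys = [[1.0], [2.0]], [2.0, 4.0]
print(ridge_loss([2.0], xs, ys, lam=0.0))  # 0.0: a perfect fit, no penalty
print(ridge_loss([2.0], xs, ys, lam=0.1))  # 0.4: same fit, now penalized for w = 2
```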
3 What is cross-validation and when should you use it? 📊 Intermediate
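Answer: Cross-validation splits the data into k folds and trains k times, each time holding out a different fold for validation. Averaging the k scores gives a more robust performance estimate than a single split. It is especially useful when data is limited.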
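A minimal sketch of how k-fold index splits can be generated. It assumes the examples are already shuffled; `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each of the k folds serves as validation exactly once.

    Assumes the n examples are already shuffled (or have no meaningful order).
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]  # spread the remainder
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(n=10, k=3):
    print(len(train_idx), val_idx)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

In practice you would fit a model on each `train_idx`, score it on the matching `val_idx`, and report the mean and spread of the k scores.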
4 How do precision and recall relate to business trade-offs? 📊 Intermediate
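Answer: High precision means few false positives among the cases you flag. High recall means few missed positives (false negatives). Which one you favor depends on which error is more costly. For example, in fraud screening a missed fraud (low recall) may cost more than an extra manual review (low precision).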
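To make the trade-off concrete, this sketch computes both metrics from raw labels. It compares a cautious classifier and an eager classifier on the same made-up data.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for binary 0/1 labels.

    Returns 0.0 for an undefined ratio (a convention chosen for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # flags only the surest cases
eager = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]     # flags liberally
print(precision_recall(y_true, cautious))  # (1.0, 0.5)
print(precision_recall(y_true, eager))     # precision ~0.667, recall 1.0
```

Flagging more cases raises recall at the cost of precision. The business question is which of those two costs you would rather pay.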
5 What is the ROC curve and AUC in simple terms? 📊 Intermediate
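Answer: The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies. AUC summarizes the curve as a single number: the probability that a randomly chosen positive is scored higher than a randomly chosen negative. It therefore measures ranking quality.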
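The ranking interpretation of AUC can be checked directly. The area under the ROC curve equals the fraction of (positive, negative) pairs that the model orders correctly, which is the normalized Mann-Whitney U statistic. A minimal pure-Python sketch with made-up scores:

```python
def roc_auc(y_true, scores):
    """AUC computed as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(P * N) loop is for illustration only.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75: 3 of 4 pairs ordered correctly
```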
6 What is feature leakage (target leakage) and why is it dangerous? 🔥 Advanced
Answer: Leakage happens when features contain information not available at prediction time, causing unrealistically good metrics that fail in production.
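One practical guard against leakage is a point-in-time check: only use feature values recorded before the moment the prediction would be made. The sketch below assumes a hypothetical log layout that maps each feature name to a `(value, recorded_at)` pair. The churn-style features are invented.

```python
from datetime import datetime


def point_in_time_features(feature_log, prediction_time):
    """Drop any feature whose value was recorded after the prediction time.

    feature_log maps feature name -> (value, recorded_at); this layout is a
    hypothetical example, not a standard schema.
    """
    return {
        name: value
        for name, (value, recorded_at) in feature_log.items()
        if recorded_at <= prediction_time
    }


log = {
    "tenure_months": (18, datetime(2024, 1, 1)),
    "support_tickets_30d": (3, datetime(2024, 2, 28)),
    # Only known after the customer had already churned: a leaky feature.
    "cancellation_reason": ("price", datetime(2024, 3, 15)),
}
features = point_in_time_features(log, prediction_time=datetime(2024, 3, 1))
print(sorted(features))  # ['support_tickets_30d', 'tenure_months']
```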
7 When would you prefer a simple linear model over a complex non-linear model? 📊 Intermediate
Answer: When you need interpretability, robustness, or fast training, or when data is limited and the relationship is roughly linear.
8 What is early stopping and how does it help? 📊 Intermediate
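Answer: Early stopping halts training once validation performance stops improving, typically after a set number of epochs without improvement. This keeps iterative models such as neural networks and gradient boosting from overfitting.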
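A common patience-based rule can be sketched as follows. To stay self-contained, the sketch replays a precomputed, made-up list of per-epoch validation losses instead of training a real model. `patience` and `min_delta` are assumed parameter names here.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than min_delta
    for `patience` consecutive epochs; the best epoch's weights would be restored.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


val_losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.5]
print(early_stopping_epoch(val_losses, patience=3))  # (2, 5)
```

With `patience=3`, training stops at epoch 5 and never sees the later dip at epoch 6. A larger patience can catch such late improvements, but it costs extra epochs.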
9 Why is scaling features important for some algorithms but not others? 📊 Intermediate
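Answer: Distance- and gradient-based algorithms (e.g., k-NN, SVM, logistic regression) are sensitive to feature scale. Tree-based models are mostly scale-invariant because each split thresholds a single feature, so rescaling that feature does not change which split is chosen.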
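To see why scale matters for distance-based methods, the sketch below finds a nearest neighbor before and after min-max scaling. The two features, their values, and the feature ranges are invented for illustration. In practice the min and max would be learned from the training data only.

```python
import math


def min_max_scale(point, mins, maxs):
    """Rescale each feature to [0, 1] using per-feature min/max values."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, mins, maxs))


def nearest(query, candidates):
    """Return the name of the candidate closest to query in Euclidean distance."""
    return min(candidates, key=lambda name: math.dist(query, candidates[name]))


# Features: (annual income in dollars, age in years); values are made up.
query = (50_000, 30)
candidates = {"A": (50_500, 60), "B": (52_000, 31)}
print(nearest(query, candidates))  # A: income differences dominate the distance

mins, maxs = (20_000, 18), (120_000, 80)  # assumed feature ranges
scaled_query = min_max_scale(query, mins, maxs)
scaled = {name: min_max_scale(p, mins, maxs) for name, p in candidates.items()}
print(nearest(scaled_query, scaled))  # B: both features now contribute
```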
10 What is ensemble learning and why does it work? 🔥 Advanced
Answer: Ensembles combine multiple models to reduce variance, bias, or both. When the models are diverse, their errors are only partly correlated and tend to cancel out when predictions are averaged or voted.
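A toy illustration of error cancellation uses three classifiers that are each 80% accurate but wrong on different examples. Their majority vote is correct everywhere. If all three made the same mistakes, voting would not help. The labels are made up.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by taking the most common label for each example."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions_per_model)]


def accuracy(y_true, y_pred):
    """Fraction of examples where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]  # wrong on example 3
model_b = [1, 1, 1, 1, 0]  # wrong on example 1
model_c = [0, 0, 1, 1, 0]  # wrong on example 0
ensemble = majority_vote([model_a, model_b, model_c])
print([accuracy(y_true, m) for m in (model_a, model_b, model_c)])  # [0.8, 0.8, 0.8]
print(accuracy(y_true, ensemble))  # 1.0
```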
11 What is the key difference between bagging and boosting? 🔥 Advanced
Answer: Bagging trains models independently on resampled data (variance reduction); boosting trains models sequentially focusing on errors (bias reduction).
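The contrast shows up clearly in code. The sketch below pairs bagging's bootstrap resampling with one reweighting step from AdaBoost, a specific boosting algorithm. Gradient boosting takes a different route: it fits each new model to the errors (negative gradients) of the current ensemble. The function names are hypothetical.

```python
import math
import random


def bootstrap_sample(data, rng):
    """Bagging: draw len(data) items with replacement; each model gets its own independent sample."""
    return [rng.choice(data) for _ in data]


def adaboost_reweight(weights, correct):
    """Boosting, AdaBoost-style: upweight the examples the last model got wrong.

    weights: current example weights summing to 1; correct: one bool per example.
    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - err) / err)  # this model's vote weight in the final ensemble
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


rng = random.Random(42)
bags = [bootstrap_sample(list(range(10)), rng) for _ in range(3)]  # one resample per model

weights, alpha = adaboost_reweight([0.25] * 4, [True, True, False, True])
print([round(w, 3) for w in weights])  # [0.167, 0.167, 0.5, 0.167]
```

The bootstrap samples can be drawn and used in parallel. Each reweighting step, by contrast, depends on the previous model's mistakes, which is why boosting is sequential.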
12 What is calibration of predicted probabilities and why is it important? 🔥 Advanced
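Answer: A model is calibrated when its predicted probabilities match observed frequencies. For example, among cases scored 0.8, about 80% should turn out positive. Calibration matters when decisions depend on absolute risk levels rather than just rankings.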
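A common way to quantify miscalibration is to bin predictions and compare each bin's average predicted probability with its observed positive rate. The sketch below implements a binary, positive-class variant of expected calibration error (ECE) with equal-width bins. The bin count and the toy data are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=5):
    """Binary, positive-class ECE: bin predictions by probability, then take the
    size-weighted average gap between mean predicted probability and observed positive rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))  # p == 1.0 goes in the last bin
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_pred - frac_pos)
    return ece


overconfident = expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4)
calibrated = expected_calibration_error([0.5] * 4, [1, 0, 1, 0])
print(round(overconfident, 3), round(calibrated, 3))  # 0.3 0.0
```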
13 What does “data drift” mean in deployed ML systems? 📊 Intermediate
Answer: Data drift occurs when the input distribution changes over time, which can degrade model performance.
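One widely used drift score for a single feature is the Population Stability Index (PSI). It compares binned proportions in a reference sample (e.g., training data) with those in a recent production sample. The sketch uses invented data. The 0.25 cut-off in the example is a commonly quoted rule of thumb, not a universal threshold.

```python
import math


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index of one numeric feature between two samples.

    edges: sorted bin boundaries shared by both samples (a fixed binning assumed here).
    eps replaces empty-bin proportions so the logarithm stays finite.
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bin index = number of edges below v
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


edges = [10, 20, 30]
training_ages = [5, 15, 25, 35] * 25          # spread evenly over the four bins
live_ages = [5, 15, 25, 35] * 10 + [35] * 60  # mass has shifted to the top bin
print(round(psi(training_ages, training_ages, edges), 3))  # 0.0
print(psi(training_ages, live_ages, edges) > 0.25)          # True
```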
14 Name three things you would monitor for a production ML model. 📊 Intermediate
Answer: Examples include input feature distributions, prediction distributions, business KPIs, and performance against ground truth once labels become available.
15 What is feature engineering and why is it powerful? ⚡ Beginner
Answer: Feature engineering creates informative inputs from raw data, often impacting performance more than model choice.
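As a small illustration, the sketch below turns a raw transaction record into three engineered features: hour of day, a weekend flag, and the amount relative to the customer's usual spend. The record layout and feature names are hypothetical.

```python
from datetime import datetime


def engineer_features(raw):
    """Derive model-ready features from a raw record (hypothetical schema)."""
    ts = datetime.fromisoformat(raw["timestamp"])
    return {
        "hour": ts.hour,
        "is_weekend": ts.weekday() >= 5,  # weekday(): Monday == 0 ... Sunday == 6
        "amount_to_avg_ratio": raw["amount"] / raw["customer_avg_amount"],
    }


record = {"timestamp": "2024-03-16T22:15:00", "amount": 300.0, "customer_avg_amount": 60.0}
print(engineer_features(record))
# {'hour': 22, 'is_weekend': True, 'amount_to_avg_ratio': 5.0}
```

Most models can do little with a raw timestamp string. Features like these expose patterns, such as late-night or unusually large purchases, directly.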
16 When would you use stratified sampling for train/test split? ⚡ Beginner
Answer: When you have class imbalance and want train/test sets to preserve class proportions.
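A minimal stratified split shuffles and splits each class separately. The helper name, the 20% test fraction, and the 90/10 toy labels are assumptions for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with each class's proportion preserved in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within the class, then cut
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 90 + [1] * 10  # 10% positive class
train_idx, test_idx = stratified_split(labels)
print(sum(labels[i] for i in test_idx), "positives out of", len(test_idx))  # 2 positives out of 20
```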
17 What’s the difference between hyperparameter tuning and feature selection? 🔥 Advanced
Answer: Hyperparameter tuning adjusts settings that control how a model learns (e.g., tree depth or learning rate). Feature selection chooses which subset of input features the model uses.
18 How would you explain “overfitting” to a non-technical stakeholder? ⚡ Beginner
Answer: The model is memorizing the training data’s noise instead of learning patterns that generalize to new cases.
19 Why is reproducibility important in ML experiments? ⚡ Beginner
Answer: Reproducibility lets you trust, debug and compare results; without it, improvements may just be random luck.
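Seeding every source of randomness is the first step toward reproducible runs; recording data and library versions matters too. A minimal sketch using Python's standard `random` module with a hypothetical experiment step:

```python
import random


def shuffled_subset(seed, n=10, k=3):
    """Hypothetical stand-in for a randomized step (e.g., a data shuffle) in an experiment."""
    rng = random.Random(seed)  # a dedicated seeded generator, not the global random state
    data = list(range(n))
    rng.shuffle(data)
    return data[:k]


run_1 = shuffled_subset(seed=123)
run_2 = shuffled_subset(seed=123)
print(run_1 == run_2)  # True: same seed, same "random" outcome
```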
20 What is the key message to remember from this mixed Q&A set? ⚡ Beginner
Answer: Successful ML is not just about models—it’s about good data, sound evaluation, thoughtful deployment and monitoring.

Quick Recap: Mixed ML Concepts 1

Keeping a holistic view of the ML lifecycle—from data to metrics to production—is what separates strong practitioners from model-only specialists.