Mixed ML Q&A - Set 1
20 Core Questions
Interview Prep
Mixed Machine Learning Concepts: Q&A (Set 1)
Short mixed-topic questions across the ML workflow: data, modeling, evaluation and deployment.
Data
Models
Metrics
Deployment
1
What is the difference between training, validation and test sets?
⚡ Beginner
Answer: The training set fits model parameters; the validation set tunes hyperparameters and compares candidate models; the test set is held out for a single, final estimate of generalization performance.
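A minimal sketch of a 60/20/20 three-way split using scikit-learn on synthetic data (the dataset and split ratios are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 100 samples, 4 features (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# First carve out the test set, then split the remainder into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# 0.25 of the remaining 80% is 20% overall: a 60/20/20 split.
print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```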
2
What is regularization and why is it important?
⚡ Beginner
Answer: Regularization adds a penalty on model complexity (e.g., large weights) to reduce overfitting and improve generalization.
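One way to see the penalty at work: ridge regression (L2) shrinks the coefficient vector relative to plain least squares. A sketch on synthetic data, with `alpha` as the assumed penalty strength:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
# Few samples, many features: a recipe for overfitting plain least squares.
X = rng.normal(size=(20, 10))
y = X[:, 0] + 0.1 * rng.normal(size=20)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls the L2 penalty strength

# The L2 penalty shrinks the weights toward zero.
print(np.linalg.norm(ols.coef_) > np.linalg.norm(ridge.coef_))  # True
```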
3
What is cross-validation and when should you use it?
📊 Intermediate
Answer: Cross-validation splits data into multiple train/validation folds to get more robust performance estimates, especially with limited data.
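A minimal 5-fold cross-validation sketch using scikit-learn's built-in iris dataset (the model and fold count are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold CV: each fold serves once as the validation set, yielding 5 estimates.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy across folds
```

Reporting the mean (and spread) of the fold scores is more robust than a single train/validation split when data is limited.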
4
How do precision and recall relate to business trade-offs?
📊 Intermediate
Answer: High precision means few false positives; high recall means few false negatives. Which you prefer depends on which error is more costly.
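A small worked example with hypothetical fraud labels (the labels and predictions are made up to show the counting):

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical fraud labels: 1 = fraud. The model flags some transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # 2 true positives, 1 false positive, 1 false negative

print(precision_score(y_true, y_pred))  # 2 / (2 + 1) ≈ 0.667
print(recall_score(y_true, y_pred))     # 2 / (2 + 1) ≈ 0.667
```

Here precision is the share of flagged transactions that were actually fraud; recall is the share of actual fraud the model caught.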
5
What is the ROC curve and AUC in simple terms?
📊 Intermediate
Answer: The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies; AUC summarizes the curve as a single score equal to the probability that a random positive is ranked above a random negative.
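A tiny sketch of AUC as ranking quality, using toy labels and scores (illustrative values only):

```python
from sklearn.metrics import roc_auc_score

# Scores should rank positives above negatives; AUC measures how often they do.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Of the 4 positive/negative pairs, 3 are ranked correctly: AUC = 0.75.
print(roc_auc_score(y_true, y_score))  # 0.75
```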
6
What is feature leakage (target leakage) and why is it dangerous?
🔥 Advanced
Answer: Leakage happens when features contain information not available at prediction time, causing unrealistically good metrics that fail in production.
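A deliberately extreme sketch of target leakage on synthetic data: the "leaked" column is the label itself, standing in for any feature recorded after the outcome. Cross-validation then reports a suspiciously perfect score that would never hold in production:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=200) > 0).astype(int)

# A leaked column: information recorded *after* the outcome (here, the label itself).
X_leaky = np.column_stack([X, y])

honest = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

print(honest, leaky)  # the leaky score is implausibly close to 1.0
```

A validation score that looks too good to be true is often exactly this: check every feature's availability at prediction time.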
7
When would you prefer a simple linear model over a complex non-linear model?
📊 Intermediate
Answer: When you need interpretability, robustness, or fast training, or when data is limited and the relationship is roughly linear.
8
What is early stopping and how does it help?
📊 Intermediate
Answer: Early stopping stops training when validation performance stops improving, preventing overfitting in iterative models.
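A sketch of early stopping with scikit-learn's gradient boosting, which can hold out an internal validation fraction and stop adding trees when its score plateaus (dataset and patience settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Hold out 10% internally; stop when the validation score stops improving.
model = GradientBoostingClassifier(
    n_estimators=500,        # upper bound on boosting rounds
    validation_fraction=0.1,
    n_iter_no_change=5,      # patience: 5 rounds without improvement
    random_state=0,
).fit(X, y)

# Far fewer trees were actually built than the 500 allowed.
print(model.n_estimators_)
```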
9
Why is scaling features important for some algorithms but not others?
📊 Intermediate
Answer: Distance- and gradient-based algorithms (e.g., k-NN, SVM, logistic regression) are sensitive to scale; tree-based models are mostly scale-invariant.
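A synthetic demonstration of scale sensitivity for k-NN: one informative feature on a small scale, one noise feature on a huge scale. Without scaling, distances are dominated by the noise axis (the data and scales are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
informative = rng.normal(size=200)          # small scale, determines the label
noise = rng.normal(scale=1000.0, size=200)  # huge scale, pure noise
X = np.column_stack([informative, noise])
y = (informative > 0).astype(int)

raw = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y, cv=5).mean()

print(raw, scaled)  # scaling rescues k-NN from the dominant noise feature
```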
10
What is ensemble learning and why does it work?
🔥 Advanced
Answer: Ensembles combine multiple models to reduce variance, bias or both; diverse models’ errors tend to cancel out.
11
What is the key difference between bagging and boosting?
🔥 Advanced
Answer: Bagging trains models independently on resampled data (variance reduction); boosting trains models sequentially focusing on errors (bias reduction).
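The contrast can be sketched with scikit-learn on a synthetic dataset: bagging averages independent deep trees, while AdaBoost (one boosting variant) builds shallow learners sequentially, reweighting the examples the previous ones got wrong:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Bagging: 50 independent deep trees on bootstrap resamples (variance reduction).
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0).fit(X, y)

# Boosting: 50 sequential shallow learners, each focusing on earlier errors (bias reduction).
boosting = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

print(bagging.score(X, y), boosting.score(X, y))
```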
12
What is calibration of predicted probabilities and why is it important?
🔥 Advanced
Answer: Calibration means predicted probabilities match observed frequencies; it’s crucial when decisions depend on absolute risk levels.
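One common recipe is to wrap a classifier in scikit-learn's `CalibratedClassifierCV`, which maps raw decision scores to probabilities via a held-out fit (dataset and `method` choice are illustrative):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba; the wrapper fits a sigmoid (Platt scaling)
# on cross-validated decision scores to produce calibrated probabilities.
clf = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
print(proba.min(), proba.max())  # valid probabilities in [0, 1]
```

Whether the result is well calibrated should still be checked, e.g. with a reliability curve of predicted probability vs observed frequency.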
13
What does “data drift” mean in deployed ML systems?
📊 Intermediate
Answer: Data drift occurs when the input distribution changes over time, which can degrade model performance.
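One simple drift check is a two-sample statistical test comparing a reference window (training-time data) against a live window; a sketch using SciPy's Kolmogorov-Smirnov test on synthetic data with an injected mean shift:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Reference window (training-time data) vs a shifted live window.
reference = rng.normal(loc=0.0, size=1000)
live = rng.normal(loc=0.5, size=1000)  # the feature's mean has drifted by 0.5

# Two-sample Kolmogorov-Smirnov test: a small p-value flags a distribution change.
stat, p_value = ks_2samp(reference, live)
print(p_value < 0.01)  # True: drift detected
```

In practice such a test would run per feature on a schedule, with thresholds tuned to avoid alert fatigue.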
14
Name three things you would monitor for a production ML model.
📊 Intermediate
Answer: Examples: input feature distributions, prediction distributions, business KPIs, and model performance against ground truth when labels become available.
15
What is feature engineering and why is it powerful?
⚡ Beginner
Answer: Feature engineering creates informative inputs from raw data, often impacting performance more than model choice.
16
When would you use stratified sampling for train/test split?
⚡ Beginner
Answer: When you have class imbalance and want train/test sets to preserve class proportions.
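A sketch with a deliberately imbalanced toy dataset (90 negatives, 10 positives): passing `stratify=y` preserves the 10% positive rate in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced labels: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Both splits keep the 10% positive rate.
print(y_tr.mean(), y_te.mean())  # 0.1 0.1
```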
17
What’s the difference between parameter tuning and feature selection?
🔥 Advanced
Answer: Parameter tuning adjusts model hyperparameters; feature selection chooses a subset of input features to use.
18
How would you explain “overfitting” to a non-technical stakeholder?
⚡ Beginner
Answer: The model is memorizing the training data’s noise instead of learning patterns that generalize to new cases.
19
Why is reproducibility important in ML experiments?
⚡ Beginner
Answer: Reproducibility lets you trust, debug and compare results; without it, improvements may just be random luck.
20
What is the key message to remember from this mixed Q&A set?
⚡ Beginner
Answer: Successful ML is not just about models—it’s about good data, sound evaluation, thoughtful deployment and monitoring.
Quick Recap: Mixed ML Concepts 1
Keeping a holistic view of the ML lifecycle—from data to metrics to production—is what separates strong practitioners from model-only specialists.