Machine Learning Fundamentals: 24 Q&A

ML concepts, train/test splits, bias-variance, feature engineering, and dimensionality reduction.

Machine Learning Concepts Q&A

1. What is machine learning?

Answer: Learning patterns from data to make predictions or decisions without hand-coded rules.

2. Supervised vs. unsupervised learning?

Answer: Supervised uses labeled data; unsupervised finds structure in unlabeled data.

3. Classification vs. regression?

Answer: Classification predicts categories; regression predicts continuous values.

4. What is overfitting?

Answer: The model fits noise in the training data, so it scores well on the training set but fails to generalize to unseen data.

5. What is underfitting?

Answer: The model is too simple to capture the underlying patterns, so it performs poorly even on the training data.

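6. Bias-variance tradeoff?

Answer: Simple models tend to have high bias (they underfit); flexible models tend to have high variance (they overfit). The goal is the level of complexity that minimizes total error on unseen data.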
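For squared-error loss the tradeoff can be written as a decomposition. The block below is a standard textbook form, stated under assumptions that are not in the original answer: the target is $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible usually lowers the first term and raises the second, which is why the total error tends to be lowest at an intermediate complexity.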

7. What is regularization?

Answer: Penalty terms that reduce overfitting by constraining complexity.
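As a minimal sketch of how a penalty term constrains a model, the example below uses L2 (ridge) regularization on a one-feature linear model with no intercept. The function name `ridge_slope`, the parameter `lam`, and the toy data are illustrative assumptions, not part of the original answer.

```python
def ridge_slope(xs, ys, lam):
    """Slope w that minimizes sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The penalty adds lam to the denominator, shrinking w toward zero.
    return sxy / (sxx + lam)


# Toy data, roughly y = 2x (hypothetical values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_ols = ridge_slope(xs, ys, lam=0.0)     # no penalty: ordinary least squares
w_ridge = ridge_slope(xs, ys, lam=10.0)  # penalized: smaller slope
print(w_ols, w_ridge)
```

A larger `lam` pulls the slope further toward zero, trading a little training fit for a less extreme model.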

8. Why scale features?

Answer: Scaling puts features on comparable ranges, so distance-based models are not dominated by large-valued features and gradient-based optimizers converge more reliably.
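One common way to scale a feature is standardization (subtract the mean, divide by the standard deviation). The sketch below assumes a single numeric feature held in a plain list; `standardize` is a hypothetical helper name.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


z_scores = standardize([10.0, 20.0, 30.0])
print(z_scores)
```

Min-max scaling to a fixed range is another option; which one fits best depends on the model and the data.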

9. What is cross-validation?

Answer: Evaluating a model over repeated train/validation splits and averaging the scores for a more robust performance estimate.

10. What are confusion matrix metrics?

Answer: Precision, recall, F1, and specificity from TP/FP/TN/FN counts.
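The four metrics follow directly from the counts. This is a small sketch with made-up counts; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, specificity, and F1 from confusion-matrix counts."""

    def ratio(num, den):
        # Convention chosen here: return 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)    # of predicted positives, how many are correct
    recall = ratio(tp, tp + fn)       # of actual positives, how many were found
    specificity = ratio(tn, tn + fp)  # of actual negatives, how many were found
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```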

11. What is ROC-AUC?

Answer: A threshold-independent measure of ranking quality: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one.
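ROC-AUC can be read as the fraction of (positive, negative) pairs that the model ranks correctly, with ties counted as half. The sketch below computes it that way for binary labels in {0, 1}; it is a quadratic-time illustration with a hypothetical function name, not how a library would implement it.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

No classification threshold appears anywhere in the computation, which is why the metric is described as threshold-independent.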

12. One-line ML concept summary?

Answer: ML is about building models that generalize reliably to unseen data.

Train-Test Split Interview Q&A

13. Why split data into train and test?

Answer: To estimate model generalization on unseen data.

14. What is a validation set?

Answer: A held-out subset used to tune hyperparameters and compare models, kept separate from both the training data and the final test set.

15. Typical split ratios?

Answer: Commonly 80/20 (train/test) or 70/15/15 (train/validation/test), depending on data size.
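A 70/15/15 split can be sketched with the standard library alone. The function name, default fractions, and seed below are illustrative assumptions.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of items and cut it into train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```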

16. What is a stratified split?

Answer: Preserves target class distribution across train/test sets.
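A minimal sketch of stratification: split each class separately so the test set keeps the same class proportions as the full data. The helper name `stratified_split` and the 90/10 toy labels are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_indices, test_indices) with class ratios preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)  # per-class test size
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 90 + [1] * 10  # imbalanced toy labels, 10% positive
train_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=42)
print(sum(labels[i] for i in test_idx), len(test_idx))  # 2 20
```

With a purely random split, a 20-sample test set drawn from this data could easily end up with zero or four positives; stratifying removes that source of noise.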

17. Why does the random seed matter?

Answer: Ensures reproducibility of data partitions and results.
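The effect of a fixed seed can be shown in a few lines. This sketch uses Python's `random.Random`; the helper name is hypothetical.

```python
import random


def shuffled_indices(n, seed):
    """Return a permutation of range(n) determined entirely by the seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
print(first == second)  # True: same seed, same partition order
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the split reproducible even if other code draws random numbers in between.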

18. What is data leakage in splitting?

Answer: Information from the test set influencing training decisions, for example fitting a scaler or selecting features on the full dataset before splitting.
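One way leakage creeps in is preprocessing fitted on all the data. The sketch below avoids it by learning min-max scaling parameters from the training values only and then applying them to the test values; the function names and toy numbers are assumptions.

```python
def fit_min_max(train_values):
    """Learn scaling parameters from the training data only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with previously learned parameters."""
    span = (hi - lo) or 1.0  # guard against a constant training feature
    return [(v - lo) / span for v in values]


train_values = [1.0, 2.0, 3.0]
test_values = [4.0]

lo, hi = fit_min_max(train_values)  # the test set is never inspected here
train_scaled = apply_min_max(train_values, lo, hi)
test_scaled = apply_min_max(test_values, lo, hi)
print(train_scaled, test_scaled)  # [0.0, 0.5, 1.0] [1.5]
```

The test value landing outside [0, 1] is expected: it reflects what the model will face on genuinely unseen data, which is exactly what the test set is meant to simulate.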

19. When should you use a time-based split?

Answer: For temporal data to respect chronology and avoid look-ahead bias.
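A chronological split can be sketched as a cut at a date: everything before the cutoff trains the model, everything from the cutoff onward tests it. The record layout `(date, value)` and the function name are assumptions for illustration.

```python
from datetime import date


def time_based_split(records, cutoff):
    """Split (timestamp, value) records so that training data precedes test data."""
    ordered = sorted(records, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


records = [
    (date(2024, 3, 1), 12.0),
    (date(2024, 1, 1), 10.0),
    (date(2024, 2, 1), 11.0),
    (date(2024, 4, 1), 13.0),
]
past, future = time_based_split(records, cutoff=date(2024, 3, 1))
print(past[-1][0] < future[0][0])  # True: no future data in training
```

No shuffling happens at any point, since a random shuffle would let the model train on observations that come after the ones it is tested on.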

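20. What is k-fold cross-validation?

Answer: The data is split into k folds; each fold serves once as the validation set while the model trains on the other k-1 folds, and the k scores are averaged for a more reliable performance estimate.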
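The fold mechanics can be sketched as an index generator. This version assumes the samples are already in a suitable order (it does not shuffle), and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(10, 3))
for train_part, val_part in folds:
    print(len(train_part), val_part)
```

Every sample appears in exactly one validation fold, so each one contributes to the averaged score exactly once.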

21. Can the test set be used for tuning?

Answer: No. The test set should be reserved for the final, unbiased evaluation.

22. How do you handle imbalanced data when splitting?

Answer: Use stratification, and evaluate with metrics suited to imbalance (such as F1 or PR-AUC) rather than accuracy alone.

23. What if the dataset is very small?

Answer: Prefer cross-validation and simpler models to reduce variance.

24. One-line train/test summary?

Answer: Proper splitting is essential for trustworthy model evaluation.
