Machine Learning Fundamentals — Q&A
ML concepts, train/test splits, bias-variance, feature engineering, and dimensionality reduction.
Machine Learning Concepts Q&A
1. What is Machine Learning?
Answer: Learning patterns from data to make predictions or decisions without explicitly programmed rules.
2. Supervised vs. unsupervised learning?
Answer: Supervised learning trains on labeled data; unsupervised learning finds structure in unlabeled data.
3. Classification vs. regression?
Answer: Classification predicts categories; regression predicts continuous values.
4. What is overfitting?
Answer: The model memorizes noise in the training data and fails to generalize to new data.
5. What is underfitting?
Answer: The model is too simple to capture the meaningful patterns in the data, so it performs poorly even on the training set.
6. What is the bias-variance tradeoff?
Answer: Simple models tend to have high bias (they underfit), while flexible models tend to have high variance (they overfit); the goal is the level of complexity that generalizes best.
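The tradeoff is often written as a decomposition of expected squared error. The block below is a sketch under the standard assumptions, which are not stated in the original: the target is $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and the expectation is taken over training sets and noise. The symbols $f$, $\hat{f}$, and $\sigma^2$ are introduced here for illustration.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible typically lowers the first term and raises the second; the noise term is unaffected by the choice of model.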
7. What is regularization?
Answer: A penalty term added to the training objective (for example, an L1 or L2 penalty on the weights) that reduces overfitting by constraining model complexity.
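To make the effect of a penalty concrete, here is a minimal sketch of an L2 (ridge) penalty on a one-feature linear model with no intercept. The function name `ridge_slope` and the parameter `lam` are hypothetical and chosen for illustration; the closed form follows from minimizing the sum of squared errors plus `lam * w**2`.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)^2) + lam * w^2 (illustrative, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sxy / (sxx + lam).
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unpenalized = ridge_slope(xs, ys, lam=0.0)   # ordinary least squares slope
penalized = ridge_slope(xs, ys, lam=14.0)    # penalty shrinks the slope toward zero
```

With `lam=0.0` this reduces to ordinary least squares; increasing `lam` pulls the slope toward zero, which is the "constraining complexity" the answer refers to.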
8. Why scale features?
Answer: Scaling puts features on comparable ranges, which helps distance-based and gradient-based models converge and keeps a single large-valued feature from dominating.
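One common form of scaling is standardization (z-scores). The sketch below uses only the Python standard library; the helper name `standardize` and the sample values are made up for illustration.

```python
import statistics


def standardize(column):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


incomes = [30000.0, 50000.0, 70000.0]   # hypothetical large-valued feature
scaled = standardize(incomes)
```

After scaling, a feature measured in tens of thousands and a feature measured in single digits contribute on a comparable scale to distances and gradients.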
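9. What is cross-validation?
Answer: Evaluating a model on repeated train/validation splits and averaging the scores for a more robust performance estimate.
10. What are confusion matrix metrics?
Answer: Precision = TP/(TP+FP), recall = TP/(TP+FN), specificity = TN/(TN+FP), and F1, the harmonic mean of precision and recall, all computed from the TP/FP/TN/FN counts.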
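The metrics in this answer can be computed directly from the four counts. The following is a minimal sketch; the function name `confusion_metrics` and the example counts are illustrative, and returning 0.0 for an empty denominator is a convention chosen here, not a universal rule.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return precision, recall, F1 and specificity from raw confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
    }


metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
```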
11. What is ROC-AUC?
Answer: A threshold-independent measure of how well a classifier ranks positives above negatives; 0.5 is chance level and 1.0 is a perfect ranking.
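ROC-AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by brute-force pair comparison; it is meant to show the ranking interpretation, not to be efficient, and the name `roc_auc` is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


auc = roc_auc(labels=[0, 0, 1, 1], scores=[0.1, 0.4, 0.35, 0.8])
```

No classification threshold appears anywhere in the computation, which is why the measure is described as threshold-independent.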
12. One-line ML concept summary?
Answer: ML is about building models that generalize reliably to unseen data.
Train-Test Split Interview Q&A
13. Why split data into train and test sets?
Answer: To estimate how well the model generalizes to unseen data.
14. What is a validation set?
Answer: A held-out set used for model selection and hyperparameter tuning, kept separate from both the training data and the final test set.
15. What are typical split ratios?
Answer: Commonly 80/20 (train/test) or 70/15/15 (train/validation/test), depending on data size.
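A three-way split along these lines can be sketched with the standard library alone. The function name `train_val_test_split` and its parameters are hypothetical; libraries such as scikit-learn provide their own splitting utilities, which this does not try to reproduce.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve off test and validation portions; the rest is train."""
    rng = random.Random(seed)          # local generator, so global state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))   # 70 / 15 / 15 items
```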
16. What is a stratified split?
Answer: A split that preserves the target class distribution in both the train and test sets.
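The idea can be sketched by splitting each class separately and then combining the pieces. This is a simplified illustration with hypothetical names (`stratified_split`, `test_frac`), not a replacement for a library implementation.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly equal class proportions in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)   # same fraction taken from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


labels = [0] * 90 + [1] * 10                    # 10% minority class
train_idx, test_idx = stratified_split(labels)
```

Here the test set receives 18 majority and 2 minority examples, so the 10% minority share is preserved on both sides of the split.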
17. Why does the random seed matter?
Answer: Fixing it makes data partitions, and therefore results, reproducible.
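A short sketch of the reproducibility point, using Python's `random.Random` with an explicit seed. The helper name `shuffled_copy` is illustrative.

```python
import random


def shuffled_copy(items, seed):
    """Return a shuffled copy; the same seed always yields the same order."""
    rng = random.Random(seed)
    out = list(items)
    rng.shuffle(out)
    return out


first = shuffled_copy(range(10), seed=42)
second = shuffled_copy(range(10), seed=42)   # identical to `first`
```

Because both calls use the same seed, any train/test partition derived from the shuffled order is the same on every run; a different seed will usually produce a different order.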
18. What is data leakage in splitting?
Answer: Information from the test set influencing training decisions, for example fitting a scaler on the full dataset before splitting.
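A frequent source of leakage is computing preprocessing statistics on all of the data. The sketch below shows the leak-free pattern, assuming a simple standardization step: statistics are fitted on the training values only and then reused unchanged on the test values. The names `fit_scaler` and `apply_scaler` are hypothetical.

```python
import statistics


def fit_scaler(train_values):
    """Learn mean and standard deviation from the training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0   # guard against zero spread
    return mean, std


def apply_scaler(values, mean, std):
    """Transform any data with statistics learned elsewhere."""
    return [(v - mean) / std for v in values]


train_values = [1.0, 2.0, 3.0]
test_values = [10.0]

mean, std = fit_scaler(train_values)                 # test data is never seen here
train_scaled = apply_scaler(train_values, mean, std)
test_scaled = apply_scaler(test_values, mean, std)
```

Had the mean been computed over all four values, the test point would have shifted the statistics used during training, which is exactly the kind of influence the answer describes.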
19. When should you use a time-based split?
Answer: For temporal data, so that training uses only the past and evaluation only the future, which avoids look-ahead bias.
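A chronological split can be sketched as "sort by time, then cut". The record layout `(timestamp, value)` and the function name `time_based_split` are assumptions made for this example.

```python
def time_based_split(records, test_frac=0.2, key=lambda r: r[0]):
    """Sort by timestamp and hold out the most recent fraction as the test set."""
    ordered = sorted(records, key=key)
    cut = len(ordered) - round(len(ordered) * test_frac)
    return ordered[:cut], ordered[cut:]


# Hypothetical records: (timestamp, value), deliberately out of order.
records = [(3, "c"), (1, "a"), (5, "e"), (2, "b"), (4, "d")]
train, test = time_based_split(records)
```

No shuffling takes place, so every training record precedes every test record in time.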
20. What is k-fold cross-validation?
Answer: The data is divided into k folds; each fold serves once as the validation set while the other k-1 folds are used for training, and the k scores are averaged.
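The fold mechanics can be sketched as an index generator. This simplified version does not shuffle, and the name `k_fold_indices` is illustrative; in practice the indices are usually shuffled (or stratified) first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))   # validation sizes 4, 3, 3
```

A model would be trained and scored once per pair, and the k validation scores averaged to give the performance estimate.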
21. Can the test set be used for tuning?
Answer: No. The test set should be reserved for the final, unbiased evaluation.
22. How do you handle imbalanced data during a split?
Answer: Use stratification and evaluate with metrics suited to imbalance, such as F1 or PR-AUC, rather than accuracy alone.
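The reason accuracy alone is not a proper metric here can be shown with a degenerate classifier that always predicts the majority class. The data below is made up for illustration, and the helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [1] * 5 + [0] * 95      # 5% positive class
y_pred = [0] * 100               # always predict the majority class

acc = accuracy(y_true, y_pred)   # high, despite finding no positives
f1 = f1_score(y_true, y_pred)    # zero: every positive was missed
```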
23. What if the dataset is very small?
Answer: Prefer cross-validation over a single hold-out split, and simpler models, to reduce the variance of both the model and its performance estimate.
24. One-line train/test summary?
Answer: Proper splitting is essential for trustworthy model evaluation.