Train-Test Split Interview Q&A

1. Why split data into train and test sets?
Answer: To estimate how well the model generalizes to unseen data. A score measured on the same data the model was fit to is optimistically biased.
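A minimal sketch of a random 80/20 hold-out split, using only Python's standard library. `split_train_test` is a hypothetical helper written for illustration; in practice scikit-learn's `sklearn.model_selection.train_test_split` does this job.

```python
import random

def split_train_test(items, test_size=0.2, seed=None):
    """Shuffle a copy of `items` and return (train, test) lists."""
    rng = random.Random(seed)            # private generator: global random state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = split_train_test(data, test_size=0.2, seed=42)
print(len(train), len(test))         # 80 20
print(set(train).isdisjoint(test))   # True: no example is in both sets
```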
2. What is a validation set?
Answer: A held-out subset, separate from both the training and test sets, used for model selection and hyperparameter tuning before the final test evaluation.
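The sketch below shows the three roles on toy data: the model is fit on train, the hyperparameter `k` of a 1-D k-nearest-neighbours classifier is chosen on validation, and the test set is scored exactly once. The data generator, the candidate `k` values and the helper names are assumptions made for illustration.

```python
import random

def knn_predict(train, x, k):
    """Predict by majority vote among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return positives * 2 > k              # odd k avoids ties

def accuracy(train, rows, k):
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)

# Toy data (assumption): true boundary at x = 0.5, with 20% of labels flipped as noise.
rng = random.Random(0)
rows = []
for _ in range(600):
    x = rng.random()
    flip = rng.random() < 0.2
    rows.append((x, (x > 0.5) != flip))

# Rows are i.i.d., so slicing them is already a random 400/100/100 split.
train, val, test = rows[:400], rows[400:500], rows[500:]

candidates = [1, 3, 5, 9, 15, 25]
best_k = max(candidates, key=lambda k: accuracy(train, val, k))   # tuned on validation only
test_acc = accuracy(train, test, best_k)                           # test set used exactly once
print(best_k, round(test_acc, 2))
```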
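3. What are typical split ratios?
Answer: Commonly 80/20 (train/test) or 70/15/15 (train/validation/test), depending on data size; with very large datasets, a much smaller fraction is often enough for validation and testing.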
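A 70/15/15 split can be sketched as one shuffle followed by slicing. `three_way_split` and its default fractions are illustrative assumptions, not a library API.

```python
import random

def three_way_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve off test and validation slices; the rest is training data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))   # 700 150 150
```

Shuffling once and slicing guarantees the three parts are disjoint and together cover every item.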
4. What is a stratified split?
Answer: A split that preserves the target class proportions in both the train and test sets.
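A minimal sketch of stratification: split each class separately, then merge. `stratified_split` is a hypothetical helper; scikit-learn offers the same idea through the `stratify=` argument of `train_test_split`. Because rounding is done per class, the total test size can differ slightly from `test_frac * n`.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx) so each class keeps the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                          # shuffle within the class
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

labels = ["neg"] * 90 + ["pos"] * 10

def pos_rate(idx):
    return sum(labels[i] == "pos" for i in idx) / len(idx)

train_idx, test_idx = stratified_split(labels)
print(pos_rate(train_idx), pos_rate(test_idx))   # 0.1 0.1
```

With a purely random 20-item test set drawn from only 10 positives, the positive rate in the test set can drift noticeably; stratifying fixes it at the original 10%.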
5. Why does the random seed matter?
Answer: Fixing the seed makes the data partitions, and therefore the results, reproducible across runs.
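The check below illustrates reproducibility: two runs with the same seed produce the identical partition. `split_indices` is a hypothetical helper for this sketch.

```python
import random

def split_indices(n, test_size, seed):
    """Shuffle indices 0..n-1 with a private, seeded generator and cut off a test block."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[test_size:], idx[:test_size]

run1 = split_indices(50, 10, seed=42)
run2 = split_indices(50, 10, seed=42)
run3 = split_indices(50, 10, seed=7)
print(run1 == run2)   # True: same seed, same partition
```

Using a dedicated `random.Random(seed)` instance, rather than seeding the global generator, keeps other code that draws random numbers from silently changing the split. Python's documentation only guarantees that `random()` reproduces the same sequence for a given seed; other methods such as `shuffle` may change between Python versions, so record the version alongside the seed.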
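6. What is data leakage in splitting?
Answer: Information from the test set influencing training decisions, for example fitting a scaler or imputer on the full dataset before splitting, or placing duplicate records in both sets.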
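A common leak is computing preprocessing statistics on all data. The sketch below fits standardization parameters on the training split only and contrasts them with the leaky version; `fit_scaler` and `transform` are hypothetical helpers, and the numbers are toy values.

```python
import statistics

def fit_scaler(values):
    """Learn standardization parameters (mean, population std) from the given values only."""
    return statistics.mean(values), statistics.pstdev(values)

def transform(values, params):
    mean, std = params
    return [(v - mean) / std for v in values]

train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Correct: statistics come from the training split only, then are reused on test.
params = fit_scaler(train)
train_scaled = transform(train, params)
test_scaled = transform(test, params)

# Leaky: statistics computed on train + test let test values shape the preprocessing.
leaky_params = fit_scaler(train + test)
print(params, leaky_params)
```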
7. When should you use a time-based split?
Answer: For temporal data: train on earlier observations and test on later ones, which respects chronology and avoids look-ahead bias.
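An expanding-window scheme, similar in spirit to scikit-learn's `TimeSeriesSplit`, can be sketched as follows. It assumes rows are already sorted from oldest to newest; `expanding_window_splits` is a hypothetical name.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) for rows sorted oldest-to-newest.

    Each test block comes strictly after its training window, and the window grows
    every split; leftover rows that don't fill a block go into the first training window.
    """
    fold = n_samples // (n_splits + 1)
    for i in range(n_splits):
        test_start = n_samples - (n_splits - i) * fold
        yield list(range(test_start)), list(range(test_start, test_start + fold))

for train_idx, test_idx in expanding_window_splits(12, n_splits=3):
    print(f"train 0..{train_idx[-1]} -> test {test_idx}")
# train 0..2 -> test [3, 4, 5]
# train 0..5 -> test [6, 7, 8]
# train 0..8 -> test [9, 10, 11]
```

No shuffling happens anywhere, which is the point: a random shuffle would let the model train on the future.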
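8. What is k-fold cross-validation?
Answer: The data is divided into k folds; the model is trained k times, each time validating on a different fold and training on the remaining k-1, and the k scores are averaged for a more reliable performance estimate than a single split gives.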
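A minimal k-fold sketch: every index lands in exactly one validation fold, and the per-fold scores are averaged. The target values and the "model" (predict the training mean) are placeholders chosen for illustration; `k_fold_indices` is a hypothetical helper resembling scikit-learn's `KFold`.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=None):
    """Yield (train_idx, val_idx) pairs; every index is in exactly one validation fold."""
    order = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(order)      # shuffle first unless row order must be kept
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = order[start:start + size]
        train_idx = order[:start] + order[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical target values and a trivial "model" that predicts the training mean.
y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]
scores = []
for train_idx, val_idx in k_fold_indices(len(y), k=5, seed=0):
    pred = statistics.mean(y[i] for i in train_idx)
    scores.append(statistics.mean((y[i] - pred) ** 2 for i in val_idx))

print(len(scores), round(statistics.mean(scores), 3))   # 5 folds, averaged MSE
```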
9. Can the test set be used for tuning?
Answer: No. The test set should be reserved for the final, unbiased evaluation; choosing models or hyperparameters based on it makes its score optimistic.
10. How do you handle imbalanced data during a split?
Answer: Use stratification and evaluate with metrics suited to imbalance, such as F1 or PR-AUC, rather than plain accuracy.
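Why accuracy misleads on imbalanced data, in a small sketch: a classifier that always predicts the majority class scores 95% accuracy but 0 F1. Average precision is computed here as the mean precision at each positive's rank, one common summary of the precision-recall curve (ties are not handled specially). The labels and scores are toy values; scikit-learn's `f1_score` and `average_precision_score` are the usual library equivalents.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print(accuracy, f1_score(y_true, always_negative))   # 0.95 0.0

# Toy scores: two negatives (0.85, 0.6) outrank some positives.
scores = [0.9, 0.8, 0.3, 0.7, 0.2] + [0.85, 0.6] + [0.1] * 93
print(round(average_precision(y_true, scores), 4))   # 0.7595
```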
11. What if the dataset is very small?
Answer: Prefer cross-validation over a single hold-out split, and simpler models, to reduce variance.
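For very small datasets, leave-one-out cross-validation is one extreme option: each sample is the validation set exactly once, so almost all data is used for training in every round. The sketch assumes a trivial "predict the training mean" model; `loocv_mse` is a hypothetical helper (scikit-learn provides `LeaveOneOut` and `RepeatedKFold` splitters for the same purpose).

```python
import statistics

def loocv_mse(y):
    """Leave-one-out CV: train on all but one sample, score on the held-out one, repeat."""
    errors = []
    for i in range(len(y)):
        train = y[:i] + y[i + 1:]
        pred = statistics.mean(train)        # hypothetical "model": predict the training mean
        errors.append((y[i] - pred) ** 2)
    return statistics.mean(errors)

print(loocv_mse([2.0, 4.0, 6.0]))   # 6.0
```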
12. What is a one-line summary of train/test splitting?
Answer: Proper splitting is essential for trustworthy model evaluation.