Bagging (Bootstrap Aggregating): Interview Q&A
Short questions and answers on bagging: bootstrap sampling, variance reduction, and building strong ensemble learners.
Bootstrap
Variance
Randomization
Random Forests
1
What is bagging in machine learning?
Beginner
Answer: Bagging (bootstrap aggregating) trains multiple models on bootstrapped samples of the training data and averages their predictions.
2
What is the main goal of bagging?
Beginner
Answer: The main goal is to reduce variance and improve stability of high-variance models like decision trees.
3
What is a bootstrap sample?
Beginner
Answer: A bootstrap sample is obtained by sampling with replacement from the original dataset, usually to the same size.
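Sampling with replacement is easy to see in code. A minimal sketch with NumPy (variable names like boot and oob are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # original dataset of 10 points

# Bootstrap sample: draw n points WITH replacement, n = original size
boot = rng.choice(data, size=len(data), replace=True)

# Some points repeat; others never get drawn (the "out-of-bag" points)
oob = np.setdiff1d(data, boot)
print("bootstrap:", boot)
print("out-of-bag:", oob)
```

On average, each bootstrap sample contains about 63% of the distinct original points; the rest are out-of-bag.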
4
Why does averaging predictions reduce variance?
Intermediate
Answer: If individual model errors are not perfectly correlated, averaging cancels some noise, lowering overall variance.
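A quick simulation makes this concrete. Assuming fully independent, equal-variance errors (the ideal case), averaging m models should divide the variance by roughly m:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate m models whose predictions are truth + independent noise
truth, m, trials = 5.0, 25, 10_000
preds = truth + rng.normal(0.0, 1.0, size=(trials, m))

single_var = preds[:, 0].var()       # variance of one model's predictions
avg_var = preds.mean(axis=1).var()   # variance of the 25-model average
print(single_var, avg_var)
```

In practice bagged models share training data, so their errors are correlated and the reduction is smaller than 1/m, but the direction is the same.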
5
Is bagging more effective for high-bias or high-variance models?
Intermediate
Answer: Bagging is most effective for high-variance, low-bias models, such as deep decision trees.
6
How do bagging and random forests relate?
Intermediate
Answer: A random forest is essentially bagging of decision trees with additional feature-level randomness at each split.
7
How are predictions combined in bagging for regression and classification?
Beginner
Answer: For regression you average predictions; for classification you usually take a majority vote.
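Both aggregation rules are one line each. A small sketch with made-up predictions from five base models (the numbers are illustrative only):

```python
import numpy as np

# Rows = 5 base models, columns = 3 samples
reg_preds = np.array([[2.0, 3.0, 1.0],
                      [2.2, 2.8, 1.2],
                      [1.8, 3.1, 0.9],
                      [2.1, 3.0, 1.1],
                      [1.9, 2.9, 1.0]])
reg_out = reg_preds.mean(axis=0)  # regression: average across models

clf_preds = np.array([[0, 1, 1],
                      [0, 1, 0],
                      [1, 1, 1],
                      [0, 0, 1],
                      [0, 1, 1]])
# Classification: majority vote per sample (binary case)
votes = (clf_preds.sum(axis=0) > len(clf_preds) / 2).astype(int)
print(reg_out, votes)
```

For classifiers that expose class probabilities, averaging the probabilities ("soft voting") is a common alternative to the hard majority vote.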
8
What is out-of-bag (OOB) evaluation in bagging?
Advanced
Answer: OOB evaluation uses samples not included in a model's bootstrap set as validation data, giving an internal estimate of performance.
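In scikit-learn, OOB evaluation is a single flag. A minimal sketch on synthetic data (dataset and parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores each training point using only the estimators
# whose bootstrap sample did not contain it
bag = BaggingClassifier(n_estimators=100, oob_score=True, random_state=0)
bag.fit(X, y)
print(bag.oob_score_)  # internal accuracy estimate, no held-out set needed
```

The OOB score is typically close to what cross-validation would report, at a fraction of the cost.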
9
Does bagging increase or decrease bias?
Intermediate
Answer: Bagging typically keeps bias about the same and reduces variance; bias may increase slightly in some cases.
10
What types of base learners are commonly used in bagging?
Beginner
Answer: Decision trees are most common, but other unstable models can also benefit from bagging.
11
Does bagging help for linear models like logistic regression?
Advanced
Answer: Usually not much, because such models are low-variance, high-bias; variance reduction brings little gain.
12
How does bagging compare to boosting in terms of bias and variance?
Advanced
Answer: Bagging mainly reduces variance, while boosting aims to reduce bias by focusing on hard examples.
13
What is the main computational cost of bagging?
Intermediate
Answer: Training many models increases training time and memory usage, though training can be parallelized.
14
How does the number of estimators affect a bagging ensemble?
Intermediate
Answer: More estimators generally reduce variance and improve stability up to a point, but with diminishing returns and higher cost.
15
How does bagging interact with overfitting?
Advanced
Answer: Bagging allows you to overfit individual base learners (like deep trees) and then reduce overfitting via averaging.
16
What is an example of a pure bagging algorithm?
Beginner
Answer: scikit-learn's BaggingClassifier wrapping decision trees is a classic example.
17
Does bagging require independent base learners?
Intermediate
Answer: They need not be independent, but the more decorrelated they are, the more variance reduction you get from averaging.
18
How do you evaluate a bagging model efficiently?
Advanced
Answer: You can use OOB estimates instead of a separate validation set, plus standard cross-validation for confirmation.
19
Give a real-world use case where bagging is effective.
Beginner
Answer: Bagging decision trees (random forests) performs well on tabular business data like credit scoring, churn prediction and risk modeling.
20
What is the key message to remember about bagging?
Beginner
Answer: Bagging is a simple but powerful ensemble strategy for taming unstable models by trading extra computation for lower variance and better generalization.
Quick Recap: Bagging
Whenever you have a high-variance base learner and enough compute, consider bagging or random forests: they're reliable workhorses for many ML problems.