DBSCAN Clustering: Interview Q&A
Short questions and answers on DBSCAN: density-based clusters, eps, min_samples, core points and noise handling.
Density
Core Points
Eps
Noise
1
What is the main idea behind DBSCAN?
Beginner
Answer: DBSCAN groups together points that are closely packed (high density) and marks points in low-density regions as noise or outliers.
2
What two key parameters does DBSCAN use?
Beginner
Answer: DBSCAN uses eps (the neighborhood radius) and min_samples (the minimum number of points within that radius for a region to count as dense).
3
What is a core point in DBSCAN terminology?
Intermediate
Answer: A core point has at least min_samples points within eps distance (including itself).
4
What is a border point?
Intermediate
Answer: A border point has fewer than min_samples neighbors in its eps neighborhood but lies within the eps neighborhood of a core point.
5
How does DBSCAN treat noise points?
Beginner
Answer: Points that are neither core nor border points are labeled as noise (outliers) and not assigned to any cluster.
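The three roles defined in questions 3 to 5 can be sketched in a few lines of plain Python. This is a brute-force illustration rather than a library implementation: the function name classify_points and the sample coordinates are made up for the example, and Euclidean distance is assumed.

```python
from math import dist


def classify_points(points, eps, min_samples):
    """Label every point as 'core', 'border' or 'noise' (brute force)."""
    # eps-neighborhood of each point; a point counts as its own neighbor.
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_samples for n in neighbors]

    roles = []
    for i in range(len(points)):
        if is_core[i]:
            roles.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense itself, but within eps of a core point.
            roles.append("border")
        else:
            roles.append("noise")
    return roles


# Hypothetical sample: a short chain of points plus one far-away point.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.9, 0.0), (5.0, 5.0)]
roles = classify_points(points, eps=1.1, min_samples=3)
for point, role in zip(points, roles):
    print(point, role)
```

The point at (1.9, 0.0) has only two points in its neighborhood, but one of them is a core point, so it ends up as a border point; the point at (5.0, 5.0) has no core point nearby and is noise.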
6
Why is DBSCAN good for clusters of arbitrary shape?
Advanced
Answer: Because clusters are defined by density connectivity, DBSCAN can discover non-spherical, irregularly shaped clusters.
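To make density connectivity concrete, here is a minimal brute-force DBSCAN sketch in plain Python, run on two concentric rings (a shape that centroid-based methods cannot separate). The helper names (region_query, dbscan, ring) and the ring sizes are assumptions chosen for the example; real implementations use faster neighbor search.

```python
from math import cos, dist, pi, sin


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # only core points expand the cluster
    return labels


def ring(radius, n):
    """n evenly spaced points on a circle around the origin."""
    return [(radius * cos(2 * pi * k / n), radius * sin(2 * pi * k / n))
            for k in range(n)]


points = ring(1.0, 20) + ring(3.0, 60)
labels = dbscan(points, eps=0.5, min_samples=3)
print("inner ring labels:", set(labels[:20]))
print("outer ring labels:", set(labels[20:]))
```

Each ring is recovered as its own cluster because neighboring points along a ring are chained together through core points, while the gap between the rings is wider than eps.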
7
Does DBSCAN require the number of clusters to be specified in advance?
Beginner
Answer: No, the number of clusters emerges automatically based on eps and min_samples.
8
How does DBSCAN handle noise compared to k-means?
Intermediate
Answer: DBSCAN explicitly labels noise points, while k-means always forces every point into some cluster.
9
Why can DBSCAN struggle with varying-density clusters?
Advanced
Answer: A single global eps and min_samples may be too strict for sparse clusters and too loose for dense ones, making parameter choice hard.
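A small one-dimensional experiment illustrates the problem. The sketch below uses a minimal brute-force DBSCAN (the helper names and the toy data are assumptions for this example): two dense groups sit close together, and one sparse group lies far away. No single eps recovers all three groups.

```python
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)
    return labels


# Toy 1-D data (each point is a 1-tuple).
dense_a = [(i / 10,) for i in range(5)]         # 0.0 .. 0.4, spacing 0.1
dense_b = [((14 + i) / 10,) for i in range(5)]  # 1.4 .. 1.8, spacing 0.1
sparse = [(float(10 + i),) for i in range(5)]   # 10 .. 14, spacing 1.0
points = dense_a + dense_b + sparse

tight = dbscan(points, eps=0.15, min_samples=3)
loose = dbscan(points, eps=1.2, min_samples=3)
print("eps=0.15:", tight)  # dense groups separated, sparse group is noise
print("eps=1.2: ", loose)  # sparse group found, dense groups merged
```

With the small eps the sparse group is discarded as noise; with the large eps the sparse group is found, but the gap between the two dense groups is bridged and they merge into one cluster.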
10
How can you choose eps in practice?
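Advanced
Answer: A common heuristic is the k-distance plot: compute each point's distance to its k-th nearest neighbor (e.g., 4-NN distances), sort those distances, and pick eps at the "knee" where the sorted curve bends sharply upward.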
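The values behind a k-distance plot are easy to compute by brute force. The sketch below is illustrative only: the function name k_distances and the toy data are assumptions, and the k-th neighbor is counted without the point itself (conventions differ between texts and libraries).

```python
from math import dist


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


# Toy data: a 3x3 grid with unit spacing plus one distant outlier.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(10.0, 10.0)]

kd = k_distances(points, k=2)
for rank, value in enumerate(kd):
    print(rank, round(value, 2))
```

Plotting kd against its rank gives the k-distance curve. Here the curve stays flat at 1.0 for the nine grid points and then jumps for the outlier, so an eps slightly above 1.0 would keep the grid together and leave the outlier as noise.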
11
Does DBSCAN scale well to very high dimensions?
Advanced
Answer: Like other distance-based methods, it can suffer from the curse of dimensionality, making distance less meaningful.
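One way to see why distances lose meaning is to measure how much the nearest and farthest points differ as the dimension grows. The sketch below draws uniform random points in the unit hypercube; the function name relative_contrast, the sample size, and the seed are arbitrary choices for illustration.

```python
import random
from math import dist


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension increases the contrast shrinks, meaning all points look roughly equally far away. A single eps then either captures almost no neighbors or almost all of them, which is what makes DBSCAN hard to tune on high-dimensional data.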
12
Is DBSCAN deterministic?
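Beginner
Answer: Mostly. Given fixed parameters and a fixed distance metric, core points and noise points always get the same result, unlike k-means with random initialization. The exception is a border point that lies within eps of core points from two different clusters: it joins whichever cluster reaches it first, so its label can depend on the order in which points are processed.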
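The effect of processing order on border points can be reproduced with a minimal brute-force DBSCAN. In the sketch below (helper names and coordinates are assumptions for this example), one point lies within eps of a core point from each of two groups but is not dense enough to be a core point itself.

```python
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)
    return labels


left = [(0.0, 0.0), (-0.5, 0.0), (-0.5, 0.5), (-0.5, -0.5)]
right = [(2.0, 0.0), (2.5, 0.0), (2.5, 0.5), (2.5, -0.5)]
shared = (1.0, 0.0)  # within eps of (0, 0) and (2, 0), but not a core point


def labels_by_point(points, eps=1.1, min_samples=4):
    return dict(zip(points, dbscan(points, eps, min_samples)))


left_first = labels_by_point(left + [shared] + right)
right_first = labels_by_point(right + [shared] + left)

print("left group first: shared joins the left group ->",
      left_first[shared] == left_first[(0.0, 0.0)])
print("right group first: shared joins the right group ->",
      right_first[shared] == right_first[(2.0, 0.0)])
```

The core points and the two groups are identical in both runs; only the shared border point changes sides. For the same input order the result is fully reproducible.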
13
When is DBSCAN a good choice compared to k-means?
Intermediate
Answer: When clusters have arbitrary shapes or varying sizes, and you want to detect noise/outliers without pre-choosing k.
14
How does min_samples relate to dimensionality?
Advanced
Answer: A rule of thumb is to set min_samples to be at least the dimensionality plus one, but tuning is often dataset-specific.
15
Can DBSCAN work with any distance metric?
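Intermediate
Answer: In principle, yes. DBSCAN only needs a distance (dissimilarity) function to define the eps neighborhood, and implementations often support custom metrics. Index-accelerated neighbor search, however, typically relies on metric properties such as the triangle inequality.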
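As an illustration of swapping the metric, the sketch below plugs a great-circle (haversine) distance into a brute-force eps-neighborhood query, so that eps is expressed in kilometers. The function names, the mean Earth radius of 6371 km, and the sample coordinates are assumptions for this example, not part of any particular library.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def eps_neighbors(points, i, eps, metric):
    """Indices of points within eps of points[i] under the given metric."""
    return [j for j, q in enumerate(points) if metric(points[i], q) <= eps]


# Hypothetical (lat, lon) points along the equator.
points = [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.0, 10.0)]

print(round(haversine_km(points[0], points[2]), 2), "km per degree at the equator")
print(eps_neighbors(points, 0, eps=60.0, metric=haversine_km))
print(eps_neighbors(points, 1, eps=60.0, metric=haversine_km))
```

Because the rest of DBSCAN only consumes neighbor lists, changing the metric changes nothing else in the algorithm; only the meaning and units of eps change.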
16
How do you evaluate DBSCAN cluster quality without labels?
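Intermediate
Answer: Use internal indices such as the silhouette score, typically computed on the non-noise points only, and visually inspect 2D projections or domain-specific patterns. Keep in mind that silhouette favors compact, convex clusters, so it can underrate valid clusters of irregular shape.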
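A brute-force silhouette computation that skips noise points might look like the following. This is a sketch under stated assumptions: noise is encoded as label -1, distances are Euclidean, and the function name silhouette_without_noise and the toy labels are made up for the example.

```python
from math import dist


def silhouette_without_noise(points, labels):
    """Mean silhouette over clustered points; label -1 (noise) is ignored."""
    kept = [(p, lab) for p, lab in zip(points, labels) if lab != -1]
    clusters = {}
    for p, lab in kept:
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in kept:
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy result: two tight groups and one point that was labeled as noise.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 50.0)]
labels = [0, 0, 1, 1, -1]
score = silhouette_without_noise(points, labels)
print(round(score, 3))
```

Reporting the share of points labeled as noise alongside the score is a useful habit, since a high silhouette on a small clustered remainder can hide the fact that most of the data was discarded.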
17
What is a typical complexity bottleneck in DBSCAN?
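Advanced
Answer: Finding the neighbors within eps of every point. A brute-force search compares all pairs of points, which is O(n^2); spatial indexes such as KD-trees or ball trees make each neighborhood query cheaper, especially in low dimensions.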
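A uniform grid is one of the simplest spatial indexes and shows the idea. In the sketch below (2D only; the function names and test data are assumptions for illustration) points are bucketed into square cells of side eps, so a query only has to inspect the 3x3 block of cells around the query point instead of the whole dataset.

```python
from collections import defaultdict
from math import dist, floor


def build_grid(points, eps):
    """Bucket 2D point indices into square cells with side length eps."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(i)
    return grid


def grid_region_query(points, grid, i, eps):
    """Indices within eps of points[i], checking only the 3x3 nearby cells."""
    x, y = points[i]
    cx, cy = floor(x / eps), floor(y / eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # .get avoids creating empty cells in the defaultdict.
            for j in grid.get((cx + dx, cy + dy), ()):
                if dist(points[i], points[j]) <= eps:
                    found.append(j)
    return sorted(found)


def brute_region_query(points, i, eps):
    """Reference implementation: compare against every point."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


# Deterministic toy data spread over a 10 x 10 square.
points = [((i * 37) % 100 / 10, (i * 53) % 100 / 10) for i in range(60)]
eps = 1.0
grid = build_grid(points, eps)

same = all(
    grid_region_query(points, grid, i, eps) == brute_region_query(points, i, eps)
    for i in range(len(points))
)
print("grid index agrees with brute force:", same)
```

The grid returns the same neighborhoods as the brute-force search because any point within eps must fall in the same cell or an adjacent one. Tree-based indexes follow the same principle of pruning regions that cannot contain neighbors.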
18
Give a practical use case for DBSCAN.
Beginner
Answer: DBSCAN is used for geospatial clustering, anomaly detection and discovering spatial hotspots in data.
19
How does DBSCAN behave with very small eps?
Intermediate
Answer: With very small eps, most points do not have enough neighbors to become core, so many points are labeled as noise.
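The effect is easy to measure by counting noise points for several eps values. The sketch below does this by brute force on a 5x5 grid with unit spacing; the function name count_noise and the chosen eps values are assumptions for illustration.

```python
from math import dist


def count_noise(points, eps, min_samples):
    """Number of points that are neither core nor within eps of a core point."""
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_samples for n in neighbors]
    return sum(
        1
        for i in range(len(points))
        if not is_core[i] and not any(is_core[j] for j in neighbors[i])
    )


# 25 points on a regular grid with spacing 1.0.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]

for eps in (0.5, 1.1, 1.5):
    print("eps =", eps, "-> noise points:", count_noise(points, eps, min_samples=5))
```

With eps below the grid spacing every point is isolated and all 25 are noise. Once eps covers the four direct neighbors only the corners remain noise, and once it also covers the diagonals no noise is left.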
20
What is the key message to remember about DBSCAN?
Beginner
Answer: DBSCAN is a powerful density-based clustering method that can find arbitrary shapes and isolate noise, but it requires thoughtful tuning of eps and min_samples.
Quick Recap: DBSCAN
If you understand core vs border vs noise points, and how eps and min_samples define density, you can confidently discuss DBSCAN's strengths and trade-offs.