Principal Component Analysis (PCA): Interview Q&A
20 core questions with short answers on PCA: dimensionality reduction, principal components, explained variance, and typical use cases.
Topics: Dimensionality · Eigenvectors · Explained Variance · Projections
1
What problem does PCA solve?
⚡ Beginner
Answer: PCA reduces the dimensionality of data while retaining as much variance (information) as possible.
2
What are principal components?
⚡ Beginner
Answer: Principal components are new orthogonal axes (directions) in feature space along which the data has maximum variance.
3
Why should features be standardized before applying PCA?
📊 Intermediate
Answer: Without scaling, PCA would be dominated by features with larger numeric ranges, since variance is scale-dependent.
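A minimal numpy sketch of this effect, using synthetic data (the feature names and scales are illustrative assumptions): when one feature has a much larger numeric range, the leading eigenvector of the covariance matrix aligns almost entirely with that feature.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: income (std ~10,000 dollars) vs. age (std ~10 years).
income = rng.normal(50_000, 10_000, size=200)
age = rng.normal(40, 10, size=200)
X = np.column_stack([income, age])

# Unscaled: the leading eigenvector points almost entirely along the
# income axis, because variance is scale-dependent.
cov = np.cov(X - X.mean(axis=0), rowvar=False)
vals, vecs = np.linalg.eigh(cov)
leading = vecs[:, np.argmax(vals)]

# Standardized (z-scored): both features now contribute on equal footing,
# each with unit variance.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
cov_z = np.cov(Xz, rowvar=False)
```

After z-scoring, every feature has (approximately) unit variance, so no single feature can dominate the components by scale alone.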
4
How is PCA related to eigenvalues and eigenvectors?
🔥 Advanced
Answer: PCA computes the eigenvectors and eigenvalues of the covariance matrix; eigenvectors are component directions, eigenvalues measure captured variance.
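The answer above can be sketched directly with numpy's symmetric eigendecomposition (the random data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # correlated features
Xc = X - X.mean(axis=0)                                   # center first

cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition (ascending)
order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the component directions; eigvals are the
# variances captured along them.
scores = Xc @ eigvecs  # data expressed in the principal-component basis
```

The sample variance of each projected column equals the corresponding eigenvalue, which is exactly the "eigenvalues measure captured variance" claim.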
5
What is explained variance ratio in PCA?
📊 Intermediate
Answer: It is the fraction of total variance captured by each principal component, used to decide how many components to keep.
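As a quick numeric sketch (the eigenvalues below are made up for illustration), the ratio is just each component's variance divided by the total:

```python
import numpy as np

# Hypothetical eigenvalues: variance captured by each component.
eigvals = np.array([4.0, 1.5, 0.4, 0.1])

# Explained variance ratio: each component's share of the total variance.
ratio = eigvals / eigvals.sum()  # first component: 4.0 / 6.0
```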
6
Is PCA supervised or unsupervised?
⚡ Beginner
Answer: PCA is an unsupervised technique; it ignores labels and focuses only on the feature covariance structure.
7
Can PCA improve model performance?
📊 Intermediate
Answer: Sometimes—by reducing noise, multicollinearity and overfitting, it can help some models, but it may also remove useful information.
8
How do you decide how many principal components to keep?
📊 Intermediate
Answer: Common methods: keep enough components to explain a target proportion of variance (e.g., 95%) or inspect the scree plot for an elbow.
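The variance-threshold method can be sketched with a cumulative sum (the ratios below are hypothetical):

```python
import numpy as np

# Hypothetical explained-variance ratios for five components.
ratios = np.array([0.55, 0.25, 0.10, 0.06, 0.04])

# Keep the smallest k whose cumulative explained variance reaches 95%.
cumulative = np.cumsum(ratios)
k = int(np.searchsorted(cumulative, 0.95)) + 1
```

With scikit-learn, passing a float such as `PCA(n_components=0.95)` performs the same selection automatically.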
9
Is PCA good for interpretability of original features?
📊 Intermediate
Answer: Not usually—components are linear combinations of all features, which can be hard to interpret directly.
10
How is PCA useful for visualization?
⚡ Beginner
Answer: PCA can project high-dimensional data down to 2D or 3D, making cluster and structure visualization easier.
11
Does PCA assume linear relationships?
🔥 Advanced
Answer: Yes, PCA captures linear correlations; it may miss complex non-linear structure without extensions like kernel PCA.
12
How does PCA relate to the covariance matrix?
🔥 Advanced
Answer: PCA finds directions (eigenvectors) that diagonalize the covariance matrix, concentrating variance along principal axes.
13
When might PCA hurt model performance?
📊 Intermediate
Answer: When important predictive information lies in low-variance directions or when interpretability of original features is critical.
14
Should PCA be fit on training data only or on the full dataset?
📊 Intermediate
Answer: Fit PCA on the training data only to avoid information leakage, then apply the learned transform to validation/test data.
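A leakage-free sketch using scikit-learn (assumed available; the random arrays stand in for real train/test splits): fit on training data, then only transform the held-out data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X_train = rng.normal(size=(80, 5))
X_test = rng.normal(size=(20, 5))

# Fit the scaler and PCA on training data only...
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
Z_train = pipe.fit_transform(X_train)

# ...then reuse the fitted transform on held-out data (no refitting).
Z_test = pipe.transform(X_test)
```

Wrapping both steps in a pipeline ensures the test set never influences the learned means, scales, or component directions.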
15
Is PCA sensitive to outliers?
🔥 Advanced
Answer: Yes, outliers can significantly affect the covariance matrix and distort components; robust PCA variants exist.
16
Can PCA be used before clustering?
📊 Intermediate
Answer: Yes, PCA is often applied to reduce dimensionality and noise before clustering algorithms like k-means.
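A minimal sketch of that pipeline with scikit-learn (assumed available; the blob data is synthetic): compress 10 features to 2 components, then cluster.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic 10-dimensional data with three well-separated clusters.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Compress to 2 components first to reduce noise dimensions.
Z = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
```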
17
How does kernel PCA differ from standard PCA?
🔥 Advanced
Answer: Kernel PCA uses the kernel trick to perform PCA in a high-dimensional feature space, capturing non-linear structure.
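A small sketch of the contrast, assuming scikit-learn: two concentric circles have no linearly separating direction, but an RBF-kernel PCA implicitly maps them into a space where they become (approximately) separable. The `gamma` value here is an illustrative choice, not a recommended default.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: no linear direction separates them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF kernel PCA performs PCA in an implicit high-dimensional feature
# space, capturing the non-linear (radial) structure.
Z = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
```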
18
Give a real-world use case where PCA is helpful.
⚡ Beginner
Answer: PCA is used for image compression, noise reduction, exploratory analysis, and visualization of high-dimensional datasets.
19
Does PCA keep class separability in supervised problems?
🔥 Advanced
Answer: Not necessarily; PCA optimizes for variance, not class separation, so supervised methods like LDA might be better for that goal.
20
What is the key message to remember about PCA?
⚡ Beginner
Answer: PCA is a powerful linear tool for compression and visualization; always scale features, avoid leakage, and check that the lost dimensions aren’t crucial for your task.
Quick Recap: PCA
Understand variance, eigenvectors and projections—once you do, PCA becomes an intuitive way to simplify complex datasets before modeling or visualization.