Principal Component Analysis (PCA): Interview Q&A
20 core questions with short answers on PCA: dimensionality reduction, principal components, explained variance, and typical use cases.
Topics: Dimensionality · Eigenvectors · Explained Variance · Projections
1
What problem does PCA solve?
⚡ Beginner
Answer: PCA reduces the dimensionality of data while retaining as much variance (information) as possible.
2
What are principal components?
⚡ Beginner
Answer: Principal components are new orthogonal axes (directions) in feature space along which the data has maximum variance.
3
Why should features be standardized before applying PCA?
📊 Intermediate
Answer: Without scaling, PCA would be dominated by features with larger numeric ranges, since variance is scale-dependent.
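A minimal numpy sketch of this effect, using synthetic data (the feature names and scales are illustrative assumptions): when one feature has a much larger numeric range, the leading eigenvector of the covariance matrix aligns almost entirely with that feature.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: income (std ~10,000 dollars) vs. age (std ~10 years).
income = rng.normal(50_000, 10_000, size=200)
age = rng.normal(40, 10, size=200)
X = np.column_stack([income, age])

# Unscaled: the leading eigenvector points almost entirely along the
# income axis, because variance is scale-dependent.
cov = np.cov(X - X.mean(axis=0), rowvar=False)
vals, vecs = np.linalg.eigh(cov)
leading = vecs[:, np.argmax(vals)]

# Standardized (z-scored): both features now contribute on equal footing,
# each with unit variance.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
cov_z = np.cov(Xz, rowvar=False)
```

After z-scoring, every feature has (approximately) unit variance, so no single feature can dominate the components by scale alone.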
4
How is PCA related to eigenvalues and eigenvectors?
🔥 Advanced
Answer: PCA computes the eigenvectors and eigenvalues of the covariance matrix; eigenvectors are component directions, eigenvalues measure captured variance.
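The answer above can be sketched directly with numpy's symmetric eigendecomposition (the random data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # correlated features
Xc = X - X.mean(axis=0)                                   # center first

cov = np.cov(Xc, rowvar=False)          # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition (ascending)
order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the component directions; eigvals are the
# variances captured along them.
scores = Xc @ eigvecs  # data expressed in the principal-component basis
```

The sample variance of each projected column equals the corresponding eigenvalue, which is exactly the "eigenvalues measure captured variance" claim.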
5
What is explained variance ratio in PCA?
📊 Intermediate
Answer: It is the fraction of total variance captured by each principal component, used to decide how many components to keep.
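As a quick numeric sketch (the eigenvalues below are made up for illustration), the ratio is just each component's variance divided by the total:

```python
import numpy as np

# Hypothetical eigenvalues: variance captured by each component.
eigvals = np.array([4.0, 1.5, 0.4, 0.1])

# Explained variance ratio: each component's share of the total variance.
ratio = eigvals / eigvals.sum()  # first component: 4.0 / 6.0
```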
6
Is PCA supervised or unsupervised?
⚡ Beginner
Answer: PCA is an unsupervised technique; it ignores labels and focuses only on the feature covariance structure.
7
Can PCA improve model performance?
📊 Intermediate
Answer: Sometimes—by reducing noise, multicollinearity and overfitting, it can help some models, but it may also remove useful information.
8
How do you decide how many principal components to keep?
📊 Intermediate
Answer: Common methods: keep enough components to explain a target proportion of variance (e.g., 95%) or inspect the scree plot for an elbow.
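The variance-threshold method can be sketched with a cumulative sum (the ratios below are hypothetical):

```python
import numpy as np

# Hypothetical explained-variance ratios for five components.
ratios = np.array([0.55, 0.25, 0.10, 0.06, 0.04])

# Keep the smallest k whose cumulative explained variance reaches 95%.
cumulative = np.cumsum(ratios)
k = int(np.searchsorted(cumulative, 0.95)) + 1
```

With scikit-learn, passing a float such as `PCA(n_components=0.95)` performs the same selection automatically.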
9
Is PCA good for interpretability of original features?
📊 Intermediate
Answer: Not usually—components are linear combinations of all features, which can be hard to interpret directly.
10
How is PCA useful for visualization?
⚡ Beginner
Answer: PCA can project high-dimensional data down to 2D or 3D, making cluster and structure visualization easier.
11
Does PCA assume linear relationships?
🔥 Advanced
Answer: Yes, PCA captures linear correlations; it may miss complex non-linear structure without extensions like kernel PCA.
12
How does PCA relate to the covariance matrix?
🔥 Advanced
Answer: PCA finds directions (eigenvectors) that diagonalize the covariance matrix, concentrating variance along principal axes.
13
When might PCA hurt model performance?
📊 Intermediate
Answer: When important predictive information lies in low-variance directions or when interpretability of original features is critical.
14
Should PCA be fit on training data only or on the full dataset?
📊 Intermediate
Answer: Fit PCA on the training data only to avoid information leakage, then apply the learned transform to validation/test data.
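A leakage-free sketch using scikit-learn (assumed available; the random arrays stand in for real train/test splits): fit on training data, then only transform the held-out data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X_train = rng.normal(size=(80, 5))
X_test = rng.normal(size=(20, 5))

# Fit the scaler and PCA on training data only...
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
Z_train = pipe.fit_transform(X_train)

# ...then reuse the fitted transform on held-out data (no refitting).
Z_test = pipe.transform(X_test)
```

Wrapping both steps in a pipeline ensures the test set never influences the learned means, scales, or component directions.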
15
Is PCA sensitive to outliers?
🔥 Advanced
Answer: Yes, outliers can significantly affect the covariance matrix and distort components; robust PCA variants exist.
16
Can PCA be used before clustering?
📊 Intermediate
Answer: Yes, PCA is often applied to reduce dimensionality and noise before clustering algorithms like k-means.
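A minimal sketch of that pipeline with scikit-learn (assumed available; the blob data is synthetic): compress 10 features to 2 components, then cluster.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic 10-dimensional data with three well-separated clusters.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Compress to 2 components first to reduce noise dimensions.
Z = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)
```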
17
How does kernel PCA differ from standard PCA?
🔥 Advanced
Answer: Kernel PCA uses the kernel trick to perform PCA in a high-dimensional feature space, capturing non-linear structure.
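A small sketch of the contrast, assuming scikit-learn: two concentric circles have no linearly separating direction, but an RBF-kernel PCA implicitly maps them into a space where they become (approximately) separable. The `gamma` value here is an illustrative choice, not a recommended default.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: no linear direction separates them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# RBF kernel PCA performs PCA in an implicit high-dimensional feature
# space, capturing the non-linear (radial) structure.
Z = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
```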
18
Give a real-world use case where PCA is helpful.
⚡ Beginner
Answer: PCA is used for image compression, noise reduction, exploratory analysis, and visualization of high-dimensional datasets.
19
Does PCA keep class separability in supervised problems?
🔥 Advanced
Answer: Not necessarily; PCA optimizes for variance, not class separation, so supervised methods like LDA might be better for that goal.
20
What is the key message to remember about PCA?
⚡ Beginner
Answer: PCA is a powerful linear tool for compression and visualization; always scale features, avoid leakage, and check that the lost dimensions aren’t crucial for your task.
Quick Recap: PCA
Understand variance, eigenvectors and projections—once you do, PCA becomes an intuitive way to simplify complex datasets before modeling or visualization.