Q&A: 40 Questions

Dimensionality Reduction — Q&A

PCA, t-SNE, and techniques to reduce feature space for ML.

Principal Component Analysis (PCA): Interview Q&A

1 What problem does PCA solve? ⚡ Beginner
Answer: PCA reduces the dimensionality of data while retaining as much variance (information) as possible.
2 What are principal components? ⚡ Beginner
Answer: Principal components are new orthogonal axes (directions) in feature space along which the data has maximum variance.
3 Why should features be standardized before applying PCA? 📊 Intermediate
Answer: Without scaling, PCA would be dominated by features with larger numeric ranges, since variance is scale-dependent.
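Example: a minimal NumPy sketch of the scaling effect, assuming two made-up, independent features (income in dollars, age in years). The helper name first_component is hypothetical and used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two independent features on very different scales.
income = rng.normal(60_000, 20_000, size=500)  # standard deviation around 20,000
age = rng.normal(40, 10, size=500)             # standard deviation around 10
X = np.column_stack([income, age])

def first_component(X):
    """Unit eigenvector of the covariance matrix with the largest eigenvalue."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, -1]

pc_raw = first_component(X)

# Standardize each column to zero mean and unit variance, then repeat.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
pc_std = first_component(X_std)

print("raw weights:         ", np.abs(pc_raw))  # nearly all weight on income
print("standardized weights:", np.abs(pc_std))  # weight shared between both features
```

On the raw data the first component is essentially the income axis, purely because of its units. After standardization neither feature dominates by scale alone.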
4 How is PCA related to eigenvalues and eigenvectors? 🔥 Advanced
Answer: PCA computes the eigenvectors and eigenvalues of the data's covariance matrix: the eigenvectors give the component directions, and each eigenvalue is the variance captured along its eigenvector.
5 What is explained variance ratio in PCA? 📊 Intermediate
Answer: It is the fraction of total variance captured by each principal component, used to decide how many components to keep.
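Example: the eigenvector view and the explained variance ratio can be sketched in a few lines of NumPy. The data and the mixing matrix A below are made up for illustration. This is one way to compute PCA, not the only one; libraries often use an SVD instead.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated data: independent noise mixed by an arbitrary matrix A.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.3]])
X = rng.normal(size=(300, 3)) @ A.T

# Center the data and form the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# eigh is meant for symmetric matrices; sort from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the principal axes; columns of eigvecs are the components.
scores = Xc @ eigvecs

# Fraction of total variance carried by each component.
explained_variance_ratio = eigvals / eigvals.sum()
print(explained_variance_ratio)
```

The variance of each column of scores equals the matching eigenvalue, and the columns are uncorrelated with each other, which is what "diagonalizing the covariance matrix" means in practice.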
6 Is PCA supervised or unsupervised? ⚡ Beginner
Answer: PCA is an unsupervised technique; it ignores labels and focuses only on the feature covariance structure.
7 Can PCA improve model performance? 📊 Intermediate
Answer: Sometimes. By reducing noise and multicollinearity, and with them the risk of overfitting, it can help some models, but it can also discard useful information.
8 How do you decide how many principal components to keep? 📊 Intermediate
Answer: Common methods: keep enough components to explain a target proportion of variance (e.g., 95%) or inspect the scree plot for an elbow.
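Example: a sketch of the cumulative-variance rule, assuming a hypothetical helper n_components_for_variance and made-up data whose columns have very different spreads. The 95% target is the figure from the answer above, not a universal default.

```python
import numpy as np

def n_components_for_variance(X, target=0.95):
    """Smallest number of components whose cumulative explained variance reaches target."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data; variance along component i is s_i**2 / (n - 1).
    s = np.linalg.svd(Xc, compute_uv=False)
    ratio = s**2 / np.sum(s**2)
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the target, as a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative)), cumulative

rng = np.random.default_rng(0)
# Hypothetical data: five independent columns with standard deviations 10, 5, 1, 0.5, 0.1,
# so the first two columns carry almost all of the variance.
X = rng.normal(size=(1000, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])

k, cumulative = n_components_for_variance(X, target=0.95)
print(k, np.round(cumulative, 3))
```

Plotting the per-component ratios against the component index gives the scree plot mentioned above; the elbow is judged by eye rather than computed.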
9 Is PCA good for interpretability of original features? 📊 Intermediate
Answer: Not usually. Each component is a linear combination of all original features, so it is hard to attribute to any single feature.
10 How is PCA useful for visualization? ⚡ Beginner
Answer: PCA can project high-dimensional data down to 2D or 3D, making cluster and structure visualization easier.
11 Does PCA assume linear relationships? 🔥 Advanced
Answer: Yes, PCA captures linear correlations; it may miss complex non-linear structure without extensions like kernel PCA.
12 How does PCA relate to the covariance matrix? 🔥 Advanced
Answer: PCA finds directions (eigenvectors) that diagonalize the covariance matrix, concentrating variance along principal axes.
13 When might PCA hurt model performance? 📊 Intermediate
Answer: When important predictive information lies in low-variance directions or when interpretability of original features is critical.
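Example: a toy illustration of predictive information sitting in a low-variance direction. The data is made up: two features share a large label-independent factor t, and only their small difference depends on the class. The helper class_gap is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
s = np.where(y == 0, -1.0, 1.0)           # class signal
t = rng.normal(0.0, 1.0, size=2 * n)      # large shared factor, unrelated to the label

# Both features have similar scale; the label only shows up in their difference.
x1 = t + 0.1 * s + rng.normal(0.0, 0.02, size=2 * n)
x2 = t - 0.1 * s + rng.normal(0.0, 0.02, size=2 * n)
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending eigenvalues
z_high = Xc @ eigvecs[:, -1]   # scores on the high-variance component
z_low = Xc @ eigvecs[:, 0]     # scores on the low-variance component

def class_gap(z, y):
    """Absolute difference between class means, in units of the overall standard deviation."""
    return abs(z[y == 1].mean() - z[y == 0].mean()) / z.std()

print("share of variance in top component:", eigvals[-1] / eigvals.sum())
print("class gap on top component:   ", class_gap(z_high, y))
print("class gap on second component:", class_gap(z_low, y))
```

Keeping only the top component here would retain nearly all the variance while discarding nearly all the class information, which is the failure mode described above.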
14 Should PCA be fit on training data only or on the full dataset? 📊 Intermediate
Answer: Fit PCA on the training data only to avoid information leakage, then apply the learned transform to validation/test data.
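Example: a minimal sketch of the fit-on-train, transform-everything pattern in NumPy. The helpers fit_pca and transform_pca are hypothetical names, the data is random, and the sketch assumes no feature is constant in the training split (otherwise the scale would be zero).

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn mean, scale and principal directions from the training rows only."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0)          # assumes no constant feature
    Z = (X_train - mean) / scale
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return {"mean": mean, "scale": scale, "components": Vt[:n_components]}

def transform_pca(params, X):
    """Apply the stored training statistics to any split without refitting."""
    Z = (X - params["mean"]) / params["scale"]
    return Z @ params["components"].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # hypothetical dataset
X_train, X_test = X[:80], X[80:]

params = fit_pca(X_train, n_components=2)    # statistics come from the training split
Z_train = transform_pca(params, X_train)
Z_test = transform_pca(params, X_test)       # test rows reuse the training statistics
print(Z_train.shape, Z_test.shape)
```

In scikit-learn the same idea is usually expressed by putting the scaler and PCA in a pipeline that is fit on the training split, so that cross-validation refits them inside each fold.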
15 Is PCA sensitive to outliers? 🔥 Advanced
Answer: Yes, outliers can significantly affect the covariance matrix and distort components; robust PCA variants exist.
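Example: a small sketch of how a single extreme point can rotate the first component. The data is made up (a cloud stretched along the x-axis) and the outlier position is chosen to make the effect obvious. The helper first_pc is a hypothetical name.

```python
import numpy as np

def first_pc(X):
    """First principal direction via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
# Hypothetical data: 200 points with much more spread along x than along y.
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
pc_clean = first_pc(X)

# Add one extreme point far along the y-axis and recompute.
X_out = np.vstack([X, [0.0, 100.0]])
pc_out = first_pc(X_out)

# Angle between the two directions (the sign of a component is arbitrary).
angle = np.degrees(np.arccos(min(1.0, abs(pc_clean @ pc_out))))
print("clean direction:       ", pc_clean)
print("direction with outlier:", pc_out)
print("rotation in degrees:   ", angle)
```

One point out of 201 is enough to swing the first component from the x-axis to the y-axis here, because squared deviations dominate the covariance estimate.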
16 Can PCA be used before clustering? 📊 Intermediate
Answer: Yes, PCA is often applied to reduce dimensionality and noise before clustering algorithms like k-means.
17 How does kernel PCA differ from standard PCA? 🔥 Advanced
Answer: Kernel PCA uses the kernel trick to perform PCA in a high-dimensional feature space, capturing non-linear structure.
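Example: a from-scratch sketch of kernel PCA with an RBF kernel on two concentric circles, a standard toy case where no straight-line projection separates the groups. The function rbf_kernel_pca is a hypothetical helper, not a library API, and gamma=2.0 was picked by hand for this toy data.

```python
import numpy as np

def rbf_kernel_pca(X, gamma, n_components):
    """Project training points onto the top components of the centered RBF kernel matrix."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T      # squared pairwise distances
    K = np.exp(-gamma * d2)

    # Center the kernel matrix, which centers the data in the implicit feature space.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one

    vals, vecs = np.linalg.eigh(Kc)                      # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Scores of the training points: unit eigenvector times sqrt(eigenvalue).
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Hypothetical data: 50 points on a circle of radius 0.3, 50 on a circle of radius 1.0.
n = 50
angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([0.3 * circle, 1.0 * circle])

Z = rbf_kernel_pca(X, gamma=2.0, n_components=2)
inner, outer = Z[:n, 0], Z[n:, 0]
print("inner circle, first component range:", inner.min(), inner.max())
print("outer circle, first component range:", outer.min(), outer.max())
```

With this setup the first kernel component places the two circles in disjoint ranges. The result depends on gamma: other values can make a different component carry the separation.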
18 Give a real-world use case where PCA is helpful. ⚡ Beginner
Answer: PCA is used for image compression, noise reduction, exploratory analysis and visualizing high-dimensional datasets.
19 Does PCA keep class separability in supervised problems? 🔥 Advanced
Answer: Not necessarily; PCA optimizes for variance, not class separation, so supervised methods like LDA might be better for that goal.
20 What is the key message to remember about PCA? ⚡ Beginner
Answer: PCA is a powerful linear tool for compression and visualization; always scale features, avoid leakage, and check that the lost dimensions aren’t crucial for your task.

t-SNE: Interview Q&A

21 What is t-SNE mainly used for? ⚡ Beginner
Answer: t-SNE is used for visualizing high-dimensional data in 2D or 3D while preserving local neighborhood structure.
22 Is t-SNE a linear or non-linear method? ⚡ Beginner
Answer: t-SNE is a non-linear dimensionality reduction technique.
23 What does t-SNE try to preserve when reducing dimensions? 📊 Intermediate
Answer: It aims to preserve local neighbor relationships by matching pairwise similarity distributions in high and low dimensions.
24 What is perplexity in t-SNE? 🔥 Advanced
Answer: Perplexity sets the bandwidth of the Gaussian used for each point's similarities and can be read as a smooth measure of the effective number of neighbors considered for each point.
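Example: a sketch of how perplexity ties to a per-point Gaussian bandwidth, following the usual definition (2 raised to the Shannon entropy of the conditional neighbor distribution). The function names, search bounds and data are assumptions for illustration, not any library's implementation.

```python
import numpy as np

def conditional_probs(sq_dists, sigma):
    """Neighbor probabilities for one point, given squared distances to all other points."""
    logits = -sq_dists / (2.0 * sigma**2)
    logits = logits - logits.max()        # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p):
    """2 ** (Shannon entropy in bits); zero-probability entries contribute nothing."""
    p = p[p > 0]
    return 2.0 ** (-np.sum(p * np.log2(p)))

def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, n_steps=60):
    """Bisection on sigma; perplexity does not decrease as sigma grows."""
    for _ in range(n_steps):
        mid = np.sqrt(lo * hi)            # geometric midpoint, since sigma spans decades
        if perplexity(conditional_probs(sq_dists, mid)) < target:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

# A uniform distribution over 8 neighbors has perplexity 8.
print(perplexity(np.full(8, 1.0 / 8.0)))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                       # hypothetical data
sq_dists = np.sum((X[1:] - X[0]) ** 2, axis=1)       # distances from point 0 to the rest
sigma = sigma_for_perplexity(sq_dists, target=30.0)
print(sigma, perplexity(conditional_probs(sq_dists, sigma)))
```

Points in dense regions end up with a small sigma and points in sparse regions with a large one, so every point "sees" roughly the same effective number of neighbors.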
25 How does the learning rate affect t-SNE? 🔥 Advanced
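Answer: Too small a learning rate gives slow convergence and can leave most points compressed in a dense cloud; too large a rate can make the optimization unstable, often producing a roughly uniform ball of points with little visible structure.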
26 Why is t-SNE primarily an exploratory tool, not a general-purpose feature reducer? 📊 Intermediate
Answer: t-SNE is non-parametric, stochastic and focuses on visualization; it doesn’t provide a simple mapping for new points and distortions can be hard to interpret quantitatively.
27 Is the global structure in a t-SNE plot always reliable? 🔥 Advanced
Answer: Not necessarily; t-SNE is designed to preserve local structure, so global distances and cluster sizes can be misleading.
28 Should you run t-SNE on raw features or after a step like PCA? 📊 Intermediate
Answer: Often you first apply PCA to reduce dimensionality (e.g., to 30–50 dims) and then run t-SNE for stability and speed.
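Example: a sketch of the PCA-then-t-SNE recipe with scikit-learn, assuming it is installed. The data is synthetic (three shifted groups in 100 dimensions), and the choices of 30 PCA dimensions and perplexity 30 are illustrative, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical data: 300 samples, 100 features, three groups around different centers.
centers = rng.normal(scale=5.0, size=(3, 100))
X = np.vstack([c + rng.normal(size=(100, 100)) for c in centers])

# Step 1: linear reduction to 30 dimensions to drop noise and speed up t-SNE.
X_pca = PCA(n_components=30, random_state=0).fit_transform(X)

# Step 2: t-SNE on the reduced data; the seed is fixed so the run is repeatable.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_pca)

print(X_pca.shape, embedding.shape)
```

The 2D embedding is meant for plotting only. Any downstream modeling would normally use X_pca or the original features instead.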
29 Is t-SNE deterministic? ⚡ Beginner
Answer: No. The objective is non-convex and the optimization usually starts from a random initialization, so different runs can produce different layouts; fixing the random seed makes a given run reproducible.
30 Is t-SNE suitable as a preprocessing step for clustering? 🔥 Advanced
Answer: Generally no; t-SNE is optimized for visualization and does not preserve the distances, densities, or cluster sizes that clustering algorithms rely on.
31 What does it mean if t-SNE shows well-separated clusters? 📊 Intermediate
Answer: It often indicates that the classes or groups have distinct local neighborhoods in high-dimensional space, but it’s not a rigorous proof.
32 How does t-SNE differ from PCA? 📊 Intermediate
Answer: PCA is a linear, global variance-based method; t-SNE is non-linear and local-neighborhood based, optimized for visualization.
33 Why can t-SNE be slow on large datasets? 🔥 Advanced
Answer: Exact t-SNE computes and optimizes over all pairwise similarities, which costs O(n²) time and memory; approximations such as Barnes–Hut t-SNE reduce this to roughly O(n log n).
34 Which hyperparameters typically require tuning in t-SNE? 📊 Intermediate
Answer: Mainly perplexity, learning rate, number of iterations and sometimes initialization method.
35 What is a typical perplexity range used in practice? ⚡ Beginner
Answer: Values between 5 and 50 are common; trying a few and comparing plots is recommended.
36 How can you misuse t-SNE in an analysis? 🔥 Advanced
Answer: Misuse includes over-interpreting distances and cluster sizes, not checking stability across runs, or using it as evidence of separability without other metrics.
37 Is t-SNE appropriate for streaming or online data? 🔥 Advanced
Answer: Not really; it’s batch-oriented and doesn’t provide a simple incremental update rule for new points.
38 Give a real-world use case where t-SNE is very helpful. ⚡ Beginner
Answer: t-SNE is widely used to visualize embeddings like word vectors, image features or latent representations from neural networks.
39 How can you check if a t-SNE result is robust? 📊 Intermediate
Answer: Re-run t-SNE with different random seeds and parameter settings; stable qualitative patterns increase confidence.
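Example: one possible way to put a number on "stable qualitative patterns" is to compare nearest-neighbor sets between two embeddings of the same points. Coordinates themselves are not comparable across runs, since layouts can differ by rotation, reflection or shift. The helpers knn_indices and neighbor_overlap are hypothetical, and the second "run" below is simulated by rotating the first rather than by calling t-SNE.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of each row, excluding the row itself."""
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff**2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_overlap(emb_a, emb_b, k=10):
    """Mean fraction of shared k-nearest neighbors between two embeddings of the same points."""
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_a, nn_b)]
    return float(np.mean(shared)) / k

rng = np.random.default_rng(0)
emb_a = rng.normal(size=(50, 2))          # stand-in for the embedding from one run

# Stand-in for a second run that found the same structure in a rotated, shifted layout.
theta = np.radians(40.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
emb_b = emb_a @ R.T + 5.0

print(neighbor_overlap(emb_a, emb_b, k=10))
```

In a real check, emb_a and emb_b would come from t-SNE runs with different seeds or perplexities. An overlap near 1 suggests the local structure is stable, while a value near k / (n - 1) is what unrelated layouts would give.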
40 What is the key message to remember about t-SNE? ⚡ Beginner
Answer: t-SNE is a powerful visualization tool, not a general-purpose feature extractor; use it to explore structure, but validate findings with other methods.