Q&A: 40 Questions

Dimensionality Reduction — Q&A

PCA, t-SNE, and techniques to reduce feature space for ML.

Principal Component Analysis (PCA): Interview Q&A

1 What problem does PCA solve? ⚡ Beginner

Answer: PCA reduces the dimensionality of data while retaining as much variance (information) as possible.

2 What are principal components? ⚡ Beginner

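Answer: Principal components are new orthogonal axes (directions) in feature space, ordered by variance: the first points along the direction of maximum variance, and each subsequent one captures the most remaining variance while staying orthogonal to the previous ones.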

3 Why should features be standardized before applying PCA? 📊 Intermediate

Answer: Without scaling, PCA would be dominated by features with larger numeric ranges, since variance is scale-dependent.
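To see the scale effect concretely, here is a minimal NumPy sketch on hypothetical data: two independent features, one in metres and one in dollars. The helper `first_component` and the feature values are illustrative assumptions. Without scaling, the first component points almost entirely along the dollar feature. After z-scoring, both features carry equal weight.

```python
import numpy as np

def first_component(X):
    """Return the first principal direction of X (rows = samples)."""
    Xc = X - X.mean(axis=0)                    # PCA works on centered data
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    return eigvecs[:, np.argmax(eigvals)]      # direction with the largest variance

rng = np.random.default_rng(0)
height_m = rng.normal(1.7, 0.1, size=500)        # hypothetical feature in metres
income = rng.normal(50_000, 15_000, size=500)    # hypothetical feature in dollars
X = np.column_stack([height_m, income])

raw_pc = first_component(X)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)     # z-score standardization
std_pc = first_component(X_std)

print("unscaled PC1:", np.round(np.abs(raw_pc), 3))  # dominated by income
print("scaled PC1:  ", np.round(np.abs(std_pc), 3))  # equal weights
```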

4 How is PCA related to eigenvalues and eigenvectors? 🔥 Advanced

Answer: PCA computes the eigenvectors and eigenvalues of the covariance matrix; eigenvectors are component directions, eigenvalues measure captured variance.
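The eigendecomposition route can be sketched in a few lines of NumPy. This is an illustrative implementation, not any particular library's code. The function name `pca_eig` and the toy data are assumptions made for this example. It centers the data, eigendecomposes the covariance matrix, sorts the directions by eigenvalue, and projects.

```python
import numpy as np

def pca_eig(X, n_components):
    """Minimal PCA sketch via eigendecomposition of the covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # center each feature
    cov = np.cov(Xc, rowvar=False)              # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]           # sort by captured variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]      # columns = principal directions
    scores = Xc @ components                    # project data onto the components
    return scores, components, eigvals

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # hypothetical correlated data
scores, components, eigvals = pca_eig(X, n_components=2)
print(scores.shape)  # (200, 2)
print(np.round(eigvals, 3))
```

The variance of each column of `scores` equals the matching eigenvalue. This is the precise sense in which "eigenvalues measure captured variance."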

5 What is explained variance ratio in PCA? 📊 Intermediate

Answer: It is the fraction of total variance captured by each principal component, used to decide how many components to keep.
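In practice, libraries often compute PCA through the singular value decomposition (SVD) of the centered data rather than an explicit covariance matrix; scikit-learn's `PCA`, for example, is SVD-based. The sketch below computes the explained variance ratio that way. The helper name and the feature spreads are hypothetical.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data relate to covariance eigenvalues:
    # eigval_k = s_k**2 / (n - 1), so the (n - 1) factor cancels in the ratio.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) * np.array([5.0, 2.0, 1.0, 0.5])  # hypothetical unequal spreads
ratio = explained_variance_ratio(X)
print(np.round(ratio, 3))
```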

6 Is PCA supervised or unsupervised? ⚡ Beginner

Answer: PCA is an unsupervised technique; it ignores labels and focuses only on the feature covariance structure.

7 Can PCA improve model performance? 📊 Intermediate

Answer: Sometimes: by reducing noise, multicollinearity and overfitting, it can help some models, but it may also remove useful information.

8 How do you decide how many principal components to keep? 📊 Intermediate

Answer: Common methods: keep enough components to explain a target proportion of variance (e.g., 95%) or inspect the scree plot for an elbow.
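The "target proportion of variance" rule reduces to a cumulative sum. Below is a minimal sketch assuming you already have per-component ratios; the ratio values and the helper name `n_components_for` are hypothetical. As an alternative, scikit-learn's `PCA` also accepts a float `n_components` between 0 and 1 for this purpose; check its documentation for solver requirements.

```python
import numpy as np

def n_components_for(ratio, target=0.95):
    """Smallest k whose cumulative explained variance reaches `target`."""
    cumulative = np.cumsum(ratio)
    # searchsorted finds the first index where cumulative >= target;
    # min() guards against float round-off leaving the total just below 1.0
    return int(min(np.searchsorted(cumulative, target) + 1, len(ratio)))

ratio = np.array([0.60, 0.25, 0.08, 0.04, 0.02, 0.01])  # hypothetical ratios
print(np.cumsum(ratio))
print(n_components_for(ratio, 0.95))  # 4
```

A scree plot is simply `ratio` plotted against component index; the "elbow" is where the curve flattens.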

9 Is PCA good for interpretability of original features? 📊 Intermediate

Answer: Not usually; components are linear combinations of all features, which can be hard to interpret directly.

10 How is PCA useful for visualization? ⚡ Beginner

Answer: PCA can project high-dimensional data down to 2D or 3D, making cluster and structure visualization easier.

11 Does PCA assume linear relationships? 🔥 Advanced

Answer: Yes, PCA captures linear correlations; it may miss complex non-linear structure without extensions like kernel PCA.

12 How does PCA relate to the covariance matrix? 🔥 Advanced

Answer: PCA finds directions (eigenvectors) that diagonalize the covariance matrix, concentrating variance along principal axes.
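In matrix form, the diagonalization works as follows. The notation is introduced here for illustration. Let $X_c$ be the $n \times d$ centered data matrix, let $V$ be the orthogonal matrix whose columns are the covariance eigenvectors, and let $\lambda_1 \ge \dots \ge \lambda_d$ be the eigenvalues. The covariance of the projected data $X_c V$ is then diagonal:

```latex
C = \frac{1}{n-1} X_c^\top X_c = V \Lambda V^\top, \qquad
\operatorname{Cov}(X_c V) = \frac{1}{n-1} V^\top X_c^\top X_c V = V^\top C V = \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d)
```

Off-diagonal zeros mean the new coordinates are uncorrelated, and the $k$-th diagonal entry is the variance captured by component $k$.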

13 When might PCA hurt model performance? 📊 Intermediate

Answer: When important predictive information lies in low-variance directions or when interpretability of original features is critical.

14 Should PCA be fit on training data only or on the full dataset? 📊 Intermediate

Answer: Fit PCA on the training data only to avoid information leakage, then apply the learned transform to validation/test data.
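The leakage rule amounts to learning the mean and components from the training split and reusing them unchanged on new data. Below is a minimal NumPy sketch; the class `TrainOnlyPCA` is a hypothetical helper, not a library API. In scikit-learn, the usual equivalent is to put `StandardScaler` and `PCA` in a `Pipeline` and fit that pipeline on the training data only.

```python
import numpy as np

class TrainOnlyPCA:
    """Hypothetical minimal PCA that learns centering and components from training data only."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X_train):
        self.mean_ = X_train.mean(axis=0)
        # SVD of centered training data; rows of Vt are principal directions
        _, _, Vt = np.linalg.svd(X_train - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X):
        # Reuse the *training* mean and components; never refit on new data
        return (X - self.mean_) @ self.components_.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                    # hypothetical dataset
X_train, X_test = X[:80], X[80:]

pca = TrainOnlyPCA(n_components=3).fit(X_train)  # fit on the training split only
Z_train = pca.transform(X_train)
Z_test = pca.transform(X_test)                   # same learned transform
print(Z_train.shape, Z_test.shape)  # (80, 3) (20, 3)
```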

15 Is PCA sensitive to outliers? 🔥 Advanced

Answer: Yes, outliers can significantly affect the covariance matrix and distort components; robust PCA variants exist.
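A single extreme point can rotate the leading component. The sketch below builds deterministic, hypothetical 2D data spread along x and then adds one point far out along y. That one point is enough to swing the first principal direction from roughly the x-axis to roughly the y-axis.

```python
import numpy as np

def first_pc(X):
    """Leading principal direction via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Hypothetical data: spread mostly along x, very little along y
x = np.linspace(-3, 3, 50)
y = 0.1 * np.sin(np.arange(50))                 # small deterministic wiggle
X_clean = np.column_stack([x, y])

X_outlier = np.vstack([X_clean, [0.0, 60.0]])   # one extreme point far along y

print(np.round(np.abs(first_pc(X_clean)), 3))    # close to [1, 0]
print(np.round(np.abs(first_pc(X_outlier)), 3))  # close to [0, 1]
```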

16 Can PCA be used before clustering? 📊 Intermediate

Answer: Yes, PCA is often applied to reduce dimensionality and noise before clustering algorithms like k-means.

17 How does kernel PCA differ from standard PCA? 🔥 Advanced

Answer: Kernel PCA uses the kernel trick to perform PCA implicitly in a high-dimensional (possibly infinite-dimensional) feature space, working only with the kernel matrix, which lets it capture non-linear structure.
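A compact way to see the kernel trick at work is an RBF kernel PCA written directly in NumPy. This is a sketch, not scikit-learn's `KernelPCA` (which is the production option). The function name, the `gamma=0.5` value, and the concentric-circles data are illustrative assumptions. The code builds the kernel matrix, centers it in feature space, and takes its top eigenvectors. It never computes feature-space coordinates explicitly.

```python
import numpy as np

def rbf_kernel_pca(X, n_components, gamma):
    """Hypothetical minimal RBF kernel PCA (training points only, no out-of-sample projection)."""
    # Pairwise squared Euclidean distances via broadcasting
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)               # RBF (Gaussian) kernel matrix
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Center the data implicitly in feature space: K_c = H K H with H = I - 1/n
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training points onto component k is sqrt(lambda_k) * v_k
    return eigvecs * np.sqrt(np.clip(eigvals, 0, None))

# Hypothetical data: two concentric circles, which no linear projection separates
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
inner = np.column_stack([np.cos(t), np.sin(t)])
outer = 3 * inner
X = np.vstack([inner, outer])

Z = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)  # (200, 2)
```

Whether a given `gamma` actually untangles the rings is data-dependent. Plotting `Z` colored by ring is a quick check.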

18 Give a real-world use case where PCA is helpful. ⚡ Beginner

Answer: PCA is used for image compression, noise reduction, exploratory analysis and visualizing high-dimensional datasets.

19 Does PCA keep class separability in supervised problems? 🔥 Advanced

Answer: Not necessarily; PCA optimizes for variance, not class separation, so supervised methods like LDA might be better for that goal.
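The failure mode is easy to reproduce on hypothetical data. In the sketch below, both classes share a large spread along x, and the labels differ only along the low-variance y direction. A 1D PCA keeps x, so the class difference is mostly discarded. The helper `mean_gap` is an illustrative measure: the distance between class means divided by the overall standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: large shared spread along x, labels differ only along y
x = rng.normal(0, 10, size=2 * n)
y = np.concatenate([rng.normal(-1, 0.3, n), rng.normal(1, 0.3, n)])
X = np.column_stack([x, y])
labels = np.array([0] * n + [1] * n)

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]          # 1-D PCA projection (keeps the max-variance direction)

def mean_gap(z):
    """Distance between class means, in units of the overall standard deviation."""
    return abs(z[labels == 0].mean() - z[labels == 1].mean()) / z.std()

print("gap on PC1:      ", round(mean_gap(pc1), 2))       # small
print("gap on dropped y:", round(mean_gap(Xc[:, 1]), 2))  # large
```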

20 What is the key message to remember about PCA? ⚡ Beginner

Answer: PCA is a powerful linear tool for compression and visualization; always scale features, avoid leakage, and check that the lost dimensions aren’t crucial for your task.

t-SNE: Interview Q&A

21 What is t-SNE mainly used for? ⚡ Beginner

Answer: t-SNE is used for visualizing high-dimensional data in 2D or 3D while preserving local neighborhood structure.

22 Is t-SNE a linear or non-linear method? ⚡ Beginner

Answer: t-SNE is a non-linear dimensionality reduction technique.

23 What does t-SNE try to preserve when reducing dimensions? 📊 Intermediate

Answer: It aims to preserve local neighbor relationships by matching pairwise similarity distributions in high and low dimensions.
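For reference, the standard formulation from van der Maaten and Hinton's original t-SNE paper is summarized below. Here $x_i$ are input points, $y_i$ their low-dimensional positions, $\sigma_i$ the per-point Gaussian bandwidth (set via perplexity), and $n$ the number of points. High-dimensional similarities $P$ use a Gaussian kernel. Low-dimensional similarities $Q$ use a Student-t kernel. The embedding minimizes the KL divergence between the two.

```latex
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)}, \qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}, \qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The heavy-tailed Student-t kernel in $q_{ij}$ lets moderately distant points spread apart in 2D, which eases the so-called crowding problem.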

24 What is perplexity in t-SNE? 🔥 Advanced

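Answer: Perplexity sets the width of the Gaussian kernel around each point (chosen per point so that its neighbor distribution reaches the target perplexity) and can be read as a smooth measure of the effective number of neighbors each point considers.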
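The per-point calibration can be sketched as a binary search. Perplexity is defined as $2^{H}$, where $H$ is the Shannon entropy (in bits) of a point's neighbor distribution. The search adjusts $\beta = 1/(2\sigma^2)$ until that perplexity matches the target. This follows the procedure described in the original t-SNE paper. The function names, tolerance, and random data are hypothetical choices for illustration.

```python
import numpy as np

def conditional_probs(sq_dists_i, beta):
    """Gaussian conditional distribution p_{j|i} for one point (self excluded)."""
    # Subtract the minimum distance for numerical stability; it cancels in normalization
    logits = -beta * (sq_dists_i - sq_dists_i.min())
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p):
    """Perplexity = 2 ** Shannon entropy (in bits) of distribution p."""
    p = p[p > 0]
    return 2.0 ** (-np.sum(p * np.log2(p)))

def beta_for_perplexity(sq_dists_i, target, tol=1e-5, max_iter=100):
    """Hypothetical helper: binary search on beta = 1 / (2 sigma^2) to hit the target perplexity."""
    lo, hi = 0.0, np.inf
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probs(sq_dists_i, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:        # distribution too flat: narrow the kernel (raise beta)
            lo = beta
            beta = beta * 2 if hi == np.inf else (beta + hi) / 2
        else:                    # too peaked: widen the kernel (lower beta)
            hi = beta
            beta = (beta + lo) / 2
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                    # hypothetical data
sq = np.sum((X - X[0]) ** 2, axis=1)[1:]          # squared distances from point 0 to the others
beta = beta_for_perplexity(sq, target=30.0)
print(round(perplexity(conditional_probs(sq, beta)), 3))  # approximately 30.0
```

A uniform distribution over k neighbors has perplexity exactly k. This is why perplexity is read as an "effective number of neighbors."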

25 How does the learning rate affect t-SNE? 🔥 Advanced

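Answer: Too small a learning rate slows convergence and can leave most points compressed in a dense cloud with a few outliers; too large a rate can make the embedding look like a uniform "ball" with points roughly equidistant from their neighbors, or make the optimization unstable.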
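Because a good learning rate grows with dataset size, scikit-learn's `TSNE` documents a `learning_rate="auto"` option. It sets the rate to `max(N / early_exaggeration / 4, 50)`, with `early_exaggeration` defaulting to 12. Below is a standalone re-implementation of that documented formula (not the library's own code), showing how the rate scales with N.

```python
def auto_learning_rate(n_samples, early_exaggeration=12.0):
    """Re-implementation of the documented heuristic max(N / early_exaggeration / 4, 50)."""
    return max(n_samples / early_exaggeration / 4, 50.0)

for n in (1_000, 10_000, 100_000):
    print(n, round(auto_learning_rate(n), 1))
# 1000 50.0
# 10000 208.3
# 100000 2083.3
```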

26 Why is t-SNE primarily an exploratory tool, not a general-purpose feature reducer? 📊 Intermediate

Answer: t-SNE is non-parametric, stochastic and focuses on visualization; it doesn’t provide a simple mapping for new points, and its distortions can be hard to interpret quantitatively.

27 Is the global structure in a t-SNE plot always reliable? 🔥 Advanced

Answer: Not necessarily; t-SNE is designed to preserve local structure, so global distances and cluster sizes can be misleading.

28 Should you run t-SNE on raw features or after a step like PCA? 📊 Intermediate

Answer: Often you first apply PCA to reduce dimensionality (e.g., to 30–50 dims) and then run t-SNE for stability and speed.
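With scikit-learn, the two-step recipe looks roughly like this. It assumes scikit-learn is installed. The blob data, the 30-dimension target, the perplexity, and the seeds are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical data: 3 Gaussian blobs in 100 dimensions
centers = rng.normal(0, 5, size=(3, 100))
X = np.vstack([c + rng.normal(size=(100, 100)) for c in centers])

# Step 1: PCA down to 30 dimensions to denoise and speed up t-SNE
X_pca = PCA(n_components=30, random_state=0).fit_transform(X)

# Step 2: t-SNE on the PCA output; perplexity and seed are illustrative choices
embedding = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(X_pca)
print(embedding.shape)  # (300, 2)
```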

29 Is t-SNE deterministic? ⚡ Beginner

Answer: No, results vary with random initialization and parameter settings; fixing the random seed improves reproducibility.

30 Is t-SNE suitable as a preprocessing step for clustering? 🔥 Advanced

Answer: Generally no; t-SNE is optimized for visualization, not for preserving cluster geometry needed by clustering algorithms.

31 What does it mean if t-SNE shows well-separated clusters? 📊 Intermediate

Answer: It often indicates that the classes or groups have distinct local neighborhoods in high-dimensional space, but it’s not a rigorous proof.

32 How does t-SNE differ from PCA? 📊 Intermediate

Answer: PCA is a linear, global variance-based method; t-SNE is non-linear and local-neighborhood based, optimized for visualization.

33 Why can t-SNE be slow on large datasets? 🔥 Advanced

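Answer: Exact t-SNE computes and optimizes over all pairwise similarities, which costs O(N²) time and memory per iteration; Barnes–Hut and other approximate variants reduce this to roughly O(N log N), which makes larger datasets feasible.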
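The quadratic cost is easy to quantify. The sketch below builds the exact N × N squared-distance matrix that an exact method needs, then estimates the memory for one such dense float64 matrix at larger N. The helper names are hypothetical. In scikit-learn's `TSNE`, the `method` parameter chooses between `"barnes_hut"` (the default) and `"exact"`.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Exact N x N squared-distance matrix: the O(N^2) object exact t-SNE works with."""
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b ; clip tiny negatives from rounding
    return np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T, 0.0)

def dense_matrix_gb(n, bytes_per_entry=8):
    """Memory for one dense float64 N x N matrix, in gigabytes (1e9 bytes)."""
    return n * n * bytes_per_entry / 1e9

X = np.random.default_rng(0).normal(size=(1_000, 50))   # hypothetical data
D = pairwise_sq_dists(X)
print(D.shape)  # (1000, 1000)
for n in (10_000, 100_000, 1_000_000):
    print(n, dense_matrix_gb(n), "GB")
# 10000 0.8 GB
# 100000 80.0 GB
# 1000000 8000.0 GB
```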

34 Which hyperparameters typically require tuning in t-SNE? 📊 Intermediate

Answer: Mainly perplexity, learning rate, number of iterations and sometimes initialization method.

35 What is a typical perplexity range used in practice? ⚡ Beginner

Answer: Values between 5 and 50 are common; trying a few and comparing plots is recommended.

36 How can you misuse t-SNE in an analysis? 🔥 Advanced

Answer: Misuse includes over-interpreting distances and cluster sizes, not checking stability across runs, or using it as evidence of separability without other metrics.

37 Is t-SNE appropriate for streaming or online data? 🔥 Advanced

Answer: Not really; it’s batch-oriented and doesn’t provide a simple incremental update rule for new points.

38 Give a real-world use case where t-SNE is very helpful. ⚡ Beginner

Answer: t-SNE is widely used to visualize embeddings like word vectors, image features or latent representations from neural networks.

39 How can you check if a t-SNE result is robust? 📊 Intermediate

Answer: Re-run t-SNE with different random seeds and parameter settings; stable qualitative patterns increase confidence.
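Beyond eyeballing the plots, a neighborhood-preservation score can complement the visual comparison. One such score is scikit-learn's `trustworthiness`, which lies in [0, 1] and is higher when local neighborhoods are kept. The sketch below re-runs t-SNE with a few seeds and compares that score across runs. It assumes scikit-learn is installed. The data, seeds, and perplexity are hypothetical, and the same loop can sweep perplexity values instead. Similar scores and similar-looking plots across runs increase confidence but do not prove the structure is real.

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

rng = np.random.default_rng(0)
# Hypothetical data: 2 well-separated blobs in 20 dimensions
X = np.vstack([rng.normal(0, 1, size=(75, 20)), rng.normal(6, 1, size=(75, 20))])

scores = {}
for seed in (0, 1, 2):
    emb = TSNE(n_components=2, perplexity=20, random_state=seed).fit_transform(X)
    # Trustworthiness in [0, 1]: how well local neighborhoods are preserved
    scores[seed] = trustworthiness(X, emb, n_neighbors=5)

print({k: round(v, 3) for k, v in scores.items()})
```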

40 What is the key message to remember about t-SNE? ⚡ Beginner

Answer: t-SNE is a powerful visualization tool, not a general-purpose feature extractor; use it to explore structure, but validate findings with other methods.
