Mathematics & Statistics Foundation: 75 Q&A

Linear algebra, calculus, probability, descriptive and inferential statistics for data science.

Linear Algebra Basics for Data Science – Q&A

1 What is a vector and what is a matrix in the context of Data Science? easy

Answer: A vector is an ordered list of numbers (e.g., features of one data point). A matrix is a 2‑D table of numbers, often representing many data points (rows) and many features (columns). Most tabular datasets are naturally represented as matrices, and many ML operations are expressed as matrix multiplications.

2 What is the dot product and why is it important in ML? medium

Answer: The dot product of vectors \(x\) and \(w\) is \(x \cdot w = \sum_i x_i w_i\). Geometrically, \(x \cdot w = \|x\|\,\|w\|\cos\theta\), where \(\theta\) is the angle between the vectors; in ML it appears in linear models (e.g., \(w^Tx\)), similarity measures, and the computations inside many neural network layers.
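As a minimal sketch of the sum \(\sum_i x_i w_i\), the snippet below computes a linear-model score in plain Python. The function name `dot` and the numeric values are illustrative assumptions, not part of the original answer.

```python
def dot(x, w):
    """Return the dot product sum_i x_i * w_i of two equal-length vectors."""
    if len(x) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical feature vector and weights for a linear model (illustrative values).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]

prediction = dot(x, w)  # 1*0.5 + 2*(-1.0) + 3*2.0
print(prediction)
```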

3 Intuitively, what are eigenvalues and eigenvectors and how do they relate to PCA? hard

Answer: For a matrix \(A\), an eigenvector \(v\) is a direction that is only scaled (not rotated) by \(A\), and the scale factor is the corresponding eigenvalue \(\lambda\) (i.e. \(Av = \lambda v\)). In PCA, eigenvectors of the covariance matrix give the principal directions of variance, and eigenvalues tell you how much variance each principal component explains.
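One way to see \(Av = \lambda v\) in action is power iteration, sketched below under the assumption that the matrix is a small symmetric covariance matrix with a single dominant eigenvalue. This is a swapped-in illustrative technique; production PCA code typically relies on a library eigendecomposition or SVD instead. The matrix values and function names are hypothetical.

```python
import math


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def power_iteration(A, steps=100):
    """Approximate the dominant eigenvalue and unit eigenvector of A."""
    v = [1.0] + [0.0] * (len(A) - 1)  # arbitrary starting direction
    for _ in range(steps):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize so v keeps unit length
    # Rayleigh quotient v^T A v gives the eigenvalue because v has unit length.
    lam = sum(vi * wi for vi, wi in zip(v, matvec(A, v)))
    return lam, v


# Hypothetical covariance matrix of two positively correlated features.
A = [[2.0, 1.0], [1.0, 2.0]]
lam, v = power_iteration(A)
print(lam, v)  # lam approaches 3, v approaches the direction [1, 1] / sqrt(2)
```

Here the first principal direction is the diagonal along which both features vary together, and the eigenvalue is the variance captured along it.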

4 What is matrix multiplication in DS terms? easy

Answer: Matrix multiplication composes linear transformations. In ML, multiplying the feature matrix X (one row per sample) by a weight vector w gives the predictions of a linear model for all samples at once.

5 Why does shape compatibility matter in matrix operations? easy

Answer: Multiplication A(m×n)·B(n×p) is valid only when the inner dimensions match, and the result has shape m×p. Shape errors often indicate pipeline bugs in feature or batch handling.
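The shape rule can be sketched with a small pure-Python multiply that checks the inner dimensions before computing anything. The helper `matmul` and the sample matrices are hypothetical; real pipelines would normally use a numerical library.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p), both given as lists of rows."""
    m, n = len(A), len(A[0])
    n_b, p = len(B), len(B[0])
    if n != n_b:
        raise ValueError(f"shape mismatch: ({m}x{n}) @ ({n_b}x{p})")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 samples x 2 features
w = [[0.5], [1.0]]                        # 2 features x 1 output

y_hat = matmul(X, w)  # (3x2) @ (2x1) gives a 3x1 column of predictions
print(y_hat)
```

Swapping the operands, `matmul(w, X)`, raises `ValueError` because the inner dimensions 1 and 3 differ.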

6 What is matrix transpose and where is it used? easy

Answer: Transpose flips rows and columns. It appears in normal equations, covariance calculations, and gradient derivations (e.g., XᵀX).

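7 What is rank of a matrix? medium

Answer: Rank is the number of linearly independent rows (equivalently, columns). A feature matrix whose rank is lower than its number of columns contains redundant, linearly dependent features, which makes XᵀX singular and regression coefficients unstable.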

8 What does matrix inverse represent? medium

Answer: For invertible A, A⁻¹ reverses the transformation A. In practice, exact inversion is often avoided for numerical stability; decompositions are preferred.

9 Why is the determinant conceptually useful? medium

Answer: The determinant gives the factor by which a square matrix scales volume and shows whether the matrix is singular. det(A)=0 means the matrix is non-invertible and its columns are linearly dependent.

10 What is orthogonality and why do we care? medium

Answer: Orthogonal vectors are perpendicular (dot product zero). Orthogonal features/components reduce redundancy and simplify optimization and interpretation.

11 What is a vector norm? easy

Answer: A norm measures a vector's magnitude. The L2 (Euclidean) norm is common for distance and similarity; L1 and L2 norms also appear as regularization penalties (Lasso and Ridge, respectively).

12 How is cosine similarity connected to dot product? medium

Answer: Cosine similarity = (x·y)/(||x|| ||y||). It measures direction similarity independent of absolute scale; widely used in retrieval/embedding tasks.
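A minimal sketch of the formula (x·y)/(||x|| ||y||) follows. The vectors are made-up examples chosen so that one pair points in the same direction and another pair is orthogonal; `cosine_similarity` is a hypothetical helper name.

```python
import math


def cosine_similarity(x, y):
    """Return (x . y) / (||x|| * ||y||) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length
c = [-3.0, 0.0, 1.0]  # orthogonal to a: -3 + 0 + 3 = 0

print(cosine_similarity(a, b))  # close to 1: scale does not matter
print(cosine_similarity(a, c))  # 0: perpendicular vectors
```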

13 What is SVD and where is it useful? hard

Answer: Singular Value Decomposition factors A into UΣVᵀ. It is useful for dimensionality reduction, denoising, latent factor models, and numerical robustness.

14 Why do we standardize features before PCA? medium

Answer: PCA is variance-based, so features measured on larger scales dominate the components. Standardization ensures each feature contributes comparably.

15 One-line linear algebra summary for DS interviews? easy

Answer: Linear algebra is the language of data representation and transformations—vectors, matrices, and decompositions power model training, optimization, and dimensionality reduction.

Calculus Interview Q&A for Data Science

16 Why is calculus important in Machine Learning? easy

Answer: Calculus helps us optimize model parameters by measuring how loss changes with respect to each parameter.

17 What is a derivative? easy

Answer: A derivative gives the instantaneous rate of change (slope) of a function at a point.

18 What is a partial derivative? easy

Answer: It measures change in a multivariable function with respect to one variable while others are fixed.

19 What is a gradient vector? medium

Answer: The gradient is the vector of partial derivatives; it points in the direction of steepest increase.

20 How is gradient descent connected to calculus? medium

Answer: Gradient descent uses the gradient of the loss to move parameters in the opposite direction, θ ← θ − η∇L(θ), where η is the learning rate, so that the loss decreases step by step.
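As a rough sketch, the loop below applies the update "parameter minus learning rate times gradient" to a one-dimensional quadratic loss. The loss function, starting point, and learning rate are all assumptions chosen for illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss: d/dw (w - 3)^2 = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)


w = 0.0              # arbitrary starting point
learning_rate = 0.1  # assumed step size

for step in range(100):
    w = w - learning_rate * grad(w)  # step against the gradient

print(w, loss(w))  # w ends up very close to 3, loss very close to 0
```

Each step shrinks the distance to the minimum by a factor of 0.8 here, which is why a modest number of iterations is enough.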

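21 What is the chain rule? easy

Answer: For nested functions, the derivative is the derivative of the outer function (evaluated at the inner function) times the derivative of the inner function: d/dx f(g(x)) = f′(g(x))·g′(x). This powers backpropagation in neural networks.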
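The rule can be checked numerically. The sketch below assumes the illustrative composition h(x) = sin(x²), differentiates it by multiplying the outer and inner derivatives, and compares the result with a central finite difference.

```python
import math


def g(x):
    """Inner function."""
    return x * x


def f(u):
    """Outer function."""
    return math.sin(u)


def h(x):
    """Composition h(x) = f(g(x)) = sin(x^2)."""
    return f(g(x))


def h_prime(x):
    """Chain rule: f'(g(x)) * g'(x) = cos(x^2) * 2x."""
    return math.cos(g(x)) * 2.0 * x


def numeric_derivative(fn, x, eps=1e-6):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


x0 = 1.3
print(h_prime(x0), numeric_derivative(h, x0))  # the two values agree closely
```

Backpropagation applies the same multiplication layer by layer, from the loss back to each weight.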

22 What are critical points? medium

Answer: Points where derivative is zero or undefined; they can be minima, maxima, or saddle points.

23 What does the second derivative indicate? medium

Answer: Curvature. Positive second derivative indicates local convexity; negative indicates concavity.

24 Why are convex functions easier to optimize? medium

Answer: Any local minimum is a global minimum, so optimization is more stable and predictable.

25 What is learning rate in optimization? easy

Answer: It is the step size in parameter updates; a rate that is too high can make training diverge, and one that is too low makes learning slow.

26 What is vanishing gradient problem? hard

Answer: Gradients become extremely small through deep layers, slowing or stopping learning in early layers.

27 What is exploding gradient problem? hard

Answer: Gradients become very large, causing unstable updates and numerical overflow.

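28 How does regularization relate to calculus updates? medium

Answer: Regularization adds penalty terms to the loss, which changes its derivatives so that weights are constrained during optimization. For example, an L2 penalty λ||w||² adds 2λw to the gradient, pulling every weight toward zero at each step.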
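To make the effect on the update concrete, the sketch below minimizes a toy loss (w − 3)² with and without an L2 penalty λw². The loss, the value λ = 0.5, and the helper names are assumptions for illustration only.

```python
def penalized_gradient(w, lam):
    """d/dw [(w - 3)^2 + lam * w^2] = 2 * (w - 3) + 2 * lam * w."""
    return 2.0 * (w - 3.0) + 2.0 * lam * w


def minimize(lam, learning_rate=0.1, steps=200):
    """Plain gradient descent on the (optionally penalized) toy loss."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * penalized_gradient(w, lam)
    return w


w_plain = minimize(lam=0.0)  # no penalty: converges near 3
w_ridge = minimize(lam=0.5)  # L2 penalty: converges near 3 / (1 + 0.5) = 2

print(w_plain, w_ridge)
```

The extra gradient term 2λw is what shrinks the solution from 3 toward 0.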

29 Why do we normalize features for gradient-based models? medium

Answer: Normalization improves conditioning of loss surface, helping gradient descent converge faster.

30 One-line calculus summary for DS interviews? easy

Answer: Calculus gives the optimization mechanics that let ML models learn from data efficiently.

Probability Interview Q&A for Data Science

31 What is probability? easy

Answer: Probability quantifies uncertainty, ranging from 0 (impossible) to 1 (certain).

32 What is a random variable? easy

Answer: A random variable maps outcomes of a random process to numeric values.

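33 Difference between PMF and PDF? medium

Answer: A PMF applies to discrete variables and gives the probability of each value directly. A PDF applies to continuous variables and gives a density: probabilities come from the area under the curve over an interval, and the probability of any single exact value is zero.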

34 What is expectation (mean)? easy

Answer: Expected value is the long-run average outcome of a random variable.

35 What is variance? easy

Answer: Variance measures spread around the mean; standard deviation is its square root.

36 What does Bayes theorem state? medium

Answer: P(A|B) = P(B|A)·P(A)/P(B), or in short, Posterior ∝ Likelihood × Prior. It updates beliefs when new evidence arrives.
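A worked example helps here. The sketch below assumes a hypothetical diagnostic test with a 1% base rate, 95% sensitivity, and a 5% false-positive rate, and computes the probability of having the condition given a positive result. All numbers are invented for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive) via Bayes' theorem.

    prior:               P(condition)
    likelihood:          P(positive | condition)
    false_positive_rate: P(positive | no condition)
    """
    # Total probability of a positive result (the evidence term).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # about 0.161
```

Even with an accurate test, the low prior keeps the posterior far below the test's sensitivity, which is the usual interview follow-up.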

37 What are independent events? easy

Answer: Events are independent if the occurrence of one does not change the probability of the other; equivalently, P(A∩B)=P(A)·P(B).

38 What is conditional probability? easy

Answer: Probability of A given B: P(A|B)=P(A∩B)/P(B), assuming P(B)>0.

39 When do we use normal distribution? medium

Answer: For quantities that arise as sums or averages of many small independent effects (e.g., measurement errors), and as an approximation for sample means via the Central Limit Theorem.

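40 What is CLT? medium

Answer: Central Limit Theorem: for independent, identically distributed observations with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size increases, whatever the shape of the original distribution.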
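A small simulation can illustrate the theorem. The sketch below assumes an exponential distribution with rate 1 (strongly right-skewed, mean 1, variance 1) and looks at the means of many samples of size 50. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible


def sample_mean(n):
    """Mean of n draws from an exponential distribution with rate 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


n = 50
means = [sample_mean(n) for _ in range(2000)]

center = statistics.fmean(means)  # expected to be near the true mean, 1
spread = statistics.stdev(means)  # expected to be near 1 / sqrt(50)
# For a normal distribution, roughly 68% of values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(center, spread, within_one_sd)
```

Although single draws are heavily skewed, the sample means cluster symmetrically around 1 with a spread close to σ/√n.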

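41 Binomial vs Poisson? medium

Answer: Binomial models the number of successes in a fixed number n of independent Bernoulli trials with success probability p; Poisson models the count of events in a fixed interval when events occur at an average rate λ.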
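The two PMFs can be written directly from their textbook formulas. The sketch below also illustrates a standard fact that goes slightly beyond the answer above: with many trials and a small success probability, the Binomial is close to a Poisson with λ = n·p. The parameter values are arbitrary.

```python
import math


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam): exp(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Many trials with a rare event: n * p = 3 expected events.
n, p = 1000, 0.003
lam = n * p

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```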

42 Why is likelihood different from probability? hard

Answer: Probability treats parameters as fixed and data as random; likelihood treats observed data fixed and parameters variable.

43 What is prior vs posterior? medium

Answer: Prior is belief before data; posterior is updated belief after observing data using Bayes theorem.

44 How does probability help in classification? medium

Answer: Models estimate class probabilities so decisions can be thresholded based on risk/cost trade-offs.

45 One-line probability summary for interviews? easy

Answer: Probability provides the mathematical framework for uncertainty, inference, and model confidence in Data Science.

Descriptive Statistics Interview Q&A

46 What is descriptive statistics? easy

Answer: It summarizes and describes dataset properties through metrics and visualizations.

47 Mean vs median? easy

Answer: Mean is arithmetic average; median is middle value and more robust to outliers.
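The robustness claim is easy to see with Python's standard `statistics` module. The values below are made up; the last one is a deliberate outlier.

```python
import statistics

values = [40, 42, 45, 47, 50]
with_outlier = values + [500]  # one extreme value added

print(statistics.mean(values), statistics.median(values))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The single outlier pulls the mean from 44.8 to above 120, while the median only moves from 45 to 46.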

48 What is mode? easy

Answer: Mode is the most frequently occurring value in a dataset.

49 What does range measure? easy

Answer: Range is max minus min, a simple dispersion measure.

50 Variance vs standard deviation? medium

Answer: Variance is the average squared deviation from the mean; standard deviation is its square root, expressed in the original units of the data.

51 What is IQR? medium

Answer: Interquartile Range is Q3−Q1 and captures the spread of the middle 50% of the data.

52 What is five-number summary? medium

Answer: Minimum, Q1, median, Q3, and maximum; these are the values a boxplot is drawn from.

53 What is skewness? medium

Answer: Skewness measures asymmetry; positive skew has long right tail, negative skew long left tail.

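54 What is kurtosis? medium

Answer: Kurtosis measures how heavy the tails of a distribution are relative to a normal distribution (kurtosis 3, excess kurtosis 0). It is often described as peak sharpness, but it is driven mainly by the tails and outliers.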

55 How do you detect outliers quickly? medium

Answer: Use the boxplot/IQR rule (flag points below Q1 − 1.5·IQR or above Q3 + 1.5·IQR) or a z-score threshold (e.g., |z| > 3) when the data is roughly normal.
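A minimal sketch of the IQR rule, using the conventional 1.5·IQR fences and `statistics.quantiles` from the standard library. The data are invented, and quartile conventions differ slightly between tools, so borderline points may be classified differently elsewhere.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(data)
print(outliers)  # only the extreme value 102 is flagged
```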

56 When should you prefer median over mean? easy

Answer: When the data is skewed or has outliers, the median better represents central tendency.

57 What is coefficient of variation? medium

Answer: CV = std/mean; it compares variability across datasets with different scales.

58 Why do histograms matter in EDA? easy

Answer: They reveal shape, spread, skewness, and potential multimodality of data.

59 How does aggregation help business reporting? easy

Answer: Aggregated stats transform raw records into actionable summaries for decisions.

60 One-line descriptive stats summary? easy

Answer: Descriptive statistics compress large datasets into interpretable patterns and health checks.

Inferential Statistics Interview Q&A

61 What is inferential statistics? easy

Answer: It uses sample data to make conclusions about a larger population.

62 What is a population vs sample? easy

Answer: Population is full set; sample is a subset used for analysis.

63 What is sampling bias? medium

Answer: It occurs when sample is not representative, leading to misleading inference.

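64 What is confidence interval? medium

Answer: A range of plausible parameter values produced by a procedure with a stated confidence level: at 95%, about 95% of intervals built this way from repeated samples would contain the true parameter.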
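As a rough sketch, a confidence interval for a mean can be computed as mean ± z·(s/√n). The version below assumes a normal approximation and uses `statistics.NormalDist` for the critical value; for a sample this small a t-based critical value would usually be preferred, giving a somewhat wider interval. The measurements are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * sem, mean + z * sem


# Hypothetical measurements with a sample mean of 5.0.
sample = [5.1, 4.9, 5.0, 5.2, 4.8, 5.3, 5.0, 4.7, 5.1, 4.9]
low, high = mean_confidence_interval(sample)
print(low, high)
```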

65 What is null hypothesis? easy

Answer: Baseline claim (H0), often representing no effect or no difference.

66 What is alternative hypothesis? easy

Answer: Competing claim (H1/Ha) that there is an effect or difference.

67 What is p-value? medium

Answer: The probability of seeing data at least as extreme as what was observed, assuming H0 is true. It is not the probability that H0 itself is true.
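To make "at least as extreme" concrete, the sketch below computes a two-sided p-value for a one-sample z-test. It assumes the population standard deviation is known, which is rarely true in practice (a t-test is the usual substitute). The scenario numbers are invented.

```python
import math
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n):
    """Return (z, p) for H0: population mean == mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a z-score at least this far from 0 in either tail.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical study: 100 observations averaging 103 against a claimed mean of 100.
z, p = two_sided_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(z, round(p, 4))  # z = 2.0, p about 0.0455
```

At a significance level of 0.05 this result would lead to rejecting H0, but only narrowly.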

68 What is significance level (alpha)? easy

Answer: The p-value threshold below which H0 is rejected, commonly 0.05; it is the Type I error rate you are willing to accept.

69 Type I and Type II errors? medium

Answer: Type I: reject true H0 (false positive). Type II: fail to reject false H0 (false negative).

70 What is test power? medium

Answer: Probability of correctly rejecting a false null hypothesis (1−beta).

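71 When to use t-test? medium

Answer: To compare means (one sample against a reference value, or two samples against each other) when the population variance is unknown and must be estimated from the data, which matters most for small samples.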

72 When to use chi-square test? medium

Answer: For categorical data: goodness-of-fit or independence between categories.

73 What is ANOVA? medium

Answer: It tests whether means of 3 or more groups differ significantly.

74 Why is statistical significance not the same as practical significance? hard

Answer: A tiny effect can be statistically significant with large samples but still have low business impact.

75 One-line inferential stats summary? easy

Answer: Inferential statistics turns sample evidence into defensible decisions under uncertainty.
