Mathematics & Statistics Foundation: 75 Q&A

Linear algebra, calculus, probability, descriptive and inferential statistics for data science.

Linear Algebra Basics for Data Science – Q&A

1 What is a vector and what is a matrix in the context of Data Science? easy
Answer: A vector is an ordered list of numbers (e.g., features of one data point). A matrix is a 2‑D table of numbers, often representing many data points (rows) and many features (columns). Most tabular datasets are naturally represented as matrices, and many ML operations are expressed as matrix multiplications.
2 What is the dot product and why is it important in ML? medium
Answer: The dot product of vectors \(x\) and \(w\) is \(x \cdot w = \sum_i x_i w_i\). Geometrically, \(x \cdot w = \|x\|\,\|w\|\cos\theta\), where \(\theta\) is the angle between the vectors. In ML it appears in linear models (e.g., \(w^Tx\)), in similarity measures, and in the computation inside many neural network layers.
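As a minimal sketch of the sum above, the dot product fits in a few lines of plain Python. The feature and weight values are made up for illustration; in practice an array library would compute this over whole batches.

```python
def dot(x, w):
    """Return the dot product sum_i x_i * w_i of two equal-length vectors."""
    if len(x) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))

# Hypothetical data point with three features and three model weights.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]

prediction = dot(x, w)  # 0.5 - 2.0 + 6.0 = 4.5
print(prediction)
```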
3 Intuitively, what are eigenvalues and eigenvectors and how do they relate to PCA? hard
Answer: For a matrix \(A\), an eigenvector \(v\) is a direction that is only scaled (not rotated) by \(A\), and the scale factor is the corresponding eigenvalue \(\lambda\) (i.e. \(Av = \lambda v\)). In PCA, eigenvectors of the covariance matrix give the principal directions of variance, and eigenvalues tell you how much variance each principal component explains.
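One simple way to see \(Av = \lambda v\) numerically is power iteration, sketched below for a small symmetric matrix. The 2×2 "covariance" matrix is a hypothetical example, and power iteration is used here only for illustration; PCA libraries typically rely on other decompositions.

```python
import math

def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_iteration(A, steps=100):
    """Approximate the dominant eigenvalue and unit eigenvector of A."""
    v = [1.0] + [0.0] * (len(A) - 1)  # arbitrary starting direction
    for _ in range(steps):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # rescale to unit length each step
    # Rayleigh quotient v^T A v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return eigenvalue, v

# Hypothetical covariance matrix of two positively correlated features.
cov = [[2.0, 1.0], [1.0, 2.0]]
lam, v = power_iteration(cov)

# Share of total variance along the first principal direction:
# the eigenvalue divided by the trace (the sum of all eigenvalues).
explained = lam / (cov[0][0] + cov[1][1])
print(lam, v, explained)
```

For this matrix the eigenvalues are 3 and 1, so the first principal direction (along \([1, 1]\)) carries about 75% of the variance.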
4 What is matrix multiplication in DS terms? easy
Answer: Matrix multiplication composes linear transformations. In ML, multiplying the feature matrix X (one row per sample) by the weight vector w gives the predictions of a linear model for every sample at once.
5 Why does shape compatibility matter in matrix operations? easy
Answer: The product A(m×n)·B(n×p) is defined only when the inner dimensions match, and the result has shape m×p. Shape errors often indicate pipeline bugs in feature or batch handling.
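The shape rule can be sketched with a small pure-Python matrix product that checks the inner dimensions before multiplying. The function name `matmul` and the sample values are illustrative assumptions, not part of any library.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p), both given as lists of rows."""
    m, n = len(A), len(A[0])
    n_b, p = len(B), len(B[0])
    if n != n_b:
        raise ValueError(f"shape mismatch: ({m}x{n}) @ ({n_b}x{p})")
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(m)
    ]

# Hypothetical feature matrix: 3 samples x 2 features.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Weights as a 2 x 1 column, so X @ w is 3 x 1 (one prediction per sample).
w = [[0.5], [1.0]]

preds = matmul(X, w)
print(preds)  # [[2.5], [5.5], [8.5]]
```

Calling `matmul(w, w)` raises `ValueError`, because a 2×1 matrix cannot be multiplied by another 2×1 matrix.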
6 What is the matrix transpose and where is it used? easy
Answer: The transpose swaps rows and columns. It appears in the normal equations, covariance calculations, and gradient derivations (e.g., XᵀX).
7 What is the rank of a matrix? medium
Answer: Rank is the number of linearly independent rows (equivalently, columns). A feature matrix with less than full column rank has redundant (linearly dependent) features, which makes regression coefficients non-unique and numerically unstable.
8 What does the matrix inverse represent? medium
Answer: For an invertible matrix A, A⁻¹ reverses the transformation A. In practice, explicit inversion is usually avoided for numerical stability; solvers based on decompositions (e.g., LU, QR, Cholesky) are preferred.
9 Why is the determinant conceptually useful? medium
Answer: For a square matrix, the determinant gives the factor by which the transformation scales volume and shows whether the matrix is singular. det(A)=0 means the matrix is non-invertible and its columns are linearly dependent.
10 What is orthogonality and why do we care? medium
Answer: Orthogonal vectors are perpendicular (their dot product is zero). Orthogonal features/components reduce redundancy and simplify optimization and interpretation.
11 What is a vector norm? easy
Answer: A norm measures a vector's magnitude. The L2 norm is common for distance/similarity; L1 and L2 norms also appear in regularization (Lasso and Ridge, respectively).
12 How is cosine similarity connected to the dot product? medium
Answer: Cosine similarity = (x·y)/(||x|| ||y||), i.e., the dot product of the two vectors after each is scaled to unit length. It measures similarity of direction independent of magnitude and is widely used in retrieval/embedding tasks.
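A short sketch of the formula, using made-up vectors: scaling a vector leaves the cosine similarity unchanged, while perpendicular vectors score zero.

```python
import math

def cosine_similarity(x, y):
    """Return (x . y) / (||x|| * ||y||) for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]  # same direction as a, ten times the magnitude
c = [-3.0, 0.0, 1.0]    # perpendicular to a: -3 + 0 + 3 = 0

sim_ab = cosine_similarity(a, b)  # approximately 1.0
sim_ac = cosine_similarity(a, c)  # 0.0
print(sim_ab, sim_ac)
```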
13 What is SVD and where is it useful? hard
Answer: Singular Value Decomposition factors A into UΣVᵀ, where U and V have orthonormal columns and Σ is diagonal with non-negative singular values. It is useful for dimensionality reduction, denoising, latent factor models, and numerical robustness.
14 Why do we standardize features before PCA? medium
Answer: PCA is variance-based, so features measured on large scales dominate the components. Standardization (zero mean, unit variance) ensures each feature contributes comparably.
15 One-line linear algebra summary for DS interviews? easy
Answer: Linear algebra is the language of data representation and transformations: vectors, matrices, and decompositions power model training, optimization, and dimensionality reduction.

Calculus Interview Q&A for Data Science

16 Why is calculus important in Machine Learning? easy
Answer: Calculus lets us optimize model parameters by measuring how the loss changes with respect to each parameter.
17 What is a derivative? easy
Answer: A derivative gives the instantaneous rate of change (slope) of a function at a point.
18 What is a partial derivative? easy
Answer: It measures the change in a multivariable function with respect to one variable while the others are held fixed.
19 What is a gradient vector? medium
Answer: The gradient is the vector of partial derivatives; it points in the direction of steepest increase.
20 How is gradient descent connected to calculus? medium
Answer: Gradient descent uses the gradient of the loss to move the parameters a small step in the opposite direction, which reduces the loss.
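A minimal sketch of the idea on a one-parameter loss. The loss \(L(w) = (w - 3)^2\), the starting point, and the step size are all assumptions chosen so the minimum at \(w = 3\) is easy to verify.

```python
def grad(w):
    """Derivative of the loss L(w) = (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)  # move opposite to the gradient
    return w

w_final = gradient_descent(w0=0.0)
print(w_final)  # very close to 3.0, the minimizer of L
```

With this loss each step shrinks the distance to the minimum by a factor of 0.8; a step size above 1.0 would make that factor exceed 1 in magnitude, so the iterates would diverge.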
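21 What is the chain rule? easy
Answer: For a composition f(g(x)), the derivative is f′(g(x))·g′(x): the derivative of the outer function, evaluated at the inner function, times the derivative of the inner function. This rule powers backpropagation in neural networks.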
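The chain rule can be checked numerically. The sketch below uses the hypothetical composition \(h(x) = \sin(x^2)\) and compares the chain-rule derivative with a finite-difference estimate.

```python
import math

def h(x):
    """Composite function: outer sin(u) applied to inner u = x^2."""
    return math.sin(x ** 2)

def h_prime(x):
    """Chain rule: outer derivative cos(x^2) times inner derivative 2x."""
    return math.cos(x ** 2) * 2.0 * x

def numeric_derivative(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x0 = 1.3
analytic = h_prime(x0)
approx = numeric_derivative(h, x0)
print(analytic, approx)  # the two values agree to several decimal places
```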
22 What are critical points? medium
Answer: Points where the derivative (or gradient) is zero or undefined; they can be minima, maxima, or saddle points.
23 What does the second derivative indicate? medium
Answer: Curvature. A positive second derivative means the function is locally convex (curving upward); a negative one means it is locally concave.
24 Why are convex functions easier to optimize? medium
Answer: For a convex function, any local minimum is a global minimum, so optimization is more stable and predictable.
25 What is the learning rate in optimization? easy
Answer: It is the step size in parameter updates; too high a value can overshoot and diverge, too low a value learns slowly.
26 What is the vanishing gradient problem? hard
Answer: Gradients become extremely small as they are propagated back through many layers, slowing or stopping learning in the early layers.
27 What is the exploding gradient problem? hard
Answer: Gradients become very large, causing unstable updates and numerical overflow.
28 How does regularization relate to calculus updates? medium
Answer: Regularization adds penalty terms to the loss, which changes the derivatives so that weights are pushed toward smaller values during optimization.
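As a worked example, assume an L2 (ridge) penalty with strength \(\lambda\) and a gradient step with learning rate \(\eta\); these symbols are introduced here for illustration. The penalty adds a term proportional to the weights to the gradient, so every update also shrinks the weights:

\[
J(w) = L(w) + \frac{\lambda}{2}\,\|w\|_2^2,
\qquad
\nabla J(w) = \nabla L(w) + \lambda w,
\qquad
w \leftarrow w - \eta\,\nabla J(w) = (1 - \eta\lambda)\,w - \eta\,\nabla L(w).
\]

The factor \((1 - \eta\lambda)\) is why L2 regularization is often described as weight decay.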
29 Why do we normalize features for gradient-based models? medium
Answer: Normalization improves the conditioning of the loss surface, helping gradient descent converge faster.
30 One-line calculus summary for DS interviews? easy
Answer: Calculus gives the optimization mechanics that let ML models learn from data efficiently.

Probability Interview Q&A for Data Science

31 What is probability? easy
Answer: Probability quantifies uncertainty, ranging from 0 (impossible) to 1 (certain).
32 What is a random variable? easy
Answer: A random variable maps outcomes of a random process to numeric values.
33 Difference between PMF and PDF? medium
Answer: A PMF gives the probability of each value of a discrete variable; a PDF describes a continuous variable, where the area under the curve over an interval gives the probability.
34 What is expectation (mean)? easy
Answer: The expected value is the long-run average outcome of a random variable, i.e., the probability-weighted average of its values.
35 What is variance? easy
Answer: Variance measures spread around the mean as the expected squared deviation; the standard deviation is its square root.
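For a concrete case, the sketch below computes the expectation and variance of a fair six-sided die directly from its PMF. Exact fractions are used so the results can be checked by hand.

```python
import math
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expectation: probability-weighted average of the values.
mean = sum(x * p for x, p in pmf.items())

# Variance: expected squared deviation from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # mean is 7/2 and variance is 35/12
```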
36 What does Bayes' theorem state? medium
Answer: P(A|B) = P(B|A)·P(A)/P(B), often summarized as Posterior ∝ Likelihood × Prior. It updates beliefs when new evidence arrives.
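A small worked example of the update, using a hypothetical diagnostic test. The prevalence, sensitivity, and false-positive rate below are invented numbers, not data from any study.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem.

    prior               -- P(condition)
    likelihood          -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive test (the normalizing evidence term).
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # about 0.161
```

Even with an accurate test, the low prior keeps the posterior near 16%, which is the kind of result the prior-times-likelihood view makes easy to explain.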
37 What are independent events? easy
Answer: Events are independent if the occurrence of one does not change the probability of the other, i.e., P(A∩B)=P(A)·P(B).
38 What is conditional probability? easy
Answer: Probability of A given B: P(A|B)=P(A∩B)/P(B), assuming P(B)>0.
39 When do we use the normal distribution? medium
Answer: For many naturally aggregated phenomena (sums or averages of many small effects) and as an approximation via the Central Limit Theorem.
40 What is CLT? medium
Answer: Central Limit Theorem: for independent, identically distributed draws with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the original distribution.
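A quick simulation can illustrate the theorem. The sketch below draws from an exponential distribution (strongly skewed, with mean 1 and standard deviation 1) and looks at the sample means; the seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from an exponential distribution with rate 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(2000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = 1.0 / n ** 0.5  # sigma / sqrt(n), about 0.141

print(mean_of_means, sd_of_means, expected_sd)
```

The sample means cluster around 1 with a spread close to \(\sigma/\sqrt{n}\), and a histogram of `means` looks roughly bell-shaped even though the individual draws are skewed.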
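41 Binomial vs Poisson? medium
Answer: The binomial models the number of successes in a fixed number n of independent Bernoulli trials with success probability p; the Poisson models the count of events in a fixed interval with average rate λ. For large n and small p, the binomial is well approximated by a Poisson with λ = np.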
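The two PMFs can be compared directly. In the sketch below the values n = 1000 and p = 0.003 are arbitrary; they are chosen so that a Poisson with λ = np = 3 should be close to the binomial.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with average rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 1000, 0.003
lam = n * p  # matching Poisson rate

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```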
42 Why is likelihood different from probability? hard
Answer: Probability treats the parameters as fixed and the data as random; likelihood treats the observed data as fixed and is viewed as a function of the parameters.
43 What is prior vs posterior? medium
Answer: The prior is the belief before seeing data; the posterior is the updated belief after observing data, obtained using Bayes' theorem.
44 How does probability help in classification? medium
Answer: Models estimate class probabilities so decisions can be thresholded based on risk/cost trade-offs.
45 One-line probability summary for interviews? easy
Answer: Probability provides the mathematical framework for uncertainty, inference, and model confidence in Data Science.

Descriptive Statistics Interview Q&A

46 What is descriptive statistics? easy
Answer: It summarizes and describes dataset properties through metrics and visualizations.
47 Mean vs median? easy
Answer: The mean is the arithmetic average; the median is the middle value and is more robust to outliers.
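The robustness claim is easy to see with a tiny made-up dataset: adding one extreme value moves the mean a lot and the median only slightly.

```python
import statistics

# Hypothetical salaries in thousands.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]  # add one extreme value

mean_before = statistics.mean(salaries)          # 35
median_before = statistics.median(salaries)      # 35
mean_after = statistics.mean(with_outlier)       # 112.5
median_after = statistics.median(with_outlier)   # 36.5

print(mean_before, median_before, mean_after, median_after)
```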
48 What is mode? easy
Answer: The mode is the most frequently occurring value in a dataset.
49 What does range measure? easy
Answer: Range is max minus min, a simple dispersion measure that is sensitive to outliers.
50 Variance vs standard deviation? medium
Answer: Variance is the average squared deviation from the mean (the sample variance divides by n−1 rather than n); the standard deviation is its square root, expressed in the original units.
51 What is IQR? medium
Answer: The Interquartile Range is Q3−Q1 and captures the spread of the middle 50% of the data.
52 What is the five-number summary? medium
Answer: Minimum, Q1, median, Q3, and maximum; these are the values drawn in a boxplot.
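A short sketch of the five-number summary and IQR using Python's `statistics.quantiles`. Quartile conventions differ between tools; the `"inclusive"` method is one choice among several, and the data are made up.

```python
import statistics

data = [7, 1, 9, 3, 5, 2, 8, 4, 6]

# quantiles() sorts the data internally; n=4 returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = (min(data), q1, median, q3, max(data))
iqr = q3 - q1

print(summary)  # (1, 3.0, 5.0, 7.0, 9)
print(iqr)      # 4.0
```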
53 What is skewness? medium
Answer: Skewness measures asymmetry; positive skew has a long right tail, negative skew a long left tail.
54 What is kurtosis? medium
Answer: Kurtosis measures tail heaviness (how prone the distribution is to extreme values) relative to a normal distribution. It is often described as peak sharpness, but it is driven mainly by the tails.
55 How do you detect outliers quickly? medium
Answer: Use a boxplot with the IQR rule (values below Q1−1.5·IQR or above Q3+1.5·IQR) or z-score thresholds, depending on distribution assumptions.
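The IQR rule can be sketched as follows, using the common 1.5×IQR fences. The multiplier 1.5, the quartile method, and the data are conventional or illustrative choices rather than fixed requirements.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one suspicious reading.
readings = [11, 12, 12, 13, 12, 11, 14, 13, 100]

outliers = iqr_outliers(readings)
print(outliers)  # [100]
```

On a sample this small a z-score threshold of 3 would miss the value 100, because the outlier itself inflates the standard deviation; the quartile-based fences are less affected.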
56 When to prefer median over mean? easy
Answer: When data is skewed or has outliers, the median better represents central tendency.
57 What is the coefficient of variation? medium
Answer: CV = std/mean; as a unitless ratio, it compares variability across datasets with different scales. It is meaningful only when the mean is positive and not close to zero.
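A brief sketch comparing the relative spread of two made-up datasets measured in different units. The sample standard deviation is used here, which is one possible convention.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean

heights_cm = [160, 165, 170, 175, 180]  # hypothetical heights
incomes_k = [20, 30, 40, 50, 60]        # hypothetical incomes in thousands

cv_heights = coefficient_of_variation(heights_cm)  # about 0.047
cv_incomes = coefficient_of_variation(incomes_k)   # about 0.395

print(cv_heights, cv_incomes)
```

Incomes vary far more relative to their mean than heights do, which the raw standard deviations (in cm and in thousands) cannot show on their own.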
58 Why do histograms matter in EDA? easy
Answer: They reveal shape, spread, skewness, and potential multimodality of data.
59 How does aggregation help business reporting? easy
Answer: Aggregated stats transform raw records into actionable summaries for decisions.
60 One-line descriptive stats summary? easy
Answer: Descriptive statistics compress large datasets into interpretable patterns and health checks.

Inferential Statistics Interview Q&A

61 What is inferential statistics? easy
Answer: It uses sample data to draw conclusions about a larger population.
62 What is a population vs sample? easy
Answer: The population is the full set of interest; a sample is a subset used for analysis.
63 What is sampling bias? medium
Answer: It occurs when the sample is not representative of the population, leading to misleading inference.
64 What is a confidence interval? medium
Answer: A range of plausible parameter values computed from the sample, with an associated confidence level: at a 95% level, about 95% of intervals constructed this way across repeated samples would contain the true parameter.
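A minimal sketch of a confidence interval for a mean, using the normal approximation mean ± z·s/√n. The sample values are invented; for a sample this small a t-based interval would be somewhat wider, so treat the normal quantile here as a simplifying assumption.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

# Hypothetical measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.3, 4.7, 5.0, 5.1, 4.9]

lower, upper = mean_confidence_interval(sample)
print(round(lower, 3), round(upper, 3))  # roughly 4.887 and 5.113
```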
65 What is the null hypothesis? easy
Answer: The baseline claim (H0), often representing no effect or no difference.
66 What is the alternative hypothesis? easy
Answer: The competing claim (H1/Ha) that there is an effect or difference.
67 What is a p-value? medium
Answer: The probability of seeing data at least as extreme as what was observed, assuming H0 is true. It is not the probability that H0 is true.
68 What is the significance level (alpha)? easy
Answer: The threshold for rejecting H0, commonly 0.05; it is the Type I error rate the analyst is willing to accept.
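To connect the p-value and alpha, here is a sketch of a two-sided one-sample z-test. It assumes the population standard deviation is known, and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for H0: mean == mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a result at least this extreme in either tail under H0.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical study: 100 observations averaging 103 against H0 mean 100, sigma 15.
z, p = two_sided_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)

alpha = 0.05
reject_h0 = p < alpha

print(z, round(p, 4), reject_h0)  # z is 2.0 and p is about 0.0455
```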
69 Type I and Type II errors? medium
Answer: Type I: rejecting a true H0 (false positive). Type II: failing to reject a false H0 (false negative).
70 What is test power? medium
Answer: The probability of correctly rejecting a false null hypothesis (1−beta, where beta is the Type II error rate).
71 When to use a t-test? medium
Answer: To compare means when the population variance is unknown and must be estimated from the sample, which matters most when the sample is small.
72 When to use a chi-square test? medium
Answer: For categorical data: goodness-of-fit or independence between categorical variables.
73 What is ANOVA? medium
Answer: It tests whether the means of three or more groups differ significantly by comparing between-group variance to within-group variance.
74 Why is statistical significance not practical significance? hard
Answer: A tiny effect can be statistically significant with large samples but still have low business impact.
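The point can be made concrete by holding a tiny effect fixed and changing only the sample size. The sketch below uses a two-sided z-test with a known standard deviation; the effect size, sigma, and sample sizes are invented.

```python
import math

def two_sided_p(effect, sigma, n):
    """Two-sided z-test p-value for an observed mean difference `effect`."""
    z = effect / (sigma / math.sqrt(n))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate for large z.
    return math.erfc(abs(z) / math.sqrt(2.0))

effect, sigma = 0.1, 15.0  # hypothetical: a 0.1-point lift on a scale with sd 15

p_small = two_sided_p(effect, sigma, n=100)        # far from significant
p_large = two_sided_p(effect, sigma, n=1_000_000)  # highly significant

standardized_effect = effect / sigma  # about 0.007 standard deviations

print(p_small, p_large, standardized_effect)
```

The effect is identical in both cases; only the sample size changed. Reporting an effect size alongside the p-value keeps the practical magnitude visible.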
75 One-line inferential stats summary? easy
Answer: Inferential statistics turns sample evidence into defensible decisions under uncertainty.