Pandas for ML & Data Analysis: Interview Q&A

Short questions and answers on pandas: Series, DataFrames, indexing, joins, grouping and time series handling.


1 What are the two main data structures in pandas? ⚡ Beginner

Answer: Series (1D labeled array) and DataFrame (2D labeled table of columns).
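As a quick illustration, here is a minimal sketch of both structures. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# Selecting one column of a DataFrame returns a Series.
age = df["age"]
print(type(age).__name__)
print(s["b"])      # label-based access on the Series
print(df.shape)    # (rows, columns)
```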

2 How do you read a CSV file into a DataFrame? ⚡ Beginner

Answer: Using pd.read_csv("file.csv").
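A small sketch of the call follows. To keep it self-contained, it reads from an in-memory string instead of a real "file.csv"; the column names are assumptions for the example.

```python
import io

import pandas as pd

# Stand-in for a file on disk; the columns are hypothetical.
csv_text = "id,price,date\n1,9.5,2024-01-01\n2,12.0,2024-01-02\n"

# parse_dates asks pandas to convert the listed columns to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df.dtypes)
```

With a real file, the first argument would simply be the path, as in the answer above.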

3 What is the difference between loc and iloc? 📊 Intermediate

Answer: loc uses label-based indexing; iloc uses integer position-based indexing.
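The difference is easiest to see when the index labels are not 0, 1, 2, ... The sketch below uses a made-up frame; note that label slices include the end label while position slices exclude the end position.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from row positions.
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# loc: slice by label, both endpoints included (labels 10, 20, 30).
by_label = df.loc[10:30, "x"]

# iloc: slice by position, end excluded (positions 0 and 1).
by_pos = df.iloc[0:2, 0]

print(by_label.tolist())  # [1, 2, 3]
print(by_pos.tolist())    # [1, 2]
```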

4 How do you filter rows based on a condition? ⚡ Beginner

Answer: Use boolean indexing, e.g., df[df["col"] > 0].
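A short sketch with hypothetical columns shows the single-condition case, a combined condition (each part needs its own parentheses, joined with & or |), and the equivalent query() form.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"col": [-1, 3, 0, 7], "kind": ["a", "a", "b", "b"]})

# Single condition: the boolean mask keeps rows where it is True.
positive = df[df["col"] > 0]

# Combined conditions: use & / | and parenthesise each comparison.
positive_b = df[(df["col"] > 0) & (df["kind"] == "b")]

# The same filter written as a query string.
via_query = df.query("col > 0 and kind == 'b'")

print(positive.index.tolist())    # [1, 3]
print(positive_b.index.tolist())  # [3]
```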

5 What does groupby do in pandas? 📊 Intermediate

Answer: It groups rows by key(s) and lets you aggregate, transform or filter each group separately.
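The three modes mentioned above can be sketched as follows, using a made-up team/points table.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"team": ["a", "a", "b", "b"], "pts": [1, 3, 10, 20]})

# Aggregate: one value per group.
totals = df.groupby("team")["pts"].sum()

# Transform: a result aligned with the original rows (here, the group mean),
# handy for per-group centering or normalisation.
df["pts_centered"] = df["pts"] - df.groupby("team")["pts"].transform("mean")

# Filter: keep or drop whole groups based on a group-level condition.
big_teams = df.groupby("team").filter(lambda g: g["pts"].sum() > 5)

print(totals.to_dict())               # {'a': 4, 'b': 30}
print(df["pts_centered"].tolist())    # [-1.0, 1.0, -5.0, 5.0]
```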

6 How do you handle missing values in pandas? ⚡ Beginner

Answer: Detect them with isna()/notna(), remove them with dropna(), or impute them with fillna() using a constant or a statistic such as the mean or median.
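A minimal sketch of detect, drop and impute, on a made-up frame with one numeric and one text column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# Detect: count missing values per column.
n_missing = df.isna().sum()

# Remove: drop every row that contains at least one missing value.
dropped = df.dropna()

# Impute: a statistic for the numeric column, a constant for the text column.
filled = df.fillna({"a": df["a"].median(), "b": "unknown"})

print(n_missing.to_dict())  # {'a': 1, 'b': 1}
print(filled.loc[1].tolist())  # [2.0, 'unknown']
```

In an ML setting, the imputation statistic would normally be computed on the training split only and then reused on validation and test data.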

7 How can you join/merge two DataFrames? 📊 Intermediate

Answer: Use pd.merge() or DataFrame.merge() with left/right/inner/outer join types.
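The sketch below joins two made-up tables on a shared "id" key. The indicator and validate arguments are optional extras that help catch unexpected join behaviour.

```python
import pandas as pd

# Hypothetical tables sharing an "id" key.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [0.5, 0.7, 0.9]})

# Inner join: only keys present in both tables.
inner = left.merge(right, on="id", how="inner")

# Outer join: all keys; indicator=True adds a "_merge" column
# saying whether each row came from left, right, or both.
outer = left.merge(right, on="id", how="outer", indicator=True)

# Left join with a sanity check that keys are unique on both sides.
checked = left.merge(right, on="id", how="left", validate="one_to_one")

print(inner["id"].tolist())  # [2, 3]
print(len(outer))            # 4
```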

8 How do you set and reset an index in a DataFrame? ⚡ Beginner

Answer: Use set_index() to set and reset_index() to move the index back to a column.
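A short round trip on a hypothetical frame; reset_index(drop=True) is shown as well, since it discards the old index instead of restoring it as a column.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"id": [101, 102], "val": [1.5, 2.5]})

# Promote the "id" column to the index.
indexed = df.set_index("id")

# Move the index back into a regular column.
restored = indexed.reset_index()

# Discard the index entirely and fall back to 0, 1, 2, ...
discarded = indexed.reset_index(drop=True)

print(indexed.index.name)        # id
print(list(restored.columns))    # ['id', 'val']
print(list(discarded.columns))   # ['val']
```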

9 What is a multi-index and when is it useful? 🔥 Advanced

Answer: A MultiIndex has multiple levels of index labels, useful for hierarchical data like (country, year).
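Following the (country, year) example, a minimal sketch might look like this. The "gdp" numbers are placeholders, not real figures.

```python
import pandas as pd

# Placeholder values, for illustration only.
df = (
    pd.DataFrame(
        {
            "country": ["FR", "FR", "DE", "DE"],
            "year": [2020, 2021, 2020, 2021],
            "gdp": [1.0, 2.0, 3.0, 4.0],
        }
    )
    .set_index(["country", "year"])
    .sort_index()  # a sorted MultiIndex makes slicing predictable and fast
)

# Select everything for one outer-level label.
fr = df.loc["FR"]

# Select a single cell with a full (country, year) tuple.
de_2021 = df.loc[("DE", 2021), "gdp"]

# Cross-section on an inner level.
year_2020 = df.xs(2020, level="year")

# Pivot one level into columns.
wide = df["gdp"].unstack("year")

print(de_2021)  # 4.0
```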

10 How do you apply a function row-wise or column-wise? 📊 Intermediate

Answer: Use df.apply(func, axis=1) for rows or axis=0 for columns; prefer vectorized operations when possible.
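The sketch below shows both axes on a made-up frame, followed by the vectorized equivalent of the row-wise call.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1: the function receives one row (as a Series) at a time.
row_sums_apply = df.apply(lambda row: row["a"] + row["b"], axis=1)

# axis=0: the function receives one column (as a Series) at a time.
col_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)

# Vectorized version of the row-wise sum: same result, no Python-level loop.
row_sums_vec = df["a"] + df["b"]

print(row_sums_apply.tolist())  # [11, 22, 33]
print(col_ranges.to_dict())     # {'a': 2, 'b': 20}
```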

11 What is the difference between copy() and a simple assignment in pandas? 🔥 Advanced

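Answer: Simple assignment (df2 = df) copies nothing: both names refer to the same DataFrame, so a change made through one is visible through the other. copy() creates an independent copy (deep by default). Chained-assignment issues are a related problem: a selection such as df[mask] may be a view or a copy, so assign with .loc or take an explicit copy() first.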
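A minimal sketch of the difference, using a made-up single-column frame:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"x": [1, 2, 3]})

alias = df               # plain assignment: a second name for the same object
independent = df.copy()  # a separate DataFrame with its own data

# Writing through the alias changes df, because they are the same object.
alias.loc[0, "x"] = 100

# Writing to the copy leaves df untouched.
independent.loc[1, "x"] = -1

print(alias is df)           # True
print(df["x"].tolist())      # [100, 2, 3]
print(independent["x"].tolist())  # [1, -1, 3]
```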

12 How do you work with time series in pandas? 📊 Intermediate

Answer: Convert the date column with pd.to_datetime() and set it as a DatetimeIndex, then use resample(), rolling(), shift(), asfreq() and time-based slicing.
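These operations can be sketched on a small made-up daily series. The column names "date" and "value" are assumptions for the example.

```python
import pandas as pd

# Hypothetical daily data with dates stored as strings.
raw = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-02", "2024-01-03",
                 "2024-01-04", "2024-01-05", "2024-01-06"],
        "value": [1, 2, 3, 4, 5, 6],
    }
)

# Convert to datetimes and use them as the index.
raw["date"] = pd.to_datetime(raw["date"])
ts = raw.set_index("date")

# Downsample to 2-day bins and sum within each bin.
two_day = ts["value"].resample("2D").sum()

# 3-day rolling mean (the first two entries are NaN).
rolling_mean = ts["value"].rolling(window=3).mean()

# Lag feature: the previous day's value.
lagged = ts["value"].shift(1)

# Time-based slicing with date strings (both endpoints included).
first_three = ts.loc["2024-01-01":"2024-01-03"]

print(two_day.tolist())  # [3, 7, 11]
```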

13 How can you quickly inspect a DataFrame’s structure? ⚡ Beginner

Answer: Use head(), tail(), info(), describe(), and df.dtypes.

14 How do you efficiently select numeric or categorical columns? 🔥 Advanced

Answer: Use df.select_dtypes(include=["number"]) for numeric columns and df.select_dtypes(include=["category", "object"]) for categorical ones.
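A small sketch on a made-up mixed-type frame. The text column is created with an explicit object dtype here, on the assumption that the data stores strings that way; frames using a dedicated string dtype would need that dtype added to the include list.

```python
import pandas as pd

# Hypothetical mixed-type data, for illustration only.
df = pd.DataFrame(
    {
        "age": [30, 40],
        "income": [1.5, 2.5],
        "city": pd.Series(["x", "y"], dtype="object"),
        "tier": pd.Categorical(["lo", "hi"]),
    }
)

# "number" covers integer and float columns.
numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()

# Object and category columns are the usual categorical candidates.
categorical_cols = df.select_dtypes(include=["category", "object"]).columns.tolist()

print(numeric_cols)      # ['age', 'income']
print(categorical_cols)  # ['city', 'tier']
```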

15 How do you melt and pivot data in pandas? 🔥 Advanced

Answer: melt() converts wide → long. pivot() converts long → wide and requires each index/column pair to be unique; pivot_table() does the same but aggregates duplicate entries.
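A round trip on a made-up sales table illustrates all three functions; the column names are assumptions for the example.

```python
import pandas as pd

# Hypothetical wide table: one column per quarter.
wide = pd.DataFrame({"id": [1, 2], "q1": [10, 20], "q2": [30, 40]})

# Wide -> long: one row per (id, quarter) pair.
long_df = wide.melt(id_vars="id", var_name="quarter", value_name="sales")

# Long -> wide: works because each (id, quarter) pair appears once.
back = long_df.pivot(index="id", columns="quarter", values="sales")

# With duplicate pairs, pivot() raises; pivot_table() aggregates instead.
duplicated = pd.concat([long_df, long_df], ignore_index=True)
summed = duplicated.pivot_table(
    index="id", columns="quarter", values="sales", aggfunc="sum"
)

print(len(long_df))          # 4
print(back.loc[1, "q2"])     # 30
print(summed.loc[2, "q1"])   # 40
```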

16 How can pandas integrate with scikit-learn workflows? 📊 Intermediate

Answer: Pandas is used for data loading, cleaning and feature engineering; the resulting features are then passed to sklearn estimators, either as arrays or directly as DataFrames through a Pipeline or ColumnTransformer.
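One possible way to wire this together is sketched below. It assumes scikit-learn is installed; the tiny "churn" dataset, its column names and the choice of LogisticRegression are all made up for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset, for illustration only.
df = pd.DataFrame(
    {
        "age": [22, 35, 47, 51, 29, 63],
        "plan": ["a", "b", "a", "b", "a", "b"],
        "churned": [0, 0, 1, 1, 0, 1],
    }
)
X = df[["age", "plan"]]
y = df["churned"]

# ColumnTransformer selects DataFrame columns by name.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
    ]
)

# The pipeline accepts the DataFrame directly; no manual .values needed.
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)
preds = model.predict(X)
```

Keeping preprocessing inside the pipeline means the same transformations are applied at fit and predict time.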

17 When might you reach pandas performance limits? 📊 Intermediate

Answer: With datasets that approach or exceed available memory, or with heavy Python-level loops (iterrows(), row-wise apply()); then consider chunking, vectorization, or tools like Dask/Polars.
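Chunking can be sketched as follows. An in-memory string stands in for a large file, and the "amount" column is an assumption for the example.

```python
import io

import pandas as pd

# Stand-in for a large CSV on disk: ten rows with values 1..10.
csv_text = "amount\n" + "\n".join(str(i) for i in range(1, 11))

# chunksize makes read_csv return an iterator of DataFrames,
# so only one chunk is held in memory at a time.
total = 0
n_chunks = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["amount"].sum()
    n_chunks += 1

print(total)     # 55
print(n_chunks)  # 3
```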

18 What are some common pitfalls in pandas that ML engineers should avoid? 🔥 Advanced

Answer: Chained assignment bugs, silently mixing timezones, hidden type conversions (for example, integer columns becoming float when missing values appear), and ignoring index alignment.
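Two of these pitfalls, index alignment and hidden type conversion, can be reproduced in a few lines with made-up Series:

```python
import pandas as pd

# Two hypothetical Series whose indexes only partly overlap.
a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])

# Arithmetic aligns on index labels, not positions:
# labels 0 and 3 exist on one side only and become NaN.
aligned = a + b

# If positional addition is really intended, drop the index first.
positional = a.to_numpy() + b.to_numpy()

# Hidden type conversion: a missing value turns integers into floats.
ints_with_gap = pd.Series([1, 2, None])

print(aligned.isna().sum())   # 2
print(positional.tolist())    # [11, 22, 33]
print(ints_with_gap.dtype)    # float64
```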

19 Give a real-world example where pandas is central in an ML project. ⚡ Beginner

Answer: Pandas is typically used in feature engineering pipelines for tabular ML, e.g., preparing customer or transaction datasets.

20 What is the key message to remember about pandas for ML? ⚡ Beginner

Answer: Pandas is the workhorse for tabular data; mastering indexing, grouping, joining and reshaping makes ML data preparation much easier.

Quick Recap: Pandas

If you can select, filter, group, join and reshape with confidence, you already have the core pandas skills needed for most ML workflows.
