
Pandas for ML & Data Analysis: Interview Q&A

Short questions and answers on pandas: Series, DataFrames, indexing, joins, grouping and time series handling.

Topics: DataFrames, Filtering, GroupBy, Merge/Join
1 What are the two main data structures in pandas? ⚑ Beginner
Answer: Series (1D labeled array) and DataFrame (2D labeled table of columns).
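Example: a minimal sketch of the two structures. The column names and values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
prices = pd.Series([1.5, 2.0, 3.25], index=["a", "b", "c"], name="price")

# A DataFrame: a table whose columns are Series sharing one index.
df = pd.DataFrame({"price": prices, "qty": [10, 20, 30]})

print(prices["b"])       # label lookup on the Series
print(df.shape)          # (rows, columns)
print(type(df["qty"]))   # selecting a single column returns a Series
```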
2 How do you read a CSV file into a DataFrame? ⚑ Beginner
Answer: Use pd.read_csv("file.csv"); optional arguments such as sep, usecols, dtype and parse_dates control how the file is parsed.
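Example: a minimal sketch in which io.StringIO stands in for a file on disk so the snippet runs anywhere. The CSV text and column names are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as "file.csv".
csv_text = (
    "id,signup_date,score\n"
    "1,2024-01-05,0.8\n"
    "2,2024-02-11,\n"
    "3,2024-03-02,0.4\n"
)

# parse_dates converts the named column to datetime while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["signup_date"])

print(df.dtypes)
print(df)
```

The empty score field in the second row is read as a missing value (NaN).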
3 What is the difference between loc and iloc? 📊 Intermediate
Answer: loc selects by label (and also accepts boolean masks), and its slices include the end label; iloc selects by integer position, and its slices exclude the end position.
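Example: a short sketch contrasting the two indexers on a toy frame with made-up labels and values.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

# Label-based: the slice "a":"c" includes the end label "c".
by_label = df.loc["a":"c", "score"]

# Position-based: the slice 0:2 excludes position 2, like normal Python slicing.
by_position = df.iloc[0:2, 0]

print(by_label.tolist())     # [10, 20, 30]
print(by_position.tolist())  # [10, 20]
```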
4 How do you filter rows based on a condition? ⚑ Beginner
Answer: Use boolean indexing, e.g., df[df["col"] > 0].
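Example: a minimal sketch of single and combined conditions, with invented column names. Each condition needs its own parentheses, and conditions are combined with & (and) or | (or) rather than the Python keywords.

```python
import pandas as pd

df = pd.DataFrame({"col": [-1, 5, 3, 8], "group": ["x", "y", "x", "y"]})

# Single condition: keep rows where col is positive.
positive = df[df["col"] > 0]

# Both conditions must hold.
both = df[(df["col"] > 0) & (df["group"] == "y")]

# Either condition may hold.
either = df[(df["col"] > 6) | (df["group"] == "x")]

print(positive["col"].tolist())  # [5, 3, 8]
print(both["col"].tolist())      # [5, 8]
print(either["col"].tolist())    # [-1, 3, 8]
```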
5 What does groupby do in pandas? 📊 Intermediate
Answer: It groups rows by key(s) and lets you aggregate, transform or filter each group separately.
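Example: a small sketch of the three group-wise operations named above (aggregate, transform, filter) on made-up data.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b", "b", "b"], "points": [1, 3, 2, 4, 6]}
)

# Aggregate: one value per group.
totals = df.groupby("team")["points"].sum()

# Transform: one value per original row, computed within its group.
df["team_mean"] = df.groupby("team")["points"].transform("mean")

# Filter: keep only the rows of groups that satisfy a condition.
big_teams = df.groupby("team").filter(lambda g: len(g) >= 3)

print(totals)
print(df)
print(big_teams)
```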
6 How do you handle missing values in pandas? ⚑ Beginner
Answer: Detect them with isna()/notna(), remove them with dropna(), or impute them with fillna() using constants or statistics such as the column mean or median.
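Example: a minimal sketch of detecting, dropping and imputing missing values. The columns and the fill choices (mean for numbers, a placeholder string for text) are assumptions made for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"age": [25.0, np.nan, 35.0], "city": ["Oslo", None, "Rome"]}
)

# Count missing values per column.
n_missing = df.isna().sum()

# Drop every row that contains at least one missing value.
dropped = df.dropna()

# Impute per column: mean for age, a constant for city.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(n_missing)
print(filled)
```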
7 How can you join/merge two DataFrames? 📊 Intermediate
Answer: Use pd.merge() or DataFrame.merge(), choosing the key columns with on (or left_on/right_on) and the join type with how="left", "right", "inner" (the default) or "outer".
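Example: a sketch comparing join types on two toy tables. The table and column names are hypothetical; indicator=True adds a _merge column showing where each row came from.

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"cust_id": [1, 1, 4], "amount": [10, 20, 30]})

# Inner: only keys present in both tables (customer 1, twice).
inner = customers.merge(orders, on="cust_id", how="inner")

# Left: every customer is kept; customers without orders get NaN amounts.
left = customers.merge(orders, on="cust_id", how="left")

# Outer: all keys from both sides, with a column marking each row's origin.
outer = pd.merge(customers, orders, on="cust_id", how="outer", indicator=True)

print(len(inner), len(left), len(outer))  # 2 4 5
print(outer)
```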
8 How do you set and reset an index in a DataFrame? ⚑ Beginner
Answer: Use set_index() to make one or more columns the index, and reset_index() to move the index back into columns (pass drop=True to discard it instead).
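Example: a minimal round trip with an invented id column. Both methods return a new DataFrame by default rather than modifying the original.

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 102], "val": [1.0, 2.0]})

# Promote the id column to the index.
indexed = df.set_index("id")

# Move the index back to an ordinary column.
restored = indexed.reset_index()

# Discard the index entirely and fall back to 0..n-1.
discarded = indexed.reset_index(drop=True)

print(indexed.loc[102, "val"])    # 2.0
print(list(restored.columns))     # ['id', 'val']
print(list(discarded.columns))    # ['val']
```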
9 What is a multi-index and when is it useful? 🔥 Advanced
Answer: A MultiIndex has multiple levels of index labels, useful for hierarchical data like (country, year).
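Example: a sketch of a (country, year) MultiIndex. The country codes and numbers are placeholders, not real statistics.

```python
import pandas as pd

df = (
    pd.DataFrame(
        {
            "country": ["NO", "NO", "IT", "IT"],
            "year": [2020, 2021, 2020, 2021],
            "value": [1.0, 1.1, 2.0, 2.1],
        }
    )
    .set_index(["country", "year"])
    .sort_index()  # sorted indexes make label slicing reliable and fast
)

# Select one cell with a full (country, year) key.
one_cell = df.loc[("NO", 2021), "value"]

# Select all rows for the outer level; the result is indexed by year.
italy = df.loc["IT"]

# Cross-section on the inner level; the result is indexed by country.
year_2020 = df.xs(2020, level="year")

# Pivot the inner level into columns.
wide = df["value"].unstack("year")

print(one_cell)
print(wide)
```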
10 How do you apply a function row-wise or column-wise? 📊 Intermediate
Answer: Use df.apply(func, axis=1) for rows or axis=0 for columns; prefer vectorized operations when possible.
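Example: a small sketch of both axes next to the vectorized equivalent. The lambdas are arbitrary illustrations; on large frames the vectorized form is usually much faster because it avoids calling a Python function per row.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=1: the function receives one row at a time (as a Series).
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# axis=0: the function receives one column at a time (as a Series).
col_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)

# Vectorized equivalent of the row-wise sum.
vectorized = df["a"] + df["b"]

print(row_sums.tolist())    # [11, 22, 33]
print(col_ranges.to_dict()) # {'a': 2, 'b': 20}
```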
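11 What is the difference between copy() and a simple assignment in pandas? 🔥 Advanced
Answer: Simple assignment (df2 = df) only binds a second name to the same object, so a change made through either name is visible through both; copy() creates an independent copy (deep by default). Calling copy() on a selected subset before modifying it also avoids chained assignment issues.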
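Example: a minimal sketch showing that plain assignment shares the object while copy() does not. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

alias = df               # second name for the same DataFrame object
independent = df.copy()  # separate object with its own data

df.loc[0, "x"] = 99

print(alias is df)               # True
print(alias.loc[0, "x"])         # 99, the change is visible via the alias
print(independent.loc[0, "x"])   # 1, the copy is unaffected

# Copy a filtered subset before adding columns to it, so the intent is
# explicit and the original frame is never touched.
subset = df[df["x"] > 2].copy()
subset["flag"] = True
```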
12 How do you work with time series in pandas? 📊 Intermediate
Answer: Parse timestamps with pd.to_datetime() and set them as a DatetimeIndex, then use resample(), rolling(), shift(), asfreq() and time-based slicing.
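Example: a sketch of the usual sequence on six days of made-up daily values: parse the dates, index by them, then resample, roll, shift and slice.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "date": [
            "2024-01-01", "2024-01-02", "2024-01-03",
            "2024-01-04", "2024-01-05", "2024-01-06",
        ],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

# Convert strings to timestamps and use them as the index.
raw["date"] = pd.to_datetime(raw["date"])
ts = raw.set_index("date")["value"]

two_day_sum = ts.resample("2D").sum()       # downsample into 2-day bins
rolling_mean = ts.rolling(window=3).mean()  # NaN until 3 values are available
lagged = ts.shift(1)                        # previous day's value
first_three = ts.loc["2024-01-01":"2024-01-03"]  # date slices include the end

print(two_day_sum.tolist())  # [3.0, 7.0, 11.0]
print(rolling_mean.tolist())
```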
13 How can you quickly inspect a DataFrame’s structure? ⚑ Beginner
Answer: Use head(), tail(), info(), describe(), and df.dtypes.
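Example: a quick sketch of the inspection calls on a toy frame with invented columns.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "label": ["a", "b", "c", "d"]})

first_rows = df.head(2)    # first 2 rows
last_rows = df.tail(1)     # last row
df.info()                  # prints columns, non-null counts, dtypes, memory
summary = df.describe()    # summary statistics for the numeric columns
column_types = df.dtypes   # one dtype per column

print(summary)
print(column_types)
```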
14 How do you efficiently select numeric or categorical columns? 🔥 Advanced
Answer: Use df.select_dtypes(include="number") for numeric columns, and include=["category", "object"] for categorical or text columns.
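Example: a minimal sketch with made-up columns. The text column is created with dtype="object" explicitly, because depending on the pandas version and options, strings may otherwise be inferred as a dedicated string dtype rather than object.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "age": [30, 40],
        "income": [1.5, 2.5],
        "city": pd.Series(["Oslo", "Rome"], dtype="object"),
        "tier": pd.Categorical(["a", "b"]),
    }
)

# Integer and float columns.
numeric = df.select_dtypes(include="number")

# Categorical and object (text) columns.
categorical = df.select_dtypes(include=["category", "object"])

print(list(numeric.columns))      # ['age', 'income']
print(list(categorical.columns))  # ['city', 'tier']
```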
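15 How do you melt and pivot data in pandas? 🔥 Advanced
Answer: melt() converts wide → long; pivot() converts long → wide without aggregating (each index/column pair must be unique), while pivot_table() converts long → wide and aggregates duplicate entries.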
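Example: a sketch of a wide-to-long-to-wide round trip, followed by pivot_table on data with repeated keys. Column names such as quarter and sales are invented for illustration.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "q1": [10, 30], "q2": [20, 40]})

# Wide to long: one row per (id, quarter) pair.
long = wide.melt(id_vars="id", var_name="quarter", value_name="sales")

# Long to wide again; pivot needs each (id, quarter) pair to be unique.
back = long.pivot(index="id", columns="quarter", values="sales")

# With repeated (id, quarter) pairs, pivot_table aggregates them instead.
duplicated = pd.concat([long, long], ignore_index=True)
summed = duplicated.pivot_table(
    index="id", columns="quarter", values="sales", aggfunc="sum"
)

print(long)
print(back)
print(summed)
```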
16 How can pandas integrate with scikit-learn workflows? 📊 Intermediate
Answer: Pandas is used for data loading, cleaning and feature engineering, then values are passed to sklearn models (or wrapped via pipelines that accept DataFrames).
17 When might you reach pandas performance limits? 📊 Intermediate
Answer: With datasets that approach or exceed available memory (pandas keeps data in RAM and most operations are single-threaded), or with heavy Python-level loops; then consider chunking, vectorization, or tools like Dask/Polars.
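Example: a minimal sketch of chunked reading, where only one chunk is in memory at a time. The tiny in-memory CSV and the chunk size of 4 are stand-ins for a large file and a realistic chunk size.

```python
import io

import pandas as pd

# Hypothetical data: a single column holding the integers 0..9.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
n_chunks = 0

# chunksize makes read_csv yield DataFrames of at most 4 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())  # vectorized sum within the chunk
    n_chunks += 1

print(total, n_chunks)  # 45 3
```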
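18 What are some common pitfalls in pandas that ML engineers should avoid? 🔥 Advanced
Answer: Chained assignment bugs, silently mixing timezone-aware and naive timestamps, hidden type conversions (for example, an integer column becoming float once it contains NaN), and ignoring index alignment in arithmetic and assignment.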
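Example: a short sketch of two of these pitfalls, index alignment and a hidden type conversion, using made-up values.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])

# Arithmetic aligns on index labels, not positions: labels 0 and 3 exist
# on one side only, so those results are NaN.
aligned = a + b

# Dropping to NumPy arrays adds by position instead.
positional = a.to_numpy() + b.to_numpy()

# A missing value silently turns an integer column into float64.
with_gap = pd.Series([1, 2, None])

print(aligned.tolist())     # [nan, 12.0, 23.0, nan]
print(positional.tolist())  # [11, 22, 33]
print(with_gap.dtype)       # float64
```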
19 Give a real-world example where pandas is central in an ML project. ⚑ Beginner
Answer: Pandas is typically used in feature engineering pipelines for tabular ML, e.g., preparing customer or transaction datasets.
20 What is the key message to remember about pandas for ML? ⚑ Beginner
Answer: Pandas is the workhorse for tabular data; mastering indexing, grouping, joining and reshaping makes ML data preparation much easier.

Quick Recap: Pandas

If you can select, filter, group, join and reshape with confidence, you already have the core pandas skills needed for most ML workflows.