Data Wrangling Interview Q&A

1. What is data wrangling?
Answer: Transforming raw data into a usable, analysis-ready format.
2. Wrangling vs. cleaning?
Answer: Cleaning fixes data quality problems; wrangling is broader and also covers reshaping, joining, and feature-ready transformation.
3. What is tidy data?
Answer: Each variable is a column, each observation is a row, and each value is a cell.
4. Wide vs. long format?
Answer: Wide format stores repeated measures across columns; long format stacks them in rows.
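The reshaping between the two formats can be sketched with pandas. This is a minimal illustration, not a prescribed method; the table and the column names (patient, visit_1, visit_2, score) are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one row per patient, one column per visit.
wide = pd.DataFrame({
    "patient": ["a", "b"],
    "visit_1": [5.1, 6.0],
    "visit_2": [5.4, 6.2],
})

# Wide -> long: each (patient, visit) pair becomes its own row.
long = wide.melt(id_vars="patient", var_name="visit", value_name="score")

# Long -> wide: pivot the stacked rows back into one column per visit.
back = long.pivot(index="patient", columns="visit", values="score").reset_index()
back.columns.name = None  # drop the leftover "visit" label on the column index
```

The long frame has four rows (two patients times two visits), and it is also the tidy form of this data: one observation per row.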
5. Why do keys matter in joins?
Answer: Correct keys prevent row duplication and incorrect row matching.
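A small sketch of what goes wrong when a join key is not unique, using pandas. The orders and customers tables are made up for illustration.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2], "amount": [100, 50]})

# customer_id 1 appears twice, so it is not a unique key on this side.
customers = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "segment": ["retail", "wholesale", "retail"],
})

joined = orders.merge(customers, on="customer_id", how="left")

# The order for customer 1 is now duplicated, which inflates any sum of amount.
total_before = orders["amount"].sum()
total_after = joined["amount"].sum()
```

Here two orders become three joined rows, and the amount total grows from 150 to 250 even though no new order exists.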
6. How do you handle schema drift?
Answer: Add schema checks at ingestion and keep a transformation mapping for each source version.
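One possible shape for this, sketched with pandas. The expected schema, the version labels ("v1", "v2"), the column names, and the normalize helper are all hypothetical; a real pipeline might use a dedicated validation library instead.

```python
import pandas as pd

# Hypothetical target schema and per-version column mappings.
EXPECTED_COLUMNS = ["user_id", "signup_date", "country"]
RENAMES_BY_VERSION = {
    "v1": {"uid": "user_id", "signup": "signup_date"},
    "v2": {"userId": "user_id", "signupDate": "signup_date"},
}

def normalize(df: pd.DataFrame, version: str) -> pd.DataFrame:
    """Map a source-specific frame onto the expected schema, or fail loudly."""
    if version not in RENAMES_BY_VERSION:
        raise ValueError(f"unknown source version: {version}")
    out = df.rename(columns=RENAMES_BY_VERSION[version])
    missing = [c for c in EXPECTED_COLUMNS if c not in out.columns]
    if missing:
        raise ValueError(f"schema drift: missing columns {missing}")
    # Drop unexpected extras and fix the column order.
    return out[EXPECTED_COLUMNS]
```

Raising on a missing column stops a drifted source at the pipeline boundary instead of letting it surface later as silent nulls.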
7. What is feature engineering in wrangling?
Answer: Creating informative variables from raw inputs.
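As a brief illustration, assuming a hypothetical orders table with a timestamp, a price, and a quantity, derived variables can be added in pandas like this:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_ts": pd.to_datetime(["2024-01-05 09:30", "2024-01-06 22:15"]),
    "price": [20.0, 5.0],
    "quantity": [3, 4],
})

features = orders.assign(
    # Combine two raw columns into one business quantity.
    revenue=orders["price"] * orders["quantity"],
    # Extract calendar parts that a model or report can use directly.
    order_hour=orders["order_ts"].dt.hour,
    is_weekend=orders["order_ts"].dt.dayofweek >= 5,  # Monday is 0, Saturday is 5
)
```

Which features are actually informative depends on the analysis; these three are only examples.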
8. Why is type casting important?
Answer: Wrong dtypes cause calculation errors and inefficient memory use.
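A minimal pandas sketch of both problems, using made-up data. Numbers stored as strings concatenate instead of adding, and repeated text labels can be stored more compactly as a categorical dtype on large, low-cardinality columns.

```python
import pandas as pd

raw = pd.DataFrame({
    "price": ["10", "20", "n/a"],
    "city": ["Oslo", "Oslo", "Bergen"],
})

# As strings, "+" concatenates instead of adding numbers.
concatenated = raw["price"].iloc[0] + raw["price"].iloc[1]

typed = raw.assign(
    # Unparseable values become NaN instead of raising.
    price=pd.to_numeric(raw["price"], errors="coerce"),
    # Each distinct label is stored once, with integer codes per row.
    city=raw["city"].astype("category"),
)

# sum() skips NaN by default, so only the two valid prices are added.
total = typed["price"].sum()
```

Using errors="coerce" is a design choice: it keeps the pipeline running but hides bad values as NaN, so it is worth counting the NaNs afterwards.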
9. How do you validate joins?
Answer: Compare row counts before and after the join and check key uniqueness.
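These checks can be sketched with pandas, whose merge accepts validate and indicator arguments. The tables below are hypothetical.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [100, 50, 75]})
customers = pd.DataFrame({"customer_id": [1, 2], "segment": ["retail", "wholesale"]})

# Key uniqueness on the lookup side.
assert customers["customer_id"].is_unique

# validate="many_to_one" raises MergeError if the right-hand keys repeat;
# indicator=True adds a _merge column showing where each row matched.
joined = orders.merge(
    customers, on="customer_id", how="left",
    validate="many_to_one", indicator=True,
)

# A left join on a unique right key should preserve the left row count.
assert len(joined) == len(orders)

# Rows with no match on the right side are worth inspecting.
unmatched = joined[joined["_merge"] == "left_only"]
```

In this example, customer 3 has no entry in the customers table, so it shows up in unmatched with a missing segment.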
10. How do you manage pipeline steps?
Answer: Make each transform modular, testable, and idempotent.
11. What is an idempotent transform?
Answer: One that produces the same result when re-run, with no additional side effects.
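A small sketch contrasting an idempotent step with a non-idempotent one, in pandas. Both functions and the sample data are hypothetical.

```python
import pandas as pd

def clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Idempotent: applying it a second time changes nothing further."""
    out = df.copy()  # never mutate the caller's frame
    out["name"] = out["name"].str.strip().str.lower()
    return out.drop_duplicates(subset="name").reset_index(drop=True)

def add_bonus(df: pd.DataFrame) -> pd.DataFrame:
    """Not idempotent: every run adds another 100 to salary."""
    out = df.copy()
    out["salary"] = out["salary"] + 100
    return out

raw = pd.DataFrame({"name": [" Ada", "ada ", "Bob"], "salary": [1000, 1000, 900]})

once = clean_names(raw)
twice = clean_names(once)
```

Here once and twice are identical, whereas calling add_bonus twice gives different salaries than calling it once. A re-run after a partial failure is only safe for steps of the first kind.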
12. Wrangling in one line?
Answer: Wrangling bridges messy sources and reliable analytical outputs.