
Autonomous Vehicles (CV): 20 Essential Q&A

Multi-sensor perception, real-time constraints, and validation for self-driving stacks.

~12 min read · 20 questions · Advanced
Tags: lanes, detection, fusion, LiDAR
1 What does perception do in AVs? ⚡ easy
Answer: Estimate drivable space, lanes, traffic actors, signs, and hazards from sensors to support planning and control.
2 Camera vs LiDAR vs radar? 📊 medium
Answer: Cameras give rich semantics at low cost; LiDAR gives accurate range but degrades in rain, snow, and fog; radar measures velocity directly and is weather-robust. Production stacks often fuse all three.
3 Sensor fusion levels? 🔥 hard
Answer: Early (raw or feature-level), object-level, and late decision fusion. Earlier fusion preserves more cross-sensor signal but demands tight calibration; late fusion is simpler and more robust to single-sensor failure.
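
As a toy illustration of the late (object-level) end of that spectrum, the sketch below associates camera and radar detections by nearest neighbor and merges complementary attributes. The Detection fields, the 2.0 m gate, and the averaging rule are assumptions for illustration, not a production fusion design.

# Object-level (late) fusion sketch: match camera and radar detections by
# nearest neighbor, then combine complementary attributes from each sensor.
from dataclasses import dataclass
from typing import Optional
import math

@dataclass
class Detection:
    x: float                           # longitudinal position (m)
    y: float                           # lateral position (m)
    velocity: Optional[float] = None   # radar-only attribute (m/s)
    label: Optional[str] = None        # camera-only attribute

def fuse(camera_dets, radar_dets, gate=2.0):
    fused = []
    for cam in camera_dets:
        nearest = min(radar_dets,
                      key=lambda r: math.hypot(r.x - cam.x, r.y - cam.y),
                      default=None)
        if nearest and math.hypot(nearest.x - cam.x, nearest.y - cam.y) < gate:
            # Keep the camera label, take the radar velocity, average position.
            fused.append(Detection((cam.x + nearest.x) / 2,
                                   (cam.y + nearest.y) / 2,
                                   nearest.velocity, cam.label))
        else:
            fused.append(cam)          # no radar match: keep camera-only detection
    return fused
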
4 Lane detection? 📊 medium
Answer: Segmentation masks, polynomial curve fits, or transformer-based lane queries in BEV; all must handle worn markings, merges, and construction zones.
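
A minimal sketch of the classical polynomial approach, assuming a binary lane mask as input; the toy mask below stands in for a real segmentation output.

# Fit a 2nd-order polynomial to lane-mask pixel coordinates.
import numpy as np

mask = np.zeros((720, 1280), dtype=np.uint8)
mask[400:720, 600:640] = 1              # toy "lane" blob for illustration

ys, xs = np.nonzero(mask)               # pixel coordinates of lane points
coeffs = np.polyfit(ys, xs, deg=2)      # model x = a*y^2 + b*y + c
fit_x = np.polyval(coeffs, np.arange(400, 720))  # lane x-position per image row
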
5 Segmentation use? 📊 medium
Answer: Drivable area, road vs sidewalk, freespace for parking—often multi-class at high resolution with temporal smoothing.
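
Temporal smoothing is often as simple as an exponential moving average over per-class probabilities. The sketch below assumes (H, W, num_classes) probability maps and an illustrative alpha; real stacks may warp the previous frame by ego-motion first.

# Exponential moving average over per-frame segmentation probabilities
# to reduce frame-to-frame flicker; alpha is an assumed smoothing factor.
import numpy as np

def smooth_probs(prev_smoothed, current_probs, alpha=0.7):
    """Blend the current per-class probabilities with the running average."""
    if prev_smoothed is None:
        return current_probs
    return alpha * current_probs + (1 - alpha) * prev_smoothed

# Usage: feed each frame's (H, W, num_classes) probability map in sequence.
state = None
for frame_probs in (np.random.rand(4, 4, 3) for _ in range(5)):
    state = smooth_probs(state, frame_probs)
labels = state.argmax(axis=-1)          # final per-pixel class after smoothing
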
6 Monocular depth? 📊 medium
Answer: Learned monocular depth stands in for LiDAR on camera-only vehicle tiers, or supplies dense depth to complement sparse LiDAR in fusion; it can fail on unseen textures and suffers from scale ambiguity.
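
One classical geometric monocular cue, sketched under the assumption of a pinhole camera and a known object-height prior; all numbers are illustrative.

# Pinhole-geometry range from a known object height: Z = f * H_real / h_pixels.
focal_px = 1000.0      # focal length in pixels (assumed)
real_height_m = 1.7    # prior height of a pedestrian (assumed)
bbox_height_px = 85.0  # detected box height in the image

depth_m = focal_px * real_height_m / bbox_height_px  # = 20.0 m
print(f"estimated range: {depth_m:.1f} m")
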
7 Detection classes? ⚡ easy
Answer: Vehicles, pedestrians, cyclists, and traffic lights/signs; detections also need range and velocity attributes so the tracker and planner can consume them.
8 Tracking role? 📊 medium
Answer: Maintain stable IDs, smooth boxes, predict future motion—critical for collision avoidance and behavior prediction.
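
A constant-velocity Kalman filter is the classic backbone here. The sketch below shows only the predict step; the transition matrix, cycle time, and noise values are illustrative assumptions.

# Predict step of a constant-velocity Kalman filter over state [x, y, vx, vy].
import numpy as np

dt = 0.1                                  # 10 Hz perception cycle (assumed)
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = np.eye(4) * 0.01                      # process noise (assumed)

def predict(x, P):
    """Propagate the state mean and covariance one cycle forward."""
    return F @ x, F @ P @ F.T + Q

x = np.array([5.0, 2.0, 1.0, 0.0])        # object moving at 1 m/s in x
P = np.eye(4)
x, P = predict(x, P)                      # position advances by vx * dt
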
9 HD maps? 🔥 hard
Answer: Centimeter-accurate lane geometry and semantics that anchor localization; mapless stacks push more of that burden onto online perception.
10 Calibration? 📊 medium
Answer: Extrinsics drift with vibration and temperature; online self-calibration complements factory calibration. Bad calibration silently breaks fusion and projection.
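
The sketch below shows why calibration matters: projecting a LiDAR point into the image depends directly on the extrinsics (R, t) and intrinsics K. All matrices are illustrative assumptions (camera frame with z forward).

# Project a LiDAR point into the image via extrinsics (R, t) and intrinsics K.
import numpy as np

K = np.array([[1000, 0, 640],
              [0, 1000, 360],
              [0, 0, 1]], dtype=float)    # camera intrinsics (assumed)
R = np.eye(3)                             # LiDAR->camera rotation (assumed)
t = np.array([0.0, -0.3, 0.1])            # LiDAR->camera translation (assumed)

p_lidar = np.array([2.0, 0.5, 10.0])      # point ~10 m ahead in LiDAR frame
p_cam = R @ p_lidar + t                   # transform into the camera frame
uv = K @ p_cam
u, v = uv[0] / uv[2], uv[1] / uv[2]       # pixel coordinates
# If the extrinsics drift, (u, v) lands on the wrong pixel and fusion degrades.
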
11 Weather / night? 📊 medium
Answer: Sensor degradation, glare, spray—domain adaptation, multi-sensor redundancy, conservative ODD restrictions.
12 Functional safety (concept)? 🔥 hard
Answer: ISO 26262 mindset: fault detection, redundancy, validated perception uncertainty for ASIL-rated paths—not just model accuracy.
13 Simulation? 📊 medium
Answer: Simulators such as CARLA and NVIDIA DRIVE Sim scale coverage of rare scenarios; the sim-to-real gap remains a research and validation topic.
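
As a flavor of scenario control, the sketch below connects to a CARLA server and dials up rain and fog using CARLA's Python client API; it assumes a simulator running at localhost:2000.

# Connect to a CARLA server and force heavy rain to exercise a rare scenario.
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

weather = world.get_weather()
weather.precipitation = 90.0      # heavy rain
weather.fog_density = 40.0        # reduced visibility
world.set_weather(weather)
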
14 Long-tail objects? 📊 medium
Answer: Debris, animals, unusual vehicles—need active learning, fleet logging, and conservative planner reactions.
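
Active learning often starts as simple uncertainty mining. The sketch below flags frames whose best detection score is low; the frame/score structure and threshold are assumptions for illustration.

# Uncertainty-based frame mining for long-tail data: queue frames where the
# model is unsure (no detections, or only low-confidence ones) for labeling.
def frames_to_label(frames, low=0.4):
    """frames: list of (frame_id, [detection_scores]) tuples."""
    picked = []
    for frame_id, scores in frames:
        if not scores or max(scores) < low:
            picked.append(frame_id)   # model unsure: send to annotation
    return picked

print(frames_to_label([("f1", [0.9, 0.8]), ("f2", [0.3]), ("f3", [])]))
# -> ['f2', 'f3']
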
15 Occlusion? ⚡ easy
Answer: Pedestrians between cars—temporal reasoning, bird’s-eye fusion, and prediction to “see” briefly hidden actors.
16 Latency budgets? 📊 medium
Answer: End-to-end perception budgets are often tens of milliseconds, met with TensorRT, sparse models, and ROI processing; the planner must account for the age of observations.
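
A minimal sketch of budget enforcement and timestamping, assuming a hypothetical run_perception function and an illustrative 50 ms budget:

# Time a perception step against a budget and stamp outputs with their
# capture time so the planner can reason about observation staleness.
import time

BUDGET_S = 0.050  # assumed end-to-end perception budget (50 ms)

def run_perception(frame):
    time.sleep(0.02)  # stand-in for real inference
    return {"objects": [], "stamp": frame["stamp"]}

frame = {"stamp": time.monotonic()}
start = time.monotonic()
out = run_perception(frame)
latency = time.monotonic() - start
if latency > BUDGET_S:
    print(f"over budget: {latency * 1000:.1f} ms")  # trigger degraded mode
# The planner consumes out["stamp"] to compensate for observation age.
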
17 Bird’s-eye view models? 🔥 hard
Answer: Lift image features to 3D/BEV grid (LSS, transformers) for consistent multi-camera reasoning—popular in modern detectors.
# BEV: lift 2D features to bird's-eye grid for fusion
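
Expanding that comment into a toy example: the sketch below unprojects per-pixel features with an assumed known depth and scatters them into a BEV grid. Real LSS-style models predict a per-pixel depth distribution rather than using ground-truth depth; all shapes and the grid size here are illustrative.

# Lift per-pixel image features to 3D with an assumed depth, then scatter
# them into a bird's-eye-view grid (a simplified, known-depth stand-in for LSS).
import numpy as np

H, W, C = 8, 8, 4
feats = np.random.rand(H, W, C)           # per-pixel image features
depth = np.full((H, W), 10.0)             # assumed depth per pixel (m)
K_inv = np.linalg.inv(np.array([[100, 0, W / 2],
                                [0, 100, H / 2],
                                [0, 0, 1.0]]))

bev = np.zeros((20, 20, C))               # 20x20 grid, 1 m cells (assumed)
for v in range(H):
    for u in range(W):
        ray = K_inv @ np.array([u, v, 1.0])
        x, _, z = ray * depth[v, u]       # 3D point in the camera frame
        i, j = int(z), int(x + 10)        # map to grid indices (x centered)
        if 0 <= i < 20 and 0 <= j < 20:
            bev[i, j] += feats[v, u]      # accumulate features per BEV cell
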
18 What is ODD? ⚡ easy
Answer: Operational design domain—where the system is validated to operate; leaving ODD requires disengagement or human takeover.
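
In code, an ODD monitor can start as explicit rules. The fields and limits below are illustrative assumptions, not a certified specification.

# Rule-based ODD monitor: check the current state against validated limits.
def within_odd(state):
    return (state["speed_mps"] <= 25.0        # e.g., a speed cap (assumed)
            and state["visibility_m"] >= 150.0
            and state["road_type"] == "highway")

state = {"speed_mps": 20.0, "visibility_m": 80.0, "road_type": "highway"}
if not within_odd(state):
    print("leaving ODD: request handover / minimal-risk maneuver")
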
19 Annotation? 📊 medium
Answer: LiDAR cuboids, polyline lanes, radar association—expensive; weak labels and self-supervision reduce cost.
20 End-to-end driving? 🔥 hard
Answer: Direct sensor→control learning challenges interpretability and safety case—hybrid stacks dominate production today.

AV Perception Cheat Sheet

Sensors
  • Cam + LiDAR + radar
Tasks
  • Lanes / det / track
Ops
  • ODD + latency

💡 Pro tip: Combine fusion with temporal tracking, and never ignore the ODD and the safety case.

Full tutorial track

Go deeper with the matching tutorial chapter and code examples.