Definition
Computer vision (CV) is the area of artificial intelligence that enables computers to interpret visual data: digital images, video streams, and 3D sensor outputs. The goal is to recover useful information—objects, boundaries, motion, depth, text, or activities—and to support decisions, measurements, or automation.
Unlike simply storing or displaying pictures, a CV system builds representations (features, regions, embeddings) and applies models (classical geometry, statistical learning, or deep neural networks) to interpret what the pixels mean in context.
Computer vision vs image processing
Image processing
Operates on pixels and signals: filtering, denoising, resizing, color transforms, compression, and enhancement. Output is usually another image or numeric map.
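To make the pixel-level nature of image processing concrete, here is a minimal sketch in plain NumPy (no image library assumed): a mean filter for denoising and a naive subsampling resize. The image, kernel size, and function name are illustrative choices, not a standard API.

```python
import numpy as np

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Smooth a 2-D grayscale image with a k x k mean filter (edge-padded).

    This is pure signal processing: the output is just another image,
    with no notion of what the pixels depict.
    """
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    # Sum the k*k shifted views of the padded image, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

# A synthetic 8x8 image: dark background with a bright square.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

blurred = box_blur(img)          # smoothing softens the square's edges
downsampled = blurred[::2, ::2]  # naive 2x "resize" by subsampling
```

Note that both outputs are still images (numeric maps), which is the hallmark of image processing as described above.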
Computer vision
Aims at semantics: detection, recognition, segmentation, tracking, pose, 3D structure, and scene understanding—often combining image processing with learning and geometry.
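By contrast, even the simplest vision task asks a semantic question: *where is the object?* A toy sketch of classical detection is template matching by zero-mean correlation, shown below in plain NumPy. The scene, template, and function name are made up for illustration; real detectors (and libraries such as OpenCV) are far more robust.

```python
import numpy as np

def match_template(img: np.ndarray, tmpl: np.ndarray) -> tuple:
    """Return the (row, col) where tmpl best matches img, scored by
    zero-mean correlation: a classic pre-deep-learning detector."""
    th, tw = tmpl.shape
    t = tmpl - tmpl.mean()  # zero-mean template ignores brightness offsets
    best, best_pos = -np.inf, (0, 0)
    for r in range(img.shape[0] - th + 1):
        for c in range(img.shape[1] - tw + 1):
            patch = img[r:r + th, c:c + tw]
            score = np.sum(patch * t)
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# Hide a 3x3 bright cross in a noisy 12x12 scene, then locate it.
rng = np.random.default_rng(0)
scene = rng.normal(0.0, 0.05, (12, 12))
cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
scene[5:8, 4:7] += cross

print(match_template(scene, cross))  # top-left corner of the hidden cross
```

The input is pixels, but the output is a *location*, i.e., a semantic answer rather than another image; that shift from signals to meaning is the boundary this section draws.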
Short history
- 1960s–70s: Early work on the blocks world, edge detection, and recovering simple 3D structure from images.
- 1990s–2000s: Robust local features (e.g., SIFT), stereo matching, tracking, and statistical learning methods.
- 2012 onward: Deep convolutional networks (AlexNet on ImageNet) shifted many benchmarks to end-to-end learning.
- Today: Large-scale detection and segmentation, transformers for vision, generative image models, and real-time embedded CV.
Applications
Computer vision appears in many industries:
- Quality inspection and robotics
- Medical imaging assistance
- Autonomous driving and driver assistance
- Security, retail analytics, and sports analytics
- OCR, document scanning, and accessibility tools
- AR/VR and content creation
How this series is organized
Later chapters cover preprocessing, features, segmentation, object detection, tracking, 3D vision, CNNs, generative models, video, applications, tools, and evaluation—aligned with the sidebar. Use the left menu to jump between topics; verify code and library versions on your machine.