Computer Vision (CV)
Computer vision is a field of artificial intelligence that enables computers to extract meaningful information from digital images and video, and to act on what they find.
What is Computer Vision (CV)?
Human beings take vision for granted. You glance at a street scene and instantly know which shapes are cars, which one is a traffic light, and that the figure stepping off the curb is a person. To a computer, that same scene is nothing but a grid of numbers — millions of pixels, each just a value for brightness and color. Computer vision is the field of artificial intelligence devoted to closing that gap: turning those raw pixels into a useful interpretation of what an image or video contains, and increasingly into decisions based on it.
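To make the "grid of numbers" idea concrete, here is a minimal sketch in plain Python. The 4x4 grayscale image and the helper names (`dimensions`, `mean_brightness`) are made up for illustration; real images are far larger and usually carry three color values per pixel.

```python
# A tiny 4x4 grayscale "image": each number is one pixel's brightness
# (0 = black, 255 = white). The left half is dark, the right half is bright.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

def dimensions(img):
    """Return (height, width) of the pixel grid."""
    return len(img), len(img[0])

def mean_brightness(img):
    """Average of all pixel values: a crude summary of how light the image is."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

print(dimensions(image))       # (4, 4)
print(mean_brightness(image))  # 127.5
```

Nothing in these numbers says "car" or "person"; everything a vision system concludes has to be computed from values like these.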
For decades this was painfully hard. Early computer vision relied on engineers hand-writing rules (look for this edge, that corner, this change in brightness), which worked in controlled conditions and fell apart the moment lighting, angle, or background changed. The breakthrough came with deep learning, and in particular with a specialized type of neural network called a convolutional neural network (CNN), which slides small learned filters across an image to pick out visual patterns. Instead of being told what to look for, these systems learn it by example: show them enough labeled pictures and they work out for themselves how to build up from edges and textures in the early layers to whole objects in the later ones. With the right architecture, enough data, and enough computing power, the results pulled far ahead of anything the rule-writing era could manage.
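The "look for this edge" style of rule can be sketched as a small filter slid across the image, which is the same operation a convolutional layer performs. The code below is an illustrative sketch, not any particular library's implementation: the function name `convolve2d`, the toy image, and the hand-picked kernel are all assumptions. The key difference in a CNN is that the kernel values are learned from labeled examples rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    Strictly this is cross-correlation, which is what deep learning
    libraries usually compute under the name "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Toy image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-written rule: respond where brightness rises from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

for row in convolve2d(image, vertical_edge):
    print(row)
# [0, 27, 27, 0]
# [0, 27, 27, 0]
```

The output is zero over the flat regions and large where the dark-to-bright boundary falls inside the filter window, so the filter acts as a simple edge detector.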
Today computer vision is woven through everyday life, often invisibly. It reads the license plate that lifts the gate at a parking garage, helps a car stay in its lane, checks products for defects as they race down a production line, lets a drone spot crop disease across a field, scans printed documents into editable text, and assists doctors in noticing details on a medical scan that are easy to miss. It works on still images and live video alike, and it turns up in more places every year. It still has real weaknesses: unfamiliar angles, poor lighting, or situations unlike anything in its training data can all throw it off. More recently, computer vision has begun merging with language models into multimodal systems that can not only spot what is in an image but also describe and discuss it in plain language.
Real-world example
Pull up to the entrance of a modern parking garage and the gate lifts without you taking a ticket. A camera above the lane has photographed your license plate, a computer vision system has read the characters off that image in a fraction of a second, and the garage's software has checked the number against a list of permitted or pre-paid vehicles before raising the barrier. There is no special sensor in your car and no human watching a screen, just a camera and a model trained to turn a picture of a plate into text the system can act on, even in rain, glare, or at an awkward angle.
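The step after the plate has been read can be sketched in a few lines. This is a hypothetical illustration only: the function names, the example plates, and the normalization rule are assumptions, and the hard part (turning the photo into text) is assumed to have already happened upstream.

```python
def normalize_plate(text):
    """Uppercase and keep only letters and digits, so 'ab-123 cd' matches 'AB123CD'."""
    return "".join(ch for ch in text.upper() if ch.isalnum())

def should_open_gate(recognized_text, permitted_plates):
    """Return True if the text read from the camera image is on the permitted list."""
    return normalize_plate(recognized_text) in permitted_plates

# Hypothetical list of permitted or pre-paid vehicles.
PERMITTED = {normalize_plate(p) for p in ["AB123CD", "XY 987 ZT"]}

print(should_open_gate("ab-123 cd", PERMITTED))  # True
print(should_open_gate("ZZ000ZZ", PERMITTED))    # False
```

Normalizing before the comparison matters because the text a vision model reads off a plate may differ in spacing or letter case from how the plate was registered.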
Frequently asked questions
What is the difference between computer vision and image recognition?
Image recognition, which means naming what appears in a picture, is one task inside the much larger field of computer vision. Computer vision also covers working out where objects are and drawing boxes around them (known as object detection), separating an image into meaningful regions (known as segmentation), tracking movement across frames of video, and even reconstructing 3D shape from flat images. So image recognition is a useful slice of computer vision, not another word for it.
How does computer vision work?
An image enters the system as a grid of pixel values, and a neural network trained on a large collection of labeled images turns those numbers into a prediction — "this is a bicycle", "there is a pedestrian here". The network was never handed a written description of a bicycle; it learned the visual patterns by being corrected over millions of examples until it became reliably accurate. Once trained, it applies what it learned to images it has never seen before.
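The "learning by being corrected" loop can be illustrated with a deliberately tiny stand-in. The sketch below uses a single-layer perceptron rather than a deep neural network, and the 2x2 "images", labels, learning rate, and function names are all invented for illustration. The shape of the process is the same, though: predict, compare with the label, nudge the weights when wrong, then apply the result to images never seen in training.

```python
def predict(weights, bias, pixels):
    """Return 1 if the weighted sum of pixel values is positive, else 0."""
    total = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if total > 0 else 0

def train(examples, epochs=10, learning_rate=0.1):
    """Perceptron rule: adjust the weights only when a prediction is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)  # -1, 0, or 1
            weights = [w + learning_rate * error * p
                       for w, p in zip(weights, pixels)]
            bias += learning_rate * error
    return weights, bias

# Flattened 2x2 images: [top-left, top-right, bottom-left, bottom-right].
# Label 1 = bright bar on the left, label 0 = bright bar on the right.
training_data = [
    ([1.0, 0.0, 1.0, 0.0], 1),
    ([0.0, 1.0, 0.0, 1.0], 0),
    ([0.9, 0.1, 0.8, 0.0], 1),
    ([0.1, 0.9, 0.2, 1.0], 0),
]

weights, bias = train(training_data)

# Images that were not in the training set.
print(predict(weights, bias, [0.8, 0.2, 0.9, 0.1]))  # 1 (left bar)
print(predict(weights, bias, [0.2, 0.7, 0.1, 0.9]))  # 0 (right bar)
```

No rule saying "a left bar means bright left pixels" was ever written down; the weights ended up encoding it because wrong guesses were corrected during training.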
What is computer vision used for?
A great many things, often so routine you stop noticing them. Cars use it to read road signs and stay in lane, hospitals use it to help analyze scans, factories use it to catch defective products, farmers use it to spot crop disease from the air, and your phone uses it every time it scans a document or reads text from a photo. The common thread is simple: wherever a camera can take over or assist the work of human eyes, computer vision tends to follow.