Computer vision (CV) is a field of artificial intelligence (AI) that uses machine learning to enable computers and other systems to recognise and interpret the visual world. It aims to replicate aspects of human vision. Thanks to advances in AI, computer vision systems can now match, and in some cases exceed, human performance on certain tasks such as image recognition, object detection and labelling.


What is Computer Vision?

Computer vision is a form of AI that enables computers to see and understand the content of digital images and videos. It allows a computer to interpret its surroundings and identify things in them, much as human vision does. Computer vision systems use algorithms to extract features from visual data and build models that simulate some of the abilities of human vision. This gives computers the ability to acquire, analyse and process visual information in a way that is loosely analogous to human vision.

One of the most familiar applications is facial recognition, which is used, for example, to unlock mobile devices. The idea behind computer vision is to extract useful information from images and take appropriate action based on that information. In essence, it gives computers a counterpart to the human visual system so that they can carry out work that would otherwise require human sight. For simple, well-defined tasks this is not particularly difficult, but for complex tasks the system must be trained on examples before it can interpret visual data reliably.


Difference between Computer Vision and Human Vision


Humans see objects, scenes, patterns, and people as they are, such as trees in a landscape, books on a shelf, people inside a taxi or keys on a laptop. They remember what they recognise, so that they can identify those things again when they next encounter them. The eyes and the brain work together to interpret what we see without conscious deduction or extra effort, and this interpretation happens so quickly that we are usually unaware of it. Computer vision systems, on the other hand, can identify things in their surroundings only after they have been “trained” on example images to recognise the relevant patterns.


Human vision relies on our eyes detecting patterns of light and on the brain translating those signals into the images that we see. In this respect the eye is similar to a camera: both need light to form an image. Light entering the eye is refracted by the cornea and lens and forms an inverted image on the retina at the back of the eye, which the brain then interprets. Human vision therefore depends on coordination between the eye and the brain. Computer vision, by contrast, uses algorithms and machine learning techniques to identify, distinguish and classify objects, for example by size or colour, and to discover and interpret patterns in visual data such as photos and videos. In this way it simulates one part of human vision: identifying the objects within its field of view.
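As a minimal illustration of classifying by colour, the sketch below labels each pixel by finding the nearest reference colour and then reports the most common label for a patch of pixels. The palette and the names `REFERENCE_COLOURS`, `classify_colour` and `dominant_colour` are hypothetical, and measuring closeness with plain Euclidean distance in RGB space is a simplifying assumption; practical systems usually learn such decisions from training data.

```python
import math

# Hypothetical reference palette: label -> (R, G, B), each channel 0 to 255.
REFERENCE_COLOURS = {
    "red": (220, 30, 30),
    "green": (30, 180, 60),
    "blue": (30, 60, 200),
    "yellow": (230, 220, 40),
}


def classify_colour(rgb):
    """Return the label of the reference colour closest to `rgb`.

    Closeness is Euclidean distance in RGB space (an assumption made
    for simplicity, not the only possible choice).
    """
    return min(
        REFERENCE_COLOURS,
        key=lambda label: math.dist(rgb, REFERENCE_COLOURS[label]),
    )


def dominant_colour(pixels):
    """Classify every pixel and return the most frequent label."""
    counts = {}
    for rgb in pixels:
        label = classify_colour(rgb)
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)


# A tiny "object" made of mostly reddish pixels.
patch = [(200, 40, 35), (210, 25, 50), (90, 170, 70), (230, 45, 20)]
print(dominant_colour(patch))  # red
```

Three of the four pixels are closest to the red reference, so the patch as a whole is labelled red even though one pixel is greenish.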

Object Recognition

One of the key abilities of the human visual system is invariant object recognition: humans can quickly and accurately identify objects despite changes in position, size, viewpoint or lighting. Humans recognise objects effortlessly and can describe the objects in a scene, even ones they have never seen before. A computer, by contrast, starts with nothing more than an array of pixel values, so it must extract a set of features from the image to produce a higher-level description of what the image contains. Recognising 3D objects from a single 2D image is one of the hardest problems in computer vision, because depth information is lost when a 3D scene is projected onto a 2D image.
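To make the idea of features concrete, here is a minimal sketch that turns a small grayscale image (a list of rows of intensities from 0 to 255) into a two-number feature vector: the fraction of neighbouring pixel pairs that show a strong change across columns and across rows. The function name `edge_features` and the default threshold of 50 are hypothetical choices for illustration; real systems use richer descriptors such as histograms of oriented gradients, or features learned by neural networks.

```python
def edge_features(image, threshold=50):
    """Summarise a grayscale image as [vertical_edge_ratio, horizontal_edge_ratio].

    `image` is a list of equal-length rows of intensities (0 to 255), with at
    least two rows and two columns. A neighbouring pixel pair counts as an
    edge if its intensity difference exceeds `threshold` (a hypothetical value).
    """
    rows, cols = len(image), len(image[0])

    # Left/right neighbour differences respond to vertical edges.
    x_changes = [
        abs(image[r][c + 1] - image[r][c])
        for r in range(rows)
        for c in range(cols - 1)
    ]
    # Upper/lower neighbour differences respond to horizontal edges.
    y_changes = [
        abs(image[r + 1][c] - image[r][c])
        for r in range(rows - 1)
        for c in range(cols)
    ]

    vertical_edge_ratio = sum(d > threshold for d in x_changes) / len(x_changes)
    horizontal_edge_ratio = sum(d > threshold for d in y_changes) / len(y_changes)
    return [vertical_edge_ratio, horizontal_edge_ratio]


stripe = [[0, 0, 255, 255]] * 3   # dark-to-bright boundary in the middle
shifted = [[0, 255, 255, 255]] * 3  # same boundary, one column to the left
print(edge_features(stripe) == edge_features(shifted))  # True
```

The two images have different pixel values, yet they produce the same feature vector because both contain one vertical boundary per row. This is a very small step towards the kind of invariance described above.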


Is the way computer vision works similar to human vision?

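Only partly. The goal of computer vision is to give computers the ability to acquire, analyse and process visual information, and to derive meaningful information from it, much as human vision does. The mechanisms differ, however: rather than an eye and a brain, computer vision relies on cameras, algorithms and models trained on example data.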

What is the main difference between computer vision and computer graphics?

Both computer vision and computer graphics deal with visual information, but they work in opposite directions: computer graphics uses 3D models to produce image data, while computer vision uses image data to produce 3D models and other descriptions of a scene.
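The difference in direction can be illustrated with an ideal pinhole camera, a simplifying assumption made here. Going from a 3D point to an image point, as graphics does, is a single formula, x = f·X/Z and y = f·Y/Z. Going back, as vision must, is ambiguous. The function name `project` and the default focal length of 1.0 are hypothetical choices for this sketch.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (X, Y, Z) onto the image plane of an ideal pinhole camera.

    The camera sits at the origin looking along +Z. The default focal
    length is an arbitrary assumption for illustration.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # same direction from the camera, twice as far away

# Graphics direction: 3D -> 2D is a direct calculation.
print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Both points land on the same image coordinates, so a single image cannot tell them apart. Recovering depth needs extra information, such as a second camera, motion or prior knowledge about the objects. This is why the vision direction, from image to 3D, is the harder of the two.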

