Man vs. Machine – everything you need to know about computer vision

Computer vision is considered a recent innovation in IT, but for aeons, people have dreamt of creating machines with the characteristics of human intelligence: machines that can think and then act like human beings.

It’s been widely imagined in books and movies how these ideas will come to fruition in the future, giving computers the ability to come alive and “see” the world around them. But that was then…today we find ourselves in extraordinary times where machines are stepping out of our imaginations and becoming part of our everyday lives as they interpret the world around us – for us.

Read on to discover everything you need to know about computer vision and how fiction has, today, become a reality.

What is computer vision?

Computer vision is a field of artificial intelligence that trains computers to interpret and understand our visual world. Using digital images from cameras and videos, together with deep learning models, computers can accurately identify and classify objects and ultimately react to what they “see.”

Science fiction becomes fact.

Computer vision is a field of computer science that focuses on imitating parts of the complex human visual system, enabling computers to identify and process objects in images and videos in much the same way that humans do.

What is deep learning?

To understand the recent progress of computer vision technology, we need to dive into the algorithms that it relies on. Modern computer vision depends on deep learning, a specific division of machine learning, which uses algorithms to learn and improve over time from large data sets. Machine learning, in turn, is a division of artificial intelligence, which acts as the foundation for both technologies.

Deep learning fits inside machine learning, a division of artificial intelligence.

Deep learning represents a more efficient way to aid computer vision: it uses a specific type of algorithm called a neural network. Neural networks are used to extract patterns from the data samples provided. These algorithms were inspired by our understanding of how the brain functions, in particular the interconnections between neurons in the cerebral cortex.

Much in the same way that biological neurons in the cerebral cortex connect and exchange signals, it’s possible to build several layers of interconnected perceptrons, the artificial neurons that make up a neural network.

Let’s explain: input values (raw data) get passed through the network of perceptrons and end up in the output layer, which produces a prediction, or a highly educated guess, about a certain object. Basically, by the end of the analysis, the machine can classify an object with a certain percentage of confidence.
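
To make this a little more concrete, here is a minimal sketch in Python of raw input values passing through one hidden layer of perceptrons and ending in a prediction with a confidence score. Everything in it is an assumption made for illustration: the three input values, the hand-picked weights and the two labels are hypothetical, and a real network would learn its weights from data and be far larger.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layer(inputs, weights, biases):
    """One layer of perceptrons: weighted sum plus bias, then a ReLU cut-off."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs

def predict(inputs, hidden_w, hidden_b, out_w, out_b, labels):
    """Pass inputs through the hidden layer, then score each label."""
    hidden = layer(inputs, hidden_w, hidden_b)
    scores = [
        sum(w * h for w, h in zip(neuron_weights, hidden)) + bias
        for neuron_weights, bias in zip(out_w, out_b)
    ]
    probabilities = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]

# Hypothetical, hand-picked weights; a trained network would learn these.
HIDDEN_W = [[1.0, -1.0, 0.5], [-0.5, 1.0, 1.0]]
HIDDEN_B = [0.0, 0.1]
OUT_W = [[2.0, -1.0], [-1.0, 2.0]]
OUT_B = [0.0, 0.0]
LABELS = ["bottle", "not a bottle"]

label, confidence = predict([0.9, 0.1, 0.4], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B, LABELS)
print(f"{label}: {confidence:.0%} confidence")
```

The last step, softmax, is what turns the raw scores of the output layer into the “percentage of confidence” described above.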

How does computer vision work?

It may seem like a very far-out concept, but it’s beautifully simple in reality. Think about it this way: you have a factory that produces bottles. On one side you have a conveyor belt with all the bottles lined up; on the other side you have a person watching the belt to ensure the bottles are all being labelled. Instead, a computer can be placed in the right spot to “watch” the bottles and ensure each one is being branded. The computer will then flag any bottles that are not being labelled, or alert you if there is any obstruction or (pardon the pun) a bottleneck on the production line.

One of the biggest questions in the concept of man vs. machine is how neuroscience and machine learning differ.

How do human brains work exactly, and how can we approach that with our own algorithms? The reality is that there are few comprehensive theories of brain computation, so although neural nets are supposed to “mimic the way the brain works,” nobody is quite sure if that’s actually the case.

The same paradox holds true for computer vision — since we’re undecided on how our brain and eyes process images, it’s hard to say how well the algorithms used in production approximate our own internal mental processes. There is no exact science to date, as mind-boggling as that may seem.

It is all about pattern recognition. One way to “train” a computer to understand visual data is to feed it images, lots and lots of labelled images: thousands, or millions if possible. Those images are then subjected to various software techniques, or algorithms, that allow the computer to decipher patterns in all the components that relate to those labels.

So, for example, if you feed a computer a million images of bottles, it will subject them all to algorithms that let it analyse the shapes, the colours, the distances between the shapes, where objects border each other in these images and so on. Eventually it builds a profile of what “bottle” means. When it’s finished, the computer will (in theory) be able to use its experience, when fed other unlabelled images, to find the ones that show a bottle.
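
As a rough illustration of building a “profile” from labelled examples, the Python sketch below uses a nearest-centroid classifier, which is a much simpler technique than deep learning but follows the same train-then-recognise pattern. The two features (a height-to-width ratio and a roundness score) and all the numbers are hypothetical; a real system would work from the pixels themselves.

```python
def build_profiles(labelled_examples):
    """Average the feature vectors of each label to form its profile."""
    sums, counts = {}, {}
    for features, label in labelled_examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]]
        for label in sums
    }

def classify(features, profiles):
    """Return the label whose profile is closest to the new example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(profiles, key=lambda label: squared_distance(features, profiles[label]))

# Hypothetical labelled data: [height-to-width ratio, roundness score].
TRAINING = [
    ([3.0, 0.8], "bottle"),
    ([2.8, 0.9], "bottle"),
    ([3.2, 0.7], "bottle"),
    ([1.0, 0.2], "box"),
    ([1.1, 0.1], "box"),
    ([0.9, 0.3], "box"),
]

PROFILES = build_profiles(TRAINING)

# An unlabelled example the classifier has never seen before.
print(classify([2.9, 0.85], PROFILES))
```

The more labelled examples go into each profile, the more representative it becomes, which is one reason large labelled data sets matter so much.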

Obtaining an image

Images, even large sets, can be obtained in real time through video, photos or 3D technology for analysis. At least in part, the dramatic improvement in the performance of computer vision algorithms has been helped by the ability to collect and process massive datasets: collecting a million images of faces was much harder before Instagram, Facebook, Twitter and even Google Search came along.

Processing the image

Deep learning models automate a large portion of this process, but the models are often trained by first being fed thousands of labelled or pre-identified images.

Understanding the image

The interpretative step is the final stage, where an object is identified or classified.

Today’s AI systems can go one step further and take actions based on the understanding of that particular image.

There are many types of computer vision, used in different ways:

Image segmentation partitions an image into multiple regions or pieces to be examined separately.

Object detection means that a computer identifies a specific object in an image. Advanced object detection recognises many objects in a single image, for example a cricket field, a bowler, a wicketkeeper, a bat, a ball and so on. These systems use X and Y coordinates to create a bounding box and identify everything inside the box.

Facial recognition is an advanced type of object detection that not only recognises a human face in an image but identifies a specific individual.

Image classification groups images into different categories.

Basic applications may only use one of these techniques, but more advanced uses, like computer vision for self-driving cars, depend on multiple techniques to accomplish their goal.
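
Two of the techniques above can be sketched in a few lines of Python: a very crude segmentation (splitting an image into object and background by brightness) followed by drawing a bounding box around the object using X and Y coordinates. The tiny grid of pixel values and the threshold are hypothetical, and production systems use learned models rather than a fixed threshold.

```python
# A hypothetical 6x5 greyscale image: 0 is dark background, 9 is a bright object.
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 9, 9, 9, 0],
    [0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def segment(image, threshold):
    """Split the image into two regions: 1 for object pixels, 0 for background."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]

def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) around the object, or None if empty."""
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

MASK = segment(IMAGE, threshold=5)
BOX = bounding_box(MASK)
print(BOX)
```

A classifier would then only need to look at the pixels inside the box to decide what the object is.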

How does computer vision work in AI?

Think about how you approach a jigsaw puzzle. The pieces are all laid out in front of you, and you need to assemble them into an image. This is much like how neural networks for computer vision work.

They distinguish many different pieces of the image, identify the edges and then model the subcomponents. Using filtering and a series of actions through deep network layers, they can piece all the parts of the image together, similar to how you would go about piecing a puzzle.

Computers assemble visual images in the same way you would go about piecing together a jigsaw puzzle.

The computer is fed hundreds of thousands of related images to train it to recognise specific objects.
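
The filtering step used to identify edges can be illustrated with a small Python sketch. It slides a 3x3 filter over an image and reports how strongly each position responds, which is the basic operation inside a convolutional layer. The image and the hand-written vertical-edge filter are assumptions for illustration; a trained network learns its own filters.

```python
def apply_filter(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kernel_height, kernel_width = len(kernel), len(kernel[0])
    result = []
    for y in range(len(image) - kernel_height + 1):
        row = []
        for x in range(len(image[0]) - kernel_width + 1):
            total = sum(
                kernel[j][i] * image[y + j][x + i]
                for j in range(kernel_height)
                for i in range(kernel_width)
            )
            row.append(total)
        result.append(row)
    return result

# Hypothetical image: dark on the left, bright on the right.
IMAGE = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Responds where brightness changes from left to right (a vertical edge).
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

EDGES = apply_filter(IMAGE, VERTICAL_EDGE)
for row in EDGES:
    print(row)
```

The response is zero over the flat dark and bright areas and large where the two meet, so the filter has effectively found the edge between them. Deeper layers combine many such responses into larger pieces of the puzzle.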

How has computer vision evolved?

Until recently computer vision only worked in a limited capacity, but let’s rewind a bit.

The first experiments in computer vision took place as early as the 1950s. These were limited to detecting the edges of an object and then sorting objects into categories like squares and circles. The first commercial use, in the 1970s, interpreted typed or handwritten text using optical character recognition, a breakthrough for the visually impaired.

As the internet advanced in the 1990s, facial recognition programs flourished as more and more images became available online for analysis. This growing volume of data has made it possible for machines to identify specific people in photos and videos.

Several factors have converged to bring in a new era of computer vision:

  • Built-in cameras in mobile technology have saturated the globe with photos and videos.
  • Computing power has become more easily accessible and affordable.
  • Computer vision hardware designed for analysis is now more widely available.
  • New algorithms like convolutional neural networks can take advantage of the hardware and software capabilities.

The effects of these advances on the computer vision field have been astonishing. Accuracy rates for object identification and classification have gone from 50% to 99% in less than 10 years. And yes, machines are taking over in some fields: today’s computers are more accurate than humans at quickly detecting and reacting to visual inputs.

Why is computer vision important?

A new era of cancer treatment

Traditional methods of assessing cancerous tumours are incredibly time-consuming. Because they are based on a limited amount of data and are prone to subjectivity, such methods can lead to errors and misdiagnoses. Through the use of computer vision technology, doctors can identify cancer patients who are candidates for surgery much faster, and with lifesaving precision.

Self-driving cars

Computer vision enables cars to make sense of their immediate surroundings. A smart vehicle has multiple cameras that capture video from all sorts of angles and send it as an input signal to the computer vision software. The system processes the video in real time and detects objects like pedestrians, cyclists or other cars, road markings, traffic lights, and so forth. The self-driving car can then steer its way on streets and highways, avoid hitting obstacles, and (hopefully) safely drive its passengers to their destination. One of the most notable applications of this technology is Autopilot in Tesla cars.

Advances in health care

Image information is a key element for diagnosis in medicine because it accounts for 90% of all medical data. Many analyses in health care are based on image processing: think of X-rays, CT scans, MRI and mammography, to name but a few. Image segmentation has proved its effectiveness in the analysis of medical scans. For instance, computer vision algorithms can detect diabetic retinopathy, the fastest-growing cause of blindness. Computer vision can process pictures of the back of the eye and rate them for the presence of disease as well as its severity.

Facial recognition

Computer vision also plays an important role in facial recognition applications, the technology that enables computers to match images of people’s faces to their identities. Computer vision algorithms detect facial features in images and compare them with databases of face profiles. Consumer devices use facial recognition to authenticate the identities of their owners. Smartphones and social media apps use facial recognition to identify users as well as to detect and tag people in photos. Law enforcement relies on facial recognition technology to identify criminals in video feeds.

Security alerts

Computer vision helps the security industry by raising an alert when security is breached. For example, an alarm can be activated when someone scales a wall or if something is detected on a motion monitor. Security monitoring is an around-the-clock job, and unlike human personnel, computer vision-based security systems can watch security footage tirelessly, monitor everyone in view, and identify patterns and any suspicious activity.
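
One simple way a motion monitor can work is frame differencing: comparing each video frame with the previous one and raising an alert when enough pixels have changed. The Python sketch below shows the idea under stated assumptions. The 4x4 frames, the `pixel_delta` and `min_changed` thresholds and the function name are all hypothetical, and real security systems use considerably more robust methods.

```python
def motion_detected(previous, current, pixel_delta=20, min_changed=3):
    """Return True when enough pixels differ noticeably between two frames."""
    changed = sum(
        1
        for prev_row, cur_row in zip(previous, current)
        for prev_pixel, cur_pixel in zip(prev_row, cur_row)
        if abs(prev_pixel - cur_pixel) > pixel_delta
    )
    return changed >= min_changed

# Hypothetical greyscale frames from a fixed camera watching a wall.
EMPTY_SCENE = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

# The same scene with small lighting flicker only.
FLICKER = [
    [12, 10, 11, 10],
    [10, 13, 10, 10],
    [10, 10, 10, 12],
    [11, 10, 10, 10],
]

# The same scene with a bright object entering the middle of the view.
INTRUDER = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]

print(motion_detected(EMPTY_SCENE, FLICKER))   # small changes are ignored
print(motion_detected(EMPTY_SCENE, INTRUDER))  # a large change raises the alert
```

The two thresholds are a trade-off: set them too low and lighting flicker triggers false alarms, too high and a small or distant intruder goes unnoticed.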

Augmented and mixed reality

Computer vision also plays an important role in augmented and mixed reality, the technology that enables computing devices such as smartphones, tablets and smart glasses to overlay and embed virtual objects on real-world imagery. Using computer vision, AR gear detects objects in the real world to determine the locations on a device’s display to place a virtual object. For instance, computer vision algorithms can help AR applications detect planes such as tabletops, walls and floors, a very important part of establishing depth and dimensions and placing virtual objects in the physical world.

Growing agriculture

Many agricultural organisations employ computer vision to monitor their harvests and to address common agricultural problems such as nutrient deficiency or the emergence of weeds and pests. Computer vision systems process images from drones, satellites, helicopters and planes to detect problems at an early stage, which helps to avoid unnecessary financial losses.

From spotting defects in manufacturing to detecting early signs of plant disease in agriculture, computer vision is being used in more areas than you might expect.

Why use computer vision?

Computer vision is used across industries to enhance the consumer experience, reduce costs and increase security.

Where can we apply computer vision technology?

Some people think that computer vision is something from the distant future, but this is not accurate. Computer vision is already being integrated into many areas of our daily lives.

Below are just a few significant examples of how we use this form of technology today:

Content organisation

Computer vision systems already help us organise our content. An excellent example is Apple Photos: the app has access to our photo collections, automatically adds tags to photos and allows us to browse a more organised collection of these images. This in turn creates a feed of the best moments, curated especially for you.

Computer vision for animal conservation

Computer vision has even been used to analyse animal tracks, with computers being trained to identify an animal footprint much like a game tracker would. They can even pick up on subtle differences in the tracks to determine the species of the animal as well as its gender.

Computer vision users in many industries are seeing real results…for example, did you know:

  • Computer vision can distinguish between real and staged auto damage?
  • Computer vision enables facial recognition for security applications?
  • Computer vision makes automatic checkout in modern retail stores possible?

Compute this:

Computer vision is one of the most remarkable things to come out of the deep learning and artificial intelligence realm. The advancements that deep learning has contributed to the computer vision field have really set this field apart. And this trend is not likely to slow down anytime soon.

Computer vision is a popular topic in articles about new technology and digital transformation. What sets this technology apart is its approach to using data. The tremendous amounts of data that we create daily, which some people think of as a curse of our generation, are actually being used for our benefit. This technology also represents an important step that our society is taking toward creating artificial intelligence that will benefit us all.
