Augmented Reality (AR), as currently defined, is the overlay of digital information on a real-world view. In practical terms, it is the process of recognising specific physical objects in a device’s camera view and superimposing digital content such as video, audio or 3D models.
Visual recognition is one aspect of AR and encompasses image, object, scene and facial recognition. Computer vision technology identifies shapes and patterns using a set of mathematical models. These models and processes are facets of Machine Learning (ML), which drives much of modern Artificial Intelligence (AI).
ML is the science of “teaching” the system to look for commonalities and patterns and assessing the probability that a match is found. Effectively, with a set of mathematical models in place, the system is fed a collection of information that represents a positive match. For instance, if we want to teach the system to identify a cat, we provide thousands of images of cats and let the system process and find common visual patterns across all the images.
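As a rough illustration of learning from positive examples and scoring a match, the sketch below averages a handful of feature vectors into a "prototype" and scores new inputs by how close they are to it. It is a deliberately simplified stand-in, not how a production recogniser works: the three-number feature vectors, the function names and the `scale` parameter are all assumptions made for this example, and the score is a similarity measure rather than a calibrated probability.

```python
import math

def learn_prototype(positive_examples):
    """Average the feature vectors of all positive examples."""
    count = len(positive_examples)
    dims = len(positive_examples[0])
    return [sum(ex[i] for ex in positive_examples) / count for i in range(dims)]

def match_score(prototype, candidate, scale=1.0):
    """Return a score in (0, 1]; values near 1 mean the candidate resembles the prototype."""
    squared_distance = sum((p - c) ** 2 for p, c in zip(prototype, candidate))
    return math.exp(-squared_distance / (2 * scale ** 2))

# Hypothetical feature vectors standing in for images of cats.
cat_examples = [[0.9, 0.8, 0.1], [1.0, 0.7, 0.2], [0.8, 0.9, 0.0]]
cat_prototype = learn_prototype(cat_examples)

likely_cat = match_score(cat_prototype, [0.9, 0.8, 0.1])    # close to the prototype
unlikely_cat = match_score(cat_prototype, [0.1, 0.1, 0.9])  # far from the prototype
```

A real system would learn its features from the raw pixels of thousands of images rather than from hand-made vectors, but the principle of comparing new input against patterns common to the training set is the same.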
When this training is done with multi-layered neural networks, it is known as deep learning, and the outcome is a system that can recognise and track almost any pattern. With this capability, we can inject a virtual projection into the area that is being recognised and tracked to deliver what is called an augmented reality experience.
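The idea of injecting virtual content into a tracked area can be sketched in a few lines. The example below assumes the recogniser has already reported where the object sits in the frame (the `top` and `left` values are made up), and it represents the frame and the overlay as small grids of numbers rather than real images.

```python
def overlay_on_region(frame, overlay, top, left):
    """Return a copy of the frame with the overlay pasted at (top, left), clipped to the frame."""
    result = [row[:] for row in frame]
    for i, overlay_row in enumerate(overlay):
        for j, pixel in enumerate(overlay_row):
            r, c = top + i, left + j
            if 0 <= r < len(result) and 0 <= c < len(result[0]):
                result[r][c] = pixel
    return result

# Hypothetical 4x5 camera frame (all zeros) and a 2x2 piece of virtual content.
frame = [[0] * 5 for _ in range(4)]
overlay = [[7, 7], [7, 7]]

# Assume the tracker reported the recognised object at row 1, column 3.
augmented = overlay_on_region(frame, overlay, top=1, left=3)
```

In a live AR experience this step would run on every camera frame, with the tracked position updated each time so that the virtual content appears to stay attached to the object.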
The power of AI and ML lies in being able to make decisions based on a real-world scenario. Let’s consider its application in a security surveillance system. A machine that has been trained to detect weapons, such as knives and guns, can be used to monitor CCTV footage. In real time, it can look for patterns in the scene that resemble its definition of a weapon. If one is identified, an alarm could be raised for someone to act.
Pattern recognition is not limited to visual data. Auditory, gesture and other data patterns can also be “taught” using ML. Continuing with our security surveillance example, a trained machine could be used to listen to sounds in the environment and detect patterns of shouting or offensive language.
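The decision step in the surveillance example can be sketched as a simple rule applied to per-frame detection scores. The threshold of 0.8 and the requirement of three consecutive high-scoring frames are assumptions chosen for illustration (the latter is one common way to reduce false alarms from a single noisy frame), and the scores themselves are made up.

```python
def frames_to_alert(frame_scores, threshold=0.8, min_consecutive=3):
    """Return the frame indices at which an alert would be raised."""
    alerts = []
    streak = 0
    for index, score in enumerate(frame_scores):
        if score >= threshold:
            streak += 1
            # Raise the alert once, at the moment the streak becomes long enough.
            if streak == min_consecutive:
                alerts.append(index)
        else:
            streak = 0
    return alerts

# Hypothetical weapon-detection scores for eight consecutive CCTV frames.
scores = [0.1, 0.85, 0.2, 0.9, 0.92, 0.95, 0.97, 0.3]
alerts = frames_to_alert(scores)
```

Here the isolated high score at frame 1 is ignored, while the sustained run starting at frame 3 triggers a single alert.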
One of the hurdles in training a machine to identify patterns is sourcing enough material that is deemed a “positive match”. For this reason, systems are designed with feedback loops that allow machines to “learn by experience”. If the machine fails to detect what it is supposed to, the missed case can be added to the training data so that the machine is trained to act on it the next time it occurs. Much of this is supported by a class of ML models called “convolutional neural networks”, in which interconnected nodes each perform a specific mathematical function on the data to achieve the specified outcome.
In a time when vast amounts of information are available at our fingertips, being able to recognise the world around us and decipher what is relevant will become critical. Whether at work, at home or in a social setting, successful real-world augmentation will rely on AI and ML observing and recognising our environment and adapting information to match our situation.
As hardware technology improves and wearable, hands-free devices become a reality, ML and AR will become an integral yet ambient part of our lives.
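To make the idea of nodes performing mathematical functions more concrete, here is a minimal sketch of the core operation in a convolutional layer: sliding a small grid of weights (a kernel) over an image and summing the products at each position, followed by a simple activation. The tiny image and the hand-picked vertical-edge kernel are assumptions for illustration; in a trained network the kernel weights are learned from data rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output

def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]

# Hypothetical 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(convolve2d(image, vertical_edge))
```

The resulting feature map is non-zero only in the column where the image changes from dark to bright, which is the sense in which a node "detects" a pattern. Stacking many such layers lets a network build up from edges to more complex shapes.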