The development of computer technology has led to rapid growth in the number of digital images. These images come from many sources: digital cameras and camcorders, video recorders, video surveillance cameras, mobile phones, scanners, and so on.
Digital archives containing millions of images have appeared. At such volumes, finding the desired images becomes an important problem.
There are several approaches to image search: search by content (find a photograph of a person or an image of, for example, a birch), search by visual pattern (find images similar to a given one), search by description (find an image marked as "Top Secret"), and others. Each approach has its own characteristics and scope.
Machine learning can solve many real-world problems, and it has advanced particularly strongly in the field of image search.
The picture is divided into small sections, down to a few pixels each, and every section feeds one input neuron. Synapses (weighted connections) transmit signals from one layer to the next. During this process, hundreds of thousands of neurons with millions of parameters compare the incoming signals with patterns learned from previously processed data.
Simply put, if we ask the machine to recognize a photo of a cat, it breaks the photo into small pieces and compares them with the traits the network has learned from millions of existing images of cats.
At some point, increasing the number of layers leads the network to simply memorize the training sample rather than learn from it (overfitting).
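The signal flow described above can be sketched as a tiny forward pass. This is only an illustration: the 2x2 image, the layer sizes, and the hand-picked weights are assumptions made for the example, not values from a trained network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron sums its weighted inputs."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# A 2x2 grayscale image, flattened so that every pixel feeds one input neuron.
image = [[0.0, 1.0],
         [1.0, 0.0]]
pixels = [p for row in image for p in row]

# Hypothetical weights: 4 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_w = [[1.0, -1.0, -1.0, 1.0], [-1.0, 1.0, 1.0, -1.0]]
hidden_b = [0.0, 0.0]
output_w = [[-2.0, 2.0]]
output_b = [0.0]

hidden = dense_layer(pixels, hidden_w, hidden_b)   # signals after the first layer
output = dense_layer(hidden, output_w, output_b)   # final score between 0 and 1
print(output[0])
```

In a real network the weights are not chosen by hand; they are the parameters that training adjusts.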
In this method, the input is compared against a database that stores, for each object, a set of modified versions of its appearance. For optical pattern recognition, for example, the object can be enumerated at different angles, scales, displacements, deformations, and so on. For letters, the font or its properties can be enumerated.
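A minimal sketch of this enumeration idea, assuming tiny binary images and treating displacement as the only modification. The templates, the function names, and the pixel-mismatch measure are illustrative choices, not part of any particular system.

```python
def shift(pattern, dx, dy):
    """Copy of a binary pattern shifted by (dx, dy); vacated cells become 0."""
    h, w = len(pattern), len(pattern[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = pattern[y][x]
    return out

def build_database(templates, max_shift=1):
    """Store every template under every shift in [-max_shift, max_shift]."""
    db = []
    for label, pattern in templates.items():
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                db.append((label, shift(pattern, dx, dy)))
    return db

def mismatch(a, b):
    """Number of pixels in which two images differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def recognize(image, db):
    """Label of the stored variant that differs from the image in the fewest pixels."""
    return min(db, key=lambda entry: mismatch(image, entry[1]))[0]

templates = {
    "vertical bar":   [[0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
    "horizontal bar": [[0, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
}
db = build_database(templates)

# The same vertical bar, displaced one pixel to the right.
query = [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
print(recognize(query, db))
```

The database grows with every modification that is enumerated (here 9 shifts per template), which is the main cost of this approach.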
This is the simplest extensional recognition method in practice. It is used when the recognized classes form compact geometric groupings in the feature space. The center of each grouping (or the object closest to the center) is then selected as a point, the prototype, and a new object is assigned to the class of the nearest prototype.
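The prototype idea can be sketched as a nearest-centroid classifier. The two-dimensional feature vectors, the class names, and the use of Euclidean distance are assumptions made for the example.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def fit_prototypes(samples):
    """Map each class label to the center of its training vectors."""
    return {label: centroid(points) for label, points in samples.items()}

def classify(x, prototypes):
    """Assign x to the class whose prototype is nearest (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(x, prototypes[label]))

# Hypothetical, well-separated classes in a 2D feature space.
samples = {
    "A": [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]],
    "B": [[6.0, 6.0], [6.0, 7.0], [7.0, 6.0], [7.0, 7.0]],
}
prototypes = fit_prototypes(samples)
print(prototypes)                         # {'A': [1.5, 1.5], 'B': [6.5, 6.5]}
print(classify([2.5, 3.0], prototypes))   # A
```

Only one point per class has to be stored, which is why the method is cheap, but it relies on the classes really being compact.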
This method is borrowed from the classical theory of statistical decisions, in which the objects under study are treated as realizations of a multidimensional random variable distributed in the feature space according to some law.
This method requires either a very large number of examples of the recognition problem or a neural network structure designed for the specifics of that problem. Nevertheless, it is highly effective.
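As an illustration of the statistical view, the sketch below assumes a single feature (for example, mean brightness) that follows a normal distribution within each class, and picks the class with the highest posterior probability. The Gaussian assumption, the data, and the names are all hypothetical; the text does not prescribe a particular distribution law.

```python
import math
from statistics import mean, pvariance

def fit(samples):
    """Estimate the prior, mean, and variance of the feature for each class."""
    total = sum(len(values) for values in samples.values())
    return {
        label: (len(values) / total, mean(values), pvariance(values))
        for label, values in samples.items()
    }

def log_posterior(x, prior, mu, var):
    """log(prior) plus the log of the Gaussian density at x (shared evidence term omitted)."""
    return (math.log(prior)
            - 0.5 * math.log(2 * math.pi * var)
            - (x - mu) ** 2 / (2 * var))

def classify(x, model):
    """Bayes decision rule: choose the class with the largest posterior."""
    return max(model, key=lambda label: log_posterior(x, *model[label]))

# Hypothetical brightness measurements for two classes.
samples = {
    "dark":   [0.10, 0.20, 0.15, 0.25],
    "bright": [0.70, 0.80, 0.75, 0.85],
}
model = fit(samples)
print(classify(0.30, model))   # dark
print(classify(0.60, model))   # bright
```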
Image recognition is perhaps the most popular application of neural networks. Regardless of the specifics of the task, such a network works in stages, the most important of which are considered below.
A wide variety of objects can act as recognizable patterns, including images, handwritten or printed text, sounds, and much more. During training, the network is shown samples, each labeled with the class it belongs to. Each sample is a vector of feature values, and the set of features should make it possible to determine unambiguously which class the NN is dealing with.
During training, it is important that the network learns enough features, with suitable values, to give good accuracy on new images, and also that it does not overfit, that is, does not adjust unnecessarily to the training set. After correct training, the NN should be able to identify images (of the same classes) that it did not encounter during training.
The input data for the neural network must be unambiguous and consistent, so that the network does not assign high probabilities of membership in several classes to the same object.
Several different architectures of artificial neural networks are distinguished, including those used for image recognition and search:
Multilayer perceptron: constructed from three or more layers and applies a non-linear activation function to classify data.
Convolutional neural network: contains convolutional layers.
Recursive neural network: a deep neural network formed by applying the same set of weights recursively over a structured input to produce scalar or structured predictions.
Recurrent neural network: a variant of the NN in which the connections between neurons form directed cycles.
Long short-term memory (LSTM) network: a kind of recurrent neural network designed to model time sequences together with their characteristic long-term dependencies.
Sequence-to-sequence model: consists of two recurrent neural networks that perform the functions of an encoder and a decoder.
Shallow neural networks: these are also very popular; for example, groups of shallow two-layer models can be used to produce vector representations.
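Of the architectures listed above, the convolutional layer is the one most closely tied to images, so a small sketch may help. It assumes a single-channel image, one hand-written 2x2 kernel, no padding, and stride 1; in a real network the kernel values are learned.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the operation deep learning libraries usually call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

# Dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

feature_map = conv2d(image, vertical_edge)
print(feature_map)   # [[0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only where the kernel sits on the edge, which shows how the same small set of weights detects a local pattern anywhere in the picture.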
Image recognition with neural networks is possible only after training, a process that sets the parameters of the network.
In supervised learning (learning with a teacher), the training sample comes with the true answers to the question of what is shown in each picture: the class labels. The images are fed into the neural network, and an error is calculated by comparing the output values with the true class labels. Depending on the size and nature of the discrepancy in the NN's predictions, its weights are corrected, and its responses are adjusted toward the true answers until the error becomes minimal.
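The loop of predicting, measuring the error against the labels, and correcting the weights can be sketched with a single sigmoid neuron trained by gradient descent. The two-feature data set, the cross-entropy error, the learning rate, and the number of epochs are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Output of one neuron: probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def mean_loss(data, w, b):
    """Average cross-entropy between predictions and true labels."""
    total = 0.0
    for x, y in data:
        p = predict(x, w, b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(data, lr=1.0, epochs=1000):
    """Full-batch gradient descent starting from zero weights."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            error = predict(x, w, b) - y          # output minus true label
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# Hypothetical labeled samples: (feature vector, class label).
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.8], 1)]

loss_before = mean_loss(data, [0.0, 0.0], 0.0)
w, b = train(data)
loss_after = mean_loss(data, w, b)
predictions = [round(predict(x, w, b)) for x, _ in data]
print(loss_before, loss_after, predictions)
```

A multi-layer network follows the same pattern; the difference is that the corrections are propagated back through every layer.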
In unsupervised learning, the training set has no class labels, and the neural network must find previously unknown answers. It tries to find patterns in the data on its own, extracting useful features and analyzing them. Clustering is the most common unsupervised learning task: the algorithm finds common features in the data and groups similar items together.
It is difficult to measure the accuracy of an unsupervised algorithm because the data lacks "correct answers", or labels. However, labeled data can be difficult or expensive to obtain. In such cases, giving the model the freedom to look for dependencies can still produce useful results.
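Clustering can be illustrated with k-means, one common clustering algorithm (the text does not name a specific one). The sketch assumes two-dimensional points, Euclidean distance, and initial centers taken from the first k points so that the run is reproducible.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]   # simple deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Hypothetical unlabeled feature vectors forming two visible groups.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
          [8.0, 8.0], [9.0, 9.0], [8.5, 9.0]]

labels, centroids = kmeans(points, k=2)
print(labels)   # [0, 0, 0, 1, 1, 1]
```

No labels were supplied: the cluster numbers 0 and 1 are arbitrary identifiers, which is also why accuracy is hard to define here.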
In semi-supervised learning, the training set contains both labeled and unlabeled data. This method is especially useful when labeling all objects is a tedious task. The neural network can still extract information from a small fraction of labeled data and improve prediction accuracy compared to a model that trains exclusively on unlabeled data.
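One simple way to combine labeled (tagged) and unlabeled data is self-training with pseudo-labels, sketched below. This is only one of several possible techniques and is an assumption of the example, as are the one-dimensional feature, the nearest-center classifier, and the data.

```python
def class_centers(labeled):
    """Mean feature value per class from (value, label) pairs."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def nearest(x, centers):
    """Label of the class center closest to x."""
    return min(centers, key=lambda y: abs(x - centers[y]))

def self_train(labeled, unlabeled, rounds=3):
    """Refine class centers using pseudo-labels assigned to unlabeled values."""
    centers = class_centers(labeled)
    for _ in range(rounds):
        pseudo = [(x, nearest(x, centers)) for x in unlabeled]
        centers = class_centers(labeled + pseudo)
    return centers

# Hypothetical data: only two labeled samples, six unlabeled ones.
labeled = [(0.40, "dark"), (0.60, "bright")]
unlabeled = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]

centers = self_train(labeled, unlabeled)
print(centers)
```

With the two labeled samples alone the class centers would sit at 0.40 and 0.60; after using the unlabeled values they move to about 0.21 and 0.79, closer to where the data actually lies. Pseudo-labels can also reinforce early mistakes, so this sketch should not be read as a guaranteed improvement.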
Reinforcement learning operates on the principle of receiving feedback – rewards for certain actions.
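The reward principle can be sketched with a small multi-armed bandit: an agent repeatedly picks one of several actions, receives a reward, and updates its estimate of how good that action is. The fixed reward values, the epsilon-greedy rule, and the optimistic initial estimates are assumptions made for the example.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent; rewards[a] is the (fixed) reward for action a."""
    rng = random.Random(seed)
    estimates = [1.0] * len(rewards)   # optimistic start so every action gets tried
    counts = [0] * len(rewards)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))                      # explore
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]                                      # feedback
        counts[action] += 1
        # Running average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: action 1 pays the most.
estimates, counts = run_bandit([0.2, 0.8, 0.5])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(best, counts)
```

The agent is never told which action is correct; it discovers the best one only from the rewards it receives.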