Emotions play a huge role in human life and interpersonal communication. They can be expressed in many ways: through facial expressions, posture, motor responses, voice, and autonomic responses (heart rate, blood pressure, respiratory rate).
Emotion recognition is a hot topic in the field of artificial intelligence and machine learning. Emotion AI enables a computer to recognize, interpret and respond to human emotions. A camera, microphone, or wearable sensor reads a person's state, and a neural network processes the data to determine an emotion.
A person wears a device that reads their pulse, the body's electrical impulses, and other physiological indicators. Such technologies make it possible to determine not only emotions but also stress levels or the likelihood of an epileptic seizure.
Emotions are also analyzed from video and audio recordings: the computer learns to interpret facial expressions, gestures, eye movements, voice, and speech.
To train a neural network, data scientists collect a sample of data and manually annotate changes in a person's emotional state. The model then learns which patterns of cues correspond to which emotions.
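As a minimal illustration of this supervised setup (not the actual models described here), the sketch below trains a tiny logistic-regression classifier on synthetic, hand-labelled feature vectors standing in for acoustic cues such as pitch and energy. The class names and feature values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "annotated" sample: two emotion classes whose (made-up)
# acoustic features cluster around different means.
calm = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(100, 2))
angry = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(100, 2))
X = np.vstack([calm, angry])
y = np.array([0] * 100 + [1] * 100)  # 0 = calm, 1 = angry

# Plain gradient descent on the logistic loss.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = float(np.mean(pred == y))
print(f"training accuracy: {accuracy:.2f}")
```

Real systems use far richer features and deep models, but the workflow is the same: labelled examples in, a mapping from cues to emotion labels out.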
The network can be trained on different kinds of data: some companies and laboratories use video recordings, others study the voice, and some combine several sources. The more diverse the data, the more accurate the result.
Emotions and speech are closely related and play a major role in communication. Automatic, objective diagnostics of a person's emotional state from their speech is therefore of great practical interest.
The ability to recognize emotions in speech matters both for research on speech and emotion itself and for improving the quality of customer service, for example in call centers.
Emotion detection is also in demand in telecommunications, the entertainment industry, education, medicine, and other areas.
The neural network extracts many voice parameters from the acoustic signal, such as pitch and rhythm, analyzes how they change over time, and determines the speaker's emotional state.
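One such voice parameter is pitch (fundamental frequency). A simple, self-contained way to estimate it, sketched below on a synthetic tone rather than real speech, is to find the strongest peak of the frame's autocorrelation within a plausible range of vocal pitches.

```python
import numpy as np

sr = 16000                       # sample rate, Hz
t = np.arange(sr) / sr           # one second of samples
signal = np.sin(2 * np.pi * 220.0 * t)  # synthetic 220 Hz "voice"

# Autocorrelation of one analysis frame.
frame = signal[:2048]
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

# Search only lags corresponding to 50-500 Hz, a typical pitch range.
min_lag = sr // 500
max_lag = sr // 50
peak_lag = min_lag + int(np.argmax(ac[min_lag:max_lag]))
pitch = sr / peak_lag
print(f"estimated pitch: {pitch:.1f} Hz")
```

Production systems track such features frame by frame, so their trajectory over time (not a single value) is what feeds the emotion model.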
Sometimes a spectrogram is used for training: an image that shows how the signal's energy is distributed across frequencies over time. In addition, the AI analyzes vocabulary for a more accurate result.
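A spectrogram like the one described can be computed with a short-time Fourier transform. The sketch below (a minimal version, assuming a mono signal array; real pipelines typically add mel scaling and log compression) slides a Hann window over the signal and takes the magnitude of each frame's FFT.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=256):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames).T

sr = 8000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440.0 * t)   # synthetic 440 Hz tone
spec = spectrogram(sig)

# The strongest frequency bin should sit near 440 Hz.
bin_hz = sr / 512                      # width of one frequency bin
peak_bin = int(spec[:, 0].argmax())
print(f"peak frequency: {peak_bin * bin_hz:.1f} Hz")
```

Treating the result as an image is what lets convolutional networks, originally built for vision, be applied to audio.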
Emotional AI can also be useful in workforce management. It helps assess an employee's state, notice fatigue or dissatisfaction in time, and redistribute tasks more efficiently.
The technology also helps with recruiting. With emotional AI, an employer can screen a job candidate or try to detect deception during an interview: the applicant goes through a video interview, and the neural network assesses their state from keywords, voice intonation, movements, and facial expressions.
The AI highlights the characteristics that matter for the job and assigns scores, and the HR manager selects suitable candidates.
The customer was Vodafone. The company commissioned a pilot project on recognizing emotions in calls to its call center, in order to understand how satisfied customers were with the responses of the support staff.
The system not only evaluates the emotions of the caller but also takes the content of the conversation into account via a transcript, assessing the degree of customer satisfaction comprehensively.
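The article does not publish the actual model, but the idea of a comprehensive assessment can be sketched as a weighted blend of an acoustic emotion score and a crude transcript-based score. Everything below (the word list, the weights, the function names) is a hypothetical illustration.

```python
# Hypothetical lexicon of dissatisfaction markers for the illustration.
NEGATIVE_WORDS = {"complaint", "refund", "cancel", "angry"}

def text_score(transcript: str) -> float:
    """Crude lexical score: share of words that are NOT negative markers."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    if not words:
        return 1.0
    hits = sum(w in NEGATIVE_WORDS for w in words)
    return 1.0 - hits / len(words)

def satisfaction(acoustic_score: float, transcript: str, w_audio: float = 0.6) -> float:
    """Blend a voice-based estimate (0..1) with the transcript-based one."""
    return w_audio * acoustic_score + (1.0 - w_audio) * text_score(transcript)

score = satisfaction(0.3, "I want to cancel and get a refund, I am angry")
print(f"satisfaction: {score:.2f}")
```

A real system would replace the word list with a trained sentiment model, but the principle of fusing acoustic and textual evidence is the same.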
In this case, we used speech recognition and text generation methods applied to the audio recordings. The project was largely research-oriented and experimental in character. The goal was to put the technology to commercial use based on the results of the experiments, but the results obtained were not sufficient for that.
Nevertheless, an intermediate result was achieved: the data and experience gained can be applied in future developments.