Lobachevsky University researchers use a machine learning approach to analyze stress in human speech

Researchers from the Department of Cyberpsychology at the UNN Faculty of Social Sciences are developing machine learning models to detect anxiety based on acoustic features. Determining stress and anxiety levels in speech has important applications in education, mental health, and human–computer interaction. In a professional environment, insufficient stress management contributes to employee burnout and decreased productivity.

"Automatic stress detection by voice provides a tool for early detection of overload: it helps to identify vulnerable states in operators, air traffic controllers and medical staff in a timely manner, reducing the risk of errors and burnout. It also makes it possible to record a client's condition, which can be useful, for example, for detecting fraud – when a client is misled and asks the bank to perform a suspicious operation," explains Valeria Demareva, Head of the Cyberpsychology Department at the UNN Faculty of Social Sciences.

Stress manifests itself clearly in speech: the autonomic nervous system increases muscle tone and respiratory rate, which can result in a more "rigid" or trembling voice, as well as changes in speech rhythm and timbre. As a consequence, pitch, loudness (intensity), and speech rate change.
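As a rough illustration of how two of these markers can be measured, the sketch below estimates per-frame loudness via RMS energy and pitch via a simple autocorrelation peak search. This is not the authors' code, and production pipelines use far more robust F0 trackers; it only shows what "pitch and intensity" mean computationally.

```python
import numpy as np

def frame_features(x, sr=16000):
    """RMS intensity and a crude autocorrelation pitch estimate for one frame."""
    rms = np.sqrt(np.mean(x ** 2))                      # loudness proxy
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation, lags >= 0
    lo, hi = sr // 400, sr // 50                        # lags for 50-400 Hz (speech F0 range)
    lag = lo + np.argmax(ac[lo:hi])                     # strongest periodicity
    return sr / lag, rms                                # (F0 estimate in Hz, RMS)

# demo on a synthetic 200 Hz tone standing in for a voiced-speech frame
sr = 16000
t = np.arange(1024) / sr
f0, rms = frame_features(np.sin(2 * np.pi * 200 * t), sr)
```

Under stress, a pipeline like this would typically register a raised and more variable F0 together with intensity shifts across consecutive frames.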

For the study, a machine learning pipeline based on mel-frequency cepstral coefficients (MFCCs) was used. MFCCs were chosen because they compactly and accurately describe the spectral envelope of speech, are resistant to noise after normalization, demonstrate good discriminatory ability for speech styles and emotional states, and perform well on small datasets, making them a reliable and interpretable basis for a pilot study. It is assumed that these coefficients can provide stable stress classification, and combining them with other spectral features enhances accuracy.
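The MFCC computation itself follows a standard recipe: frame the signal, take the power spectrum, pool it through a triangular mel filterbank, and decorrelate the log filterbank energies with a DCT. The article does not name the toolchain, so the NumPy/SciPy implementation below is an illustrative sketch, not the authors' pipeline.

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    # 1) frame the signal with a Hann window
    win = np.hanning(n_fft)
    frames = np.array([signal[s:s + n_fft] * win
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # 2) power spectrum of each frame
    spec = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # 3) triangular mel filterbank
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    hz_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)   # falling slope
    # 4) log mel energies, then DCT to decorrelate -> MFCCs
    mel_energy = np.log(spec @ fbank.T + 1e-10)
    return dct(mel_energy, type=2, axis=1, norm="ortho")[:, :n_mfcc]

sig = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # 1 s test tone
coeffs = mfcc(sig)                                        # shape: (frames, 13)
```

The log compression and DCT are what make the coefficients compact and relatively noise-tolerant after normalization, which is the property the rationale above relies on.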

Lobachevsky University scientists conducted a pilot experiment comparing speech recordings under two conditions: stress-inducing public presentations and private rehearsals. Acoustic features (primarily MFCCs) were extracted and their between-condition differences assessed; a machine learning model was then trained on the MFCCs and its quality evaluated.

To study vocal changes associated with stress in academic speech, ten Lobachevsky University students specializing in cyberpsychology prepared a segment of their scientific presentation (4-6 minutes) and delivered it in two situations: publicly, addressing a commission and colleagues in a classroom, and privately, in a quiet office without an audience. All recordings were standardized to 16 kHz and mono WAV format.
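Standardizing to 16 kHz mono can be done in a few lines; the sketch below uses polyphase resampling from SciPy as an assumed stand-in, since the article does not say which tools the authors used.

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def standardize(audio, sr, target_sr=16000):
    """Convert a recording to mono and resample it to target_sr (illustrative
    sketch; the authors' exact preprocessing tools are not stated)."""
    if audio.ndim == 2:                       # stereo -> mono by channel average
        audio = audio.mean(axis=1)
    if sr != target_sr:                       # polyphase resampling
        g = gcd(sr, target_sr)
        audio = resample_poly(audio, target_sr // g, sr // g)
    return audio.astype(np.float32), target_sr

# e.g. a 1-second stereo clip recorded at 44.1 kHz
mono16k, out_sr = standardize(np.zeros((44100, 2)), 44100)
```

Pinning every recording to one sample rate and channel count matters because MFCC frame and filterbank parameters are defined relative to the sample rate.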

After signal cleaning, each recording was divided into non-overlapping five-second segments, yielding 565 segments for the private recordings and 569 for the public presentations. With MFCC features extracted from the cleaned signal, a gradient boosting classifier reached 91.9% accuracy in distinguishing public (stress) from private (calm) speech. On the test portion of the data, 102 of 110 private segments and 101 of 111 public segments were correctly classified. Errors were evenly distributed, with no systematic bias toward either class.
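The overall train-and-evaluate step can be sketched as follows. Only the segment counts, the two-class setup, and the use of gradient boosting come from the study; the per-segment feature vectors here are synthetic stand-ins for averaged MFCCs, and the stratified split is an assumed detail.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# synthetic stand-ins for 13 averaged MFCCs per five-second segment
X_private = rng.normal(0.0, 1.0, (565, 13))   # calm rehearsal segments
X_public = rng.normal(0.6, 1.0, (569, 13))    # public-presentation segments
X = np.vstack([X_private, X_public])
y = np.array([0] * 565 + [1] * 569)           # 0 = private, 1 = public

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)                   # held-out accuracy
```

A confusion matrix on the held-out split would then show how errors distribute across the two classes, as reported above.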

Valeria Demareva comments: "The approximately 92% accuracy under controlled conditions is encouraging, but it is largely due to careful preprocessing and sample homogeneity. This does not guarantee the same stability on real heterogeneous data. In further work, we plan to expand the sample, validate the results, add dynamic and prosodic features, and apply sequence-based architectures and domain adaptation methods."

The research was funded by the Russian Science Foundation, and the results were published on the Springer Nature Link platform.

Earlier this year, the UNN Cyberpsychology Department released the first Russian textbook on cyberpsychology.