Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network
Vol. 13, No. 6, pp. 284-290, Jun. 2024
https://doi.org/10.3745/TKIPS.2024.13.6.284
Abstract
Speech emotion recognition (SER) analyzes a speaker's voice patterns, including vibration, intensity, and tone, to determine the
speaker's emotional state. Interest in artificial intelligence (AI) techniques has grown, and they are now widely used in medicine,
education, industry, and the military. However, most existing studies have attained their impressive results using acted speech
recorded by skilled actors in controlled environments across various scenarios. Acted and spontaneous speech are mismatched,
because acted speech contains more explicit emotional expression than spontaneous speech. For this reason, spontaneous speech
emotion recognition remains a challenging task. This paper aims to perform emotion recognition on spontaneous speech data and to
improve its performance. To this end, we implement deep learning-based speech emotion recognition using VGG (Visual Geometry
Group) after converting 1-dimensional audio signals into 2-dimensional spectrogram images. The experimental evaluations are
performed on the Korean spontaneous emotional speech database from AI-Hub, which covers 7 emotions: joy, love, anger, fear,
sadness, surprise, and neutral. Using a 2-dimensional time-frequency spectrogram, we achieved average accuracies of 83.5% and
73.0% for adults and young people, respectively. Our findings demonstrate that the proposed framework outperformed current
state-of-the-art techniques for spontaneous speech and showed promising performance despite the difficulty of quantifying
emotional expression in spontaneous speech.
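The conversion of a 1-dimensional audio signal into a 2-dimensional spectrogram image, as described in the abstract, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the sampling rate, frame length, hop size, or window, so the 16 kHz rate, 25 ms Hann-windowed frames, and 10 ms hop used here are assumptions, and the function names are hypothetical. The VGG classification step is not shown.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a (freq_bins, frames) log-magnitude spectrogram in dB.

    frame_len and hop are assumed values (25 ms / 10 ms at 16 kHz),
    not parameters taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1 or signal.size < frame_len:
        raise ValueError("expected a 1-D signal of at least frame_len samples")
    n_frames = 1 + (signal.size - frame_len) // hop
    # Index matrix: each row selects one overlapping frame of the signal.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    # Short-time Fourier transform: one real FFT per windowed frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so that rows are frequency bins and columns are time frames.
    return 20.0 * np.log10(magnitude + eps).T


def to_uint8_image(spec_db):
    """Min-max scale a spectrogram to the 0..255 range of a grayscale image."""
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / (hi - lo) if hi > lo else np.zeros_like(spec_db)
    return np.round(255.0 * scaled).astype(np.uint8)


# Synthetic stand-in for a speech recording: one second of a 440 Hz tone.
sample_rate = 16000  # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = log_spectrogram(tone)
image = to_uint8_image(spec)
print(spec.shape, image.dtype)
```

The resulting 2-dimensional array has frequency on one axis and time on the other, which is the kind of image-like input a convolutional network such as VGG can consume after resizing to the network's expected input size.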
Cite this article
[IEEE Style]
G. Son and S. Kwon, "Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network," The Transactions of the Korea Information Processing Society, vol. 13, no. 6, pp. 284-290, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.6.284.
[ACM Style]
Guiyoung Son and Soonil Kwon. 2024. Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network. The Transactions of the Korea Information Processing Society, 13, 6, (2024), 284-290. DOI: https://doi.org/10.3745/TKIPS.2024.13.6.284.