Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network 


Vol. 13, No. 6, pp. 284-290, Jun. 2024
https://doi.org/10.3745/TKIPS.2024.13.6.284


  Abstract

Speech emotion recognition (SER) analyzes a speaker's voice patterns, including vibration, intensity, and tone, to determine the speaker's emotional state. Interest in artificial intelligence (AI) techniques has grown, and they are now widely used in medicine, education, industry, and the military. Existing studies have reported impressive results, but they rely on acted speech recorded by skilled actors in controlled environments across various scenarios. Acted speech contains more explicit emotional expression than spontaneous speech, so there is a mismatch between the two. For this reason, spontaneous speech emotion recognition remains a challenging task. This paper aims to perform emotion recognition on spontaneous speech data and to improve its performance. To this end, we implement deep learning-based speech emotion recognition using VGG (Visual Geometry Group) after converting 1-dimensional audio signals into 2-dimensional spectrogram images. The experimental evaluations are performed on the Korean spontaneous emotional speech database from AI-Hub, which covers 7 emotions: joy, love, anger, fear, sadness, surprise, and neutral. Using time-frequency 2-dimensional spectrograms, we achieved average accuracies of 83.5% for adults and 73.0% for young people. Our findings show that the proposed framework outperformed current state-of-the-art techniques for spontaneous speech and performed promisingly despite the difficulty of quantifying emotional expression in spontaneous speech.
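
The conversion of a 1-dimensional audio signal into a 2-dimensional time-frequency representation, as mentioned in the abstract, can be illustrated with a short-time Fourier transform. The following is a minimal sketch and not the authors' implementation: the function name `log_spectrogram`, the 16 kHz sample rate, the 25 ms frame length, the 10 ms hop, the Hann window, and the log-power scaling are all assumptions chosen for illustration, and a synthetic tone stands in for real speech.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop_len=160, eps=1e-10):
    """Return a 2-D log-power spectrogram of shape (freq_bins, frames).

    frame_len and hop_len are in samples; the defaults are illustrative
    assumptions (25 ms and 10 ms at 16 kHz), not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short signals so that at least one frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop_len
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    # One-sided FFT of each frame, then power in decibels.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return (10.0 * np.log10(power + eps)).T


# Synthetic 1-second, 440 Hz tone as a stand-in for a speech recording.
sample_rate = 16000  # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2.0 * np.pi * 440.0 * t)

spec = log_spectrogram(audio)
print(spec.shape)  # (201, 98): 201 frequency bins by 98 time frames
```

The resulting 2-dimensional array can be treated as a single-channel image. Feeding it to an image classifier such as VGG would typically also require resizing and channel handling, which this sketch leaves out.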


  Cite this article

[IEEE Style]

G. Son and S. Kwon, "Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network," The Transactions of the Korea Information Processing Society, vol. 13, no. 6, pp. 284-290, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.6.284.

[ACM Style]

Guiyoung Son and Soonil Kwon. 2024. Spontaneous Speech Emotion Recognition Based On Spectrogram With Convolutional Neural Network. The Transactions of the Korea Information Processing Society, 13, 6, (2024), 284-290. DOI: https://doi.org/10.3745/TKIPS.2024.13.6.284.