Efficient Emotion Classification Method Based on Multimodal Approach Using Limited Speech and Text Data 


Vol. 13, No. 4, pp. 174-180, Apr. 2024
https://doi.org/10.3745/TKIPS.2024.13.4.174


  Abstract

In this paper, we explore an emotion classification method based on multimodal learning with the wav2vec 2.0 and KcELECTRA models. Multimodal learning that leverages both speech and text data is known to significantly improve emotion classification performance over methods that rely on speech data alone. To select an optimal text-processing model, we conduct a comparative analysis of BERT and its derivative models, which are known for their strong performance in natural language processing, and evaluate how effectively each extracts features from text data. The results confirm that the KcELECTRA model performs best on the emotion classification task. Furthermore, experiments on datasets made available by AI-Hub demonstrate that including text data yields superior performance with less data than using speech data alone. In our experiments, the KcELECTRA model achieved the highest accuracy, 96.57%. This indicates that multimodal learning can offer meaningful performance improvements in complex natural language processing tasks such as emotion classification.
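For illustration, below is a minimal sketch (Python, Hugging Face transformers) of a late-fusion multimodal classifier of the kind the abstract describes. The specific checkpoints ("facebook/wav2vec2-base", "beomi/KcELECTRA-base"), mean-pooling of encoder outputs, concatenation-based fusion, and the number of emotion classes are assumptions made for the sketch, not details taken from the paper.

import torch
import torch.nn as nn
from transformers import AutoModel, Wav2Vec2Model

NUM_CLASSES = 7  # hypothetical number of emotion labels; not specified in the abstract

class MultimodalEmotionClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Speech encoder: wav2vec 2.0; text encoder: KcELECTRA
        self.speech_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.text_encoder = AutoModel.from_pretrained("beomi/KcELECTRA-base")
        speech_dim = self.speech_encoder.config.hidden_size  # 768 for the base model
        text_dim = self.text_encoder.config.hidden_size      # 768 for the base model
        self.classifier = nn.Linear(speech_dim + text_dim, NUM_CLASSES)

    def forward(self, input_values, input_ids, attention_mask):
        # Mean-pool each encoder's last hidden states into one vector per sample
        speech_feat = self.speech_encoder(input_values).last_hidden_state.mean(dim=1)
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state.mean(dim=1)
        # Late fusion by concatenation, followed by a linear emotion classifier
        fused = torch.cat([speech_feat, text_feat], dim=-1)
        return self.classifier(fused)

The fusion strategy shown here (concatenating pooled features before a single linear layer) is one common choice for combining speech and text representations; the paper's actual architecture may differ.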

  Cite this article

[IEEE Style]

M. Shin and Y. Shin, "Efficient Emotion Classification Method Based on Multimodal Approach Using Limited Speech and Text Data," The Transactions of the Korea Information Processing Society, vol. 13, no. 4, pp. 174-180, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.4.174.

[ACM Style]

Mirr Shin and Youhyun Shin. 2024. Efficient Emotion Classification Method Based on Multimodal Approach Using Limited Speech and Text Data. The Transactions of the Korea Information Processing Society 13, 4 (2024), 174-180. DOI: https://doi.org/10.3745/TKIPS.2024.13.4.174.