Transformer Based Korean Emotion Recognition Model through Multi-domain Fusion 


Vol. 14, No. 6, pp. 459-467, Jun. 2025
https://doi.org/10.3745/TKIPS.2025.14.6.459


  Abstract

This study proposes a transformer-based emotion recognition model that enhances performance through multi-domain fusion. The proposed model extracts and compresses emotion-relevant information from three domains—audio, spectrogram, and text—using a feature encoder and a transformer encoder, thereby improving recognition accuracy. To evaluate the performance of the proposed model, experiments were conducted comparing different domain combinations and backbone architectures. The results demonstrate that all three domains effectively contribute to improved emotion recognition performance, and that ResNet50 is the most suitable backbone. The model trained on all three domains achieved an accuracy of 0.9306 and an F1-score of 0.9306, outperforming models trained on other domain combinations. These findings suggest that multi-domain fusion helps enhance the precision of emotion recognition and indicate that the proposed model can serve as a practical baseline for multimodal emotion recognition research.
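To make the described pipeline more concrete, below is a minimal PyTorch sketch of a multi-domain fusion model of the kind the abstract outlines: per-domain feature encoders (audio, spectrogram, text) whose outputs are fused by a transformer encoder before classification. All module choices, dimensions (d_model, num_classes, text_vocab), and the pooling/fusion details are illustrative assumptions; the paper's actual feature encoders, backbone configuration, and classifier may differ.

```python
# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultiDomainEmotionModel(nn.Module):
    def __init__(self, d_model=256, num_classes=7, text_vocab=32000):
        super().__init__()
        # Spectrogram branch: ResNet50 backbone (the backbone the abstract
        # reports as most suitable), with its head projected to d_model.
        backbone = resnet50(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, d_model)
        self.spec_encoder = backbone

        # Audio branch: a simple 1-D convolutional feature encoder over the
        # raw waveform (assumed; the abstract only says "feature encoder").
        self.audio_encoder = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=10, stride=5), nn.ReLU(),
            nn.Conv1d(64, d_model, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )

        # Text branch: token-id embedding (a pretrained Korean language
        # model could be substituted here).
        self.text_embed = nn.Embedding(text_vocab, d_model)

        # Transformer encoder that fuses the per-domain tokens.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, audio, spectrogram, text_ids):
        # Each domain is compressed into d_model-dimensional tokens.
        a = self.audio_encoder(audio).unsqueeze(1)       # (B, 1, d_model)
        s = self.spec_encoder(spectrogram).unsqueeze(1)  # (B, 1, d_model)
        t = self.text_embed(text_ids)                    # (B, T, d_model)

        tokens = torch.cat([a, s, t], dim=1)             # multi-domain fusion
        fused = self.fusion(tokens)                      # (B, 2+T, d_model)
        return self.classifier(fused.mean(dim=1))        # emotion logits
```

In this sketch each domain contributes one or more tokens to a shared sequence, so the transformer's self-attention performs the cross-domain fusion; mean pooling over the fused tokens is one simple way to obtain a single emotion prediction, and any of the three branches can be dropped to reproduce the domain-combination comparisons described in the abstract.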



  Cite this article

[IEEE Style]

J. Yang, H. Choi, N. Moon, J. Kim, "Transformer Based Korean Emotion Recognition Model through Multi-domain Fusion," The Transactions of the Korea Information Processing Society, vol. 14, no. 6, pp. 459-467, 2025. DOI: https://doi.org/10.3745/TKIPS.2025.14.6.459.

[ACM Style]

Jinhwan Yang, Hyuksoon Choi, Nammee Moon, and Jinah Kim. 2025. Transformer Based Korean Emotion Recognition Model through Multi-domain Fusion. The Transactions of the Korea Information Processing Society, 14, 6, (2025), 459-467. DOI: https://doi.org/10.3745/TKIPS.2025.14.6.459.