A Study on the Evaluation Methods for Assessing the Understanding of Korean Culture by Generative AI Models 


Vol. 13, No. 9, pp. 421-428, Sep. 2024
https://doi.org/10.3745/TKIPS.2024.13.9.421


  Abstract

Services built on large language models (LLMs) such as GPT-4 and LLaMA have recently been released and have attracted significant attention. These models respond fluently to a wide range of user queries, but their insufficient training on Korean data raises concerns that they may provide inaccurate information about Korean culture and language. In this study, we selected eight major publicly available models trained on Korean data and evaluated their understanding of Korean culture using a dataset spanning five domains that cover Korean language comprehension and cultural aspects. The results showed that the commercial model HyperClovaX performed best across all domains. Among the publicly available models, Bookworm demonstrated superior Korean language proficiency, while LDCC-SOLAR excelled in areas related to understanding Korean culture and language.

  Cite this article

[IEEE Style]

K. J. Son and S. H. Kim, "A Study on the Evaluation Methods for Assessing the Understanding of Korean Culture by Generative AI Models," The Transactions of the Korea Information Processing Society, vol. 13, no. 9, pp. 421-428, 2024. DOI: https://doi.org/10.3745/TKIPS.2024.13.9.421.

[ACM Style]

Son Ki Jun and Kim Seung Hyun. 2024. A Study on the Evaluation Methods for Assessing the Understanding of Korean Culture by Generative AI Models. The Transactions of the Korea Information Processing Society, 13, 9, (2024), 421-428. DOI: https://doi.org/10.3745/TKIPS.2024.13.9.421.