TY  - JOUR
T1  - A Study on the Evaluation Methods for Assessing the Understanding of Korean Culture by Generative AI Models
AU  - Jun, Son Ki
AU  - Hyun, Kim Seung
JO  - The Transactions of the Korea Information Processing Society
PY  - 2024
DA  - 2024/2/28
DO  - https://doi.org/10.3745/TKIPS.2024.13.9.421
KW  - LLM
KW  - Korean Culture
KW  - Culture Understanding
KW  - Evaluation Dataset
AB  - Recently, services built on large language models (LLMs) such as GPT-4 and LLaMA have been released and have attracted significant attention. These models respond fluently to a wide range of user queries, but their insufficient training on Korean data raises concerns that they may provide inaccurate information about Korean culture and language. In this study, we selected eight major publicly available models that have been trained on Korean data and evaluated their understanding of Korean culture using a dataset composed of five domains covering Korean language comprehension and cultural aspects. The results showed that the commercial model HyperClovaX performed best across all domains. Among the publicly available models, Bookworm demonstrated superior Korean language proficiency. Additionally, the LDCC-SOLAR model excelled in areas related to understanding Korean culture and language.