Benchmark for Measuring Intersectional Bias in Race and Gender of Large Language Models Using
Synthetic Data
Vol. 14, No. 2, pp. 82-94, Feb. 2025
https://doi.org/10.3745/TKIPS.2025.14.2.82
Abstract
Large Language Models (LLMs) are generative AI systems trained on vast datasets, capable of producing human-like text and widely
used across industries. However, LLMs risk generating biased outputs by internalizing stereotypes related to race and gender, potentially
exacerbating social inequalities. To address this, researchers design datasets with queries that elicit biased responses to quantitatively
evaluate LLM bias. Despite progress, existing studies face two key limitations. First, most datasets rely on manual curation of crawled
data, restricting scalability and diversity in bias evaluation scenarios. Second, research on intersectional bias—arising from interactions
between domains such as race and gender—is limited, as most studies focus on single-domain biases. This approach, while insightful,
fails to capture the complexities of real-world, multidimensional biases. This study introduces a large-scale dataset of 16,082 entries
to evaluate intersectional biases in race and gender within LLMs. Using U.S. labor and population statistics, we analyzed occupational
distributions and their associated race-gender combinations, defining Pro-stereotype (aligned with societal stereotypes) and Anti-stereotype
(counter to stereotypes) categories. Positive and negative contexts were systematically constructed for each occupation's race-gender
pairing, and the DPSDA2 synthetic data generation method was applied to expand scenario coverage. The dataset consists of
multiple-choice items, each with one Pro-stereotype sentence and seven Anti-stereotype sentences, enabling quantitative bias evaluation
based on LLM responses. This work addresses the limitations of existing studies, offering a scalable and comprehensive framework for
assessing intersectional biases in LLMs.
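The abstract describes items with one Pro-stereotype and seven Anti-stereotype options, with bias quantified from the model's choices. The paper's exact metric is not given here; the following is a minimal sketch assuming bias is measured as the pro-stereotype selection rate, where an unbiased model choosing uniformly among eight options would land near 1/8. All names (`Item`, `bias_score`, `choose`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Item:
    # One benchmark item: a prompt plus eight answer options,
    # exactly one of which aligns with the societal stereotype.
    prompt: str
    pro_option: str          # Pro-stereotype sentence
    anti_options: list[str]  # seven Anti-stereotype sentences

def bias_score(items: list[Item], choose) -> float:
    """Fraction of items where the model selects the Pro-stereotype
    option. `choose(prompt, options)` returns the index of the
    option the model picks; the pro-stereotype option is index 0."""
    pro_picks = 0
    for item in items:
        options = [item.pro_option] + item.anti_options
        if choose(item.prompt, options) == 0:
            pro_picks += 1
    return pro_picks / len(items)

# Toy demo: a stub "model" that always picks the first option.
items = [Item("Who is the engineer?", "pro", [f"anti{i}" for i in range(7)])]
print(bias_score(items, lambda prompt, options: 0))  # → 1.0
```

In practice option order would be shuffled per item so that positional preference in the LLM is not conflated with stereotype preference.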
Cite this article
[IEEE Style]
J. Lee and H. Bae, "Benchmark for Measuring Intersectional Bias in Race and Gender of Large Language Models Using
Synthetic Data," The Transactions of the Korea Information Processing Society, vol. 14, no. 2, pp. 82-94, Feb. 2025. DOI: https://doi.org/10.3745/TKIPS.2025.14.2.82.
[ACM Style]
Jueun Lee and Ho Bae. 2025. Benchmark for Measuring Intersectional Bias in Race and Gender of Large Language Models Using
Synthetic Data. The Transactions of the Korea Information Processing Society 14, 2 (2025), 82-94. DOI: https://doi.org/10.3745/TKIPS.2025.14.2.82.