Impact of Utterance Length and Augmentation on Spoofed Speech Detection


Vol. 14,  No. 12, pp. 997-1003, Dec.  2025
10.3745/TKIPS.2025.14.12.997


  Abstract

ASVspoof 5 is the fifth edition of the ASVspoof challenge, one of the largest global audio security challenges, which aims to promote the development of countermeasure (CM) models that distinguish between genuine and spoofed speech. In this study, we investigate the impact of data augmentation and utterance length on spoofed speech detection (SSD) using pretrained speech models. XLSR, WavLM, and HuBERT are used as feature extractors, together with a dual-branch network proposed in previous studies. To evaluate robustness, five data augmentation techniques and three utterance lengths are tested. Most augmentation methods degrade performance, whereas Low Frequency Mask augmentation achieves an equal error rate (EER) of 6.36% and a minimum detection cost function (min-DCF) of 0.1676. Experiments on utterance length show that an 8-second duration yields the best performance. The results demonstrate that both the augmentation strategy and the utterance duration have a significant impact on SSD performance. These findings provide insights into the factors affecting robustness in ASVspoof 5-based spoofed speech detection.
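
For readers unfamiliar with the EER metric quoted above, the following is a minimal sketch of how an equal error rate can be computed from countermeasure scores. It is not the authors' evaluation code or the official ASVspoof 5 scoring tool; the function name `compute_eer`, the score convention (higher means more likely genuine), and the toy scores are assumptions made for illustration only.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) from a simple threshold sweep.

    Assumption: higher scores mean "more likely genuine (bona fide)".
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct observed score, ascending.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # Miss rate: genuine utterances rejected (score below threshold).
    miss = np.array([(bonafide < t).mean() for t in thresholds])
    # False-alarm rate: spoofed utterances accepted (score at or above threshold).
    false_alarm = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(miss - false_alarm)))
    eer = (miss[idx] + false_alarm[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    genuine = [0.9, 0.8, 0.7, 0.3]
    spoofed = [0.1, 0.2, 0.4, 0.6]
    eer, thr = compute_eer(genuine, spoofed)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

On a real evaluation set the two error curves rarely cross exactly at an observed score, so production scoring tools typically interpolate between neighbouring thresholds; this sketch simply averages the two rates at the closest point.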

  Cite this article

[IEEE Style]

G. Hwang, M. Seok, W. Kim, "Impact of Utterance Length and Augmentation on Spoofed Speech Detection," The Transactions of the Korea Information Processing Society, vol. 14, no. 12, pp. 997-1003, 2025. DOI: 10.3745/TKIPS.2025.14.12.997.

[ACM Style]

Gyuhan Hwang, MinJe Seok, and Wooseong Kim. 2025. Impact of Utterance Length and Augmentation on Spoofed Speech Detection. The Transactions of the Korea Information Processing Society, 14, 12, (2025), 997-1003. DOI: 10.3745/TKIPS.2025.14.12.997.