Vol. 14, No. 10, pp. 764-774,
Oct. 2025
https://doi.org/10.3745/TKIPS.2025.14.10.764
Abstract
This paper presents an automatic speech analysis system that segments syllables, classifies pitch ranges, and analyzes key acoustic
features in repeated utterances of beginner-level vocalists. The system combines timestamp-based coarse segmentation with pitch-based
fine segmentation, which enables precise alignment of repeated utterances (t2–t4) by applying consistent boundaries derived from a
reference utterance (t1). Within each coarse segment defined by timestamps, pitch transitions are detected to determine fine-grained
syllable boundaries. The average pitch of each syllable is used to classify it into low (<220 Hz), mid (220-349 Hz), or high (≥350 Hz)
pitch categories, and the resulting segments are automatically named and stored accordingly. To validate the system, a dataset was
constructed from over 130 voluntary participants. Based on breath control, pitch/rhythm accuracy, and high-pitch ability, participants
were categorized into three performance groups, and the lowest-performing group (10 male participants in their 20s, with no more than
one year of vocal training) was selected for controlled experiments. For performance evaluation, 1,153 manually
segmented syllables and 3,560 automatically segmented syllables were obtained from these participants. Although the
participant pool was small, each participant recorded the same designated song segment four times
(t1–t4), yielding a large and diverse dataset. Compared with expert manual segmentation, the proposed system achieved an average deviation within ±5% and a Pearson correlation
coefficient above 0.95 across all acoustic measures (Pitch, F1, F2, Intensity). These results demonstrate that the system can provide real-time,
quantitative analysis of unstable pitch, timing, and breathing patterns in beginner vocalists, supporting practical use in vocal training
and assessment.
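The three-way pitch classification described in the abstract (low <220 Hz, mid 220–349 Hz, high ≥350 Hz) can be sketched as a simple threshold function. This is an illustrative sketch only; the function name, segment names, and the label-naming scheme below are assumptions, not the authors' actual implementation.

```python
def classify_pitch(avg_hz: float) -> str:
    """Map a syllable's average pitch (Hz) onto the paper's three ranges:
    low (<220 Hz), mid (220-349 Hz), high (>=350 Hz)."""
    if avg_hz < 220.0:
        return "low"
    elif avg_hz < 350.0:
        return "mid"
    return "high"

# Hypothetical segments: (segment name, average pitch in Hz).
segments = [("syl01", 198.4), ("syl02", 305.2), ("syl03", 362.7)]

# Name each stored segment by appending its pitch category,
# mirroring the "automatically named and stored" step.
labels = [f"{name}_{classify_pitch(hz)}" for name, hz in segments]
# labels == ["syl01_low", "syl02_mid", "syl03_high"]
```

The mid band is implemented as 220 Hz inclusive up to (but excluding) 350 Hz, matching the stated ≥350 Hz boundary for the high category.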
Cite this article
[IEEE Style]
L. H. Woo and K. Y. Han, "Implementation of an Automatic Vocal Diagnosis System Based on Acoustic Indicators," The Transactions of the Korea Information Processing Society, vol. 14, no. 10, pp. 764-774, 2025. DOI: https://doi.org/10.3745/TKIPS.2025.14.10.764.
[ACM Style]
Lee Hyun Woo and Kim Young Han. 2025. Implementation of an Automatic Vocal Diagnosis System Based on Acoustic Indicators. The Transactions of the Korea Information Processing Society, 14, 10, (2025), 764-774. DOI: https://doi.org/10.3745/TKIPS.2025.14.10.764.