Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing
Vol. 10, No. 6, pp. 665-672, Oct. 2003
DOI: 10.3745/KIPSTB.2003.10.6.665
Abstract
This paper describes a speech synthesis technique that concatenates unit phonemes. A major problem with this approach is the discontinuity that arises at the junctions between unit phonemes, especially between units recorded by different speakers. To solve this problem, the paper uses clustered diphones and proposes a spectral smoothing technique that not only uses the formant trajectory and the distribution characteristics of the spectrum but also reflects human auditory characteristics. Specifically, the proposed technique clusters unit phonemes using the distribution characteristics of the spectrum at the junctions between unit phonemes, decides the amount and extent of smoothing by considering human auditory characteristics at those junctions, and then performs spectral smoothing using weights calculated along the time axis at the boundary between two diphones. The proposed technique removes the discontinuity and minimizes the distortion that spectral smoothing can introduce. For performance evaluation, we test on five hundred diphones extracted from twenty sentences recorded by five speakers and present the experimental results.
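The idea of smoothing with time-dependent weights at a diphone boundary can be illustrated with a minimal sketch. The abstract does not give the weight formula, so the code below assumes simple linear weights and a fixed smoothing extent; the function name `smooth_boundary`, the frame representation (lists of spectral values), and the `width` parameter are all hypothetical. It does not model the diphone clustering or the perceptually motivated choice of smoothing amount and extent described above.

```python
def smooth_boundary(left, right, width):
    """Spread the spectral jump at a unit boundary over `width` frames per side.

    `left` and `right` are non-empty lists of spectral frames (lists of floats
    of equal length). Linear weights are an assumption, not the paper's method.
    """
    if width < 1:
        raise ValueError("width must be at least 1")

    # Spectral jump between the last left frame and the first right frame.
    jump = [r - l for l, r in zip(left[-1], right[0])]

    new_left = [frame[:] for frame in left]
    new_right = [frame[:] for frame in right]

    # The weight falls linearly from 0.5 at the boundary to 0 at `width` frames away.
    for k in range(min(width, len(left))):
        w = 0.5 * (1.0 - k / width)
        idx = len(left) - 1 - k
        new_left[idx] = [v + w * j for v, j in zip(left[idx], jump)]

    for k in range(min(width, len(right))):
        w = 0.5 * (1.0 - k / width)
        new_right[k] = [v - w * j for v, j in zip(right[k], jump)]

    return new_left, new_right
```

With these weights the two frames adjacent to the boundary meet at their average, and frames `width` or more steps away from the boundary are left unchanged, which limits the distortion to a short region around the junction.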
Cite this article
[IEEE Style]
H. J. Jang, G. J. Kim, G. Y. Kim, and H. I. Choe, "Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing," The KIPS Transactions:PartB, vol. 10, no. 6, pp. 665-672, 2003. DOI: 10.3745/KIPSTB.2003.10.6.665.
[ACM Style]
Jang Hyo Jong, Kim Gwan Jung, Kim Gye Yeong, and Choe Hyeong Il. 2003. Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing. The KIPS Transactions:PartB, 10, 6, (2003), 665-672. DOI: 10.3745/KIPSTB.2003.10.6.665.