Paper information - An evaluation of automatic phone segmentation for concatenative speech synthesis

An evaluation of automatic phone segmentation for concatenative speech synthesis

This paper studies the performance of automatic phone segmentation from two viewpoints: temporal precision and the effect on the naturalness of synthetic speech. The absolute errors of the phone onset time for the best 90% and the worst 10% were 4.6 ms and 25.9 ms, respectively. These values are comparable to the discrepancies among human labelers. In perception tests that compared, pair by pair, the naturalness of synthetic speech generated from hand-segmented data with that generated from auto-segmented data, the latter was found to be statistically inferior.
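
As a rough illustration of the temporal-precision figures quoted in the abstract, the sketch below shows one way such a best-90% / worst-10% split could be computed. It assumes the figures are mean absolute errors taken over the boundaries sorted by error, which the abstract does not state explicitly; the function name `split_abs_errors`, its parameters, and the sample onset times are hypothetical and not taken from the paper.

```python
def split_abs_errors(auto_onsets_ms, hand_onsets_ms, best_fraction=0.9):
    """Return (mean |error| of the best fraction, mean |error| of the rest).

    Assumption: "best" means the boundaries with the smallest absolute
    difference between automatic and hand-labeled onset times.
    """
    if len(auto_onsets_ms) != len(hand_onsets_ms):
        raise ValueError("onset lists must have the same length")
    if len(auto_onsets_ms) < 2:
        raise ValueError("need at least two boundaries to split")

    # Absolute onset error per phone boundary, smallest first.
    errors = sorted(abs(a - h) for a, h in zip(auto_onsets_ms, hand_onsets_ms))

    # Index separating the best group from the worst group;
    # clamp it so that both groups are non-empty.
    cut = int(round(len(errors) * best_fraction))
    cut = min(max(cut, 1), len(errors) - 1)

    best, worst = errors[:cut], errors[cut:]
    return sum(best) / len(best), sum(worst) / len(worst)


# Hypothetical onset times in milliseconds (not data from the paper).
auto = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
hand = [11, 22, 29, 40, 53, 58, 70, 81, 92, 130]
best_mean, worst_mean = split_abs_errors(auto, hand)
print(f"best 90%: {best_mean:.2f} ms, worst 10%: {worst_mean:.2f} ms")
```

Reporting the two groups separately, as the abstract does, keeps a small number of gross segmentation errors from hiding how precise the bulk of the boundaries is.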

Tomoki Toda | Hisashi Kawai
