Paper information - Quality prediction for synthesized speech: Comparison of approaches

Quality prediction for synthesized speech: Comparison of approaches

Text-To-Speech (TTS) technology has reached a level of maturity that appears sufficient for a number of telephony applications. To assess TTS quality, system developers need to carry out auditory tests in which participants are asked to transcribe what they have heard or to rate certain aspects of the auditory event; see, e.g., [1]. To reduce the time and cost involved in auditory testing, it is desirable to estimate quality from the speech signals themselves, i.e. on the basis of instrumental measurements.

Sebastian Möller | Tiago H. Falk

[1] Sebastian Möller, et al. Towards Signal-Based Instrumental Quality Diagnosis for Text-to-Speech Systems, 2008, IEEE Signal Processing Letters.

[2] Jithendra Vepa. Objective Distance Measures for Spectral Discontinuities in Concatenative Speech Synthesis, 2002.

[3] S. King, et al. Improving Instrumental Quality Prediction Performance for the Blizzard Challenge, 2008.

[4] J. Berger, et al. P.563—The ITU-T Standard for Single-Ended Speech Quality Assessment, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[5] Hu Peng, et al. An Objective Measure for Estimating MOS of Synthesized Speech, 2001, INTERSPEECH.

[6] Milos Cernak, et al. An Evaluation of Synthetic Speech Using the PESQ Measure, 2005.

[7] Sebastian Möller, et al. An Instrumental Measure for End-to-End Speech Transmission Quality Based on Perceptual Dimensions: Framework and Realization, 2008, INTERSPEECH.

[8] Sebastian Möller, et al. Estimating the Quality of Synthesized and Natural Speech Transmitted Through Telephone Networks Using Single-Ended Prediction Models, 2008.