Perpetually optimizing the cost function for unit selection in a TTS system with one single run of MOS evaluation

This paper proposes a method for optimizing the unit-selection cost function of a corpus-based text-to-speech (TTS) system by maximizing the strength of the correlation between the concatenative cost and the mean opinion score (MOS). The method requires one subjective evaluation up front. The key point is to log the contextual information of every unit that appears in the evaluated synthetic utterances. With this log file, the concatenative cost of each utterance can be recalculated under any new definition of the cost function. The correlation between cost and MOS then serves as a measure of the validity of any change to the cost function, so the cost function can be optimized perpetually without any new MOS evaluation. In this paper, the correlation coefficient between cost and MOS strengthens from –0.822 to –0.897 after optimization.
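
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the feature names (`context_mismatch`, `pitch_jump`), the weighted-sum form of the cost, the per-utterance averaging, and all numbers in the toy log are hypothetical. The abstract specifies only that unit context is logged once, that cost is recomputed under each candidate definition, and that the cost-MOS correlation is used to judge the change.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def utterance_cost(units, weights):
    """Average weighted cost over the logged units of one utterance.

    The weighted-sum form is an assumption made for this sketch.
    """
    total = sum(weights[name] * unit[name] for unit in units for name in weights)
    return total / len(units)

def cost_mos_correlation(log, mos, weights):
    """Recompute every utterance cost from the log and correlate it with MOS."""
    costs = [utterance_cost(units, weights) for units in log]
    return pearson(costs, mos)

# Hypothetical log: one list of per-unit feature records per evaluated utterance.
LOG = [
    [{"context_mismatch": 0.1, "pitch_jump": 0.2}, {"context_mismatch": 0.0, "pitch_jump": 0.1}],
    [{"context_mismatch": 0.5, "pitch_jump": 0.1}, {"context_mismatch": 0.4, "pitch_jump": 0.3}],
    [{"context_mismatch": 0.9, "pitch_jump": 0.2}, {"context_mismatch": 0.7, "pitch_jump": 0.6}],
    [{"context_mismatch": 0.2, "pitch_jump": 0.9}, {"context_mismatch": 0.3, "pitch_jump": 0.7}],
]
# Hypothetical MOS values from the single listening test, one per utterance.
MOS = [4.5, 3.6, 2.4, 3.1]

# Two candidate cost definitions, scored against the same logged evaluation.
old_weights = {"context_mismatch": 1.0, "pitch_jump": 0.0}
new_weights = {"context_mismatch": 1.0, "pitch_jump": 0.5}

r_old = cost_mos_correlation(LOG, MOS, old_weights)
r_new = cost_mos_correlation(LOG, MOS, new_weights)

# Higher cost should mean lower MOS, so the more negative correlation wins.
best_weights = new_weights if r_new < r_old else old_weights
print(f"old r = {r_old:.3f}, new r = {r_new:.3f}")
```

Because the log and the MOS values are fixed, any number of candidate cost definitions can be scored this way without running another listening test, which is the property the abstract relies on.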
