HMM-based Unit Selection Using F

This paper presents a hidden Markov model (HMM) based unit selection method for concatenative speech synthesis systems. Frame-sized waveform segments are adopted as the basic synthesis units to increase the coverage of candidate units and the chance of finding appropriate ones. In the training stage, a set of context-dependent HMMs is trained with static and dynamic acoustic features. When synthesizing a sentence, the optimal frame sequence is searched for in the speech corpus by maximizing the output probability of a sentence HMM constructed according to the contextual information of the input text. A listening test shows that the proposed method produces better synthesized speech than a method using state-sized units and a cost-function criterion.
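The search step can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the state alignment of the sentence HMM is already fixed, so each frame position has one diagonal-covariance Gaussian for the static features and one for the dynamic features; the dynamic feature is simplified to the difference between consecutive frames; and a candidate list is given per position. The names `log_gauss` and `select_frames` are hypothetical. Under these assumptions, the most probable frame sequence can be found with a Viterbi-style dynamic program:

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def select_frames(candidates, static_models, delta_models):
    """Pick one candidate frame per position, maximizing total log-likelihood.

    candidates[t]    : list of feature vectors available at position t
    static_models[t] : (mean, var) of the static features at position t
    delta_models[t]  : (mean, var) of the delta features at position t, with
                       delta assumed to be frame[t] - frame[t-1]; the entry
                       for t = 0 is never read
    Returns (indices of the chosen candidates, best log-likelihood).
    """
    # score[t][j]: best log-likelihood of any path ending at candidate j of position t
    score = [[log_gauss(c, *static_models[0]) for c in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for t in range(1, len(candidates)):
        row, ptr = [], []
        for c in candidates[t]:
            best_i, best_lp = 0, -math.inf
            for i, prev in enumerate(candidates[t - 1]):
                delta = [a - b for a, b in zip(c, prev)]
                lp = score[t - 1][i] + log_gauss(delta, *delta_models[t])
                if lp > best_lp:
                    best_i, best_lp = i, lp
            row.append(best_lp + log_gauss(c, *static_models[t]))
            ptr.append(best_i)
        score.append(row)
        back.append(ptr)
    # Backtrack from the best final candidate to recover the full path.
    j = max(range(len(score[-1])), key=score[-1].__getitem__)
    best = score[-1][j]
    path = [j]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path, best
```

Because the assumed dynamic term couples only adjacent frames, the search costs on the order of the number of positions times the square of the candidates per position. A practical system would need to prune the candidate lists, since frame-sized units make them very large.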
