Application of the trended hidden Markov model to speech synthesis

This paper presents a speech synthesis system that uses the trended hidden Markov model (HMM) to represent the basic synthesis unit. We draw on both speech recognition and speech synthesis research to develop a system that synthesises intelligible, natural-sounding speech. Acoustic units are clustered with a decision tree, and the speech data belonging to each cluster is used to train a trended HMM synthesis unit. The overall system has been implemented in a PSOLA synthesiser, and its output has been compared with that of a conventional diphone synthesiser, with very encouraging results.
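For readers unfamiliar with the model, the following is a minimal sketch of the idea behind a trended HMM state: instead of a constant mean, each state's mean follows a polynomial function of the time elapsed since the state was entered. The function names (`trend_mean`, `generate_trajectory`), the coefficient layout, and the fixed state durations are assumptions made for illustration; the sketch does not reproduce the training or synthesis procedure used in the paper.

```python
import numpy as np


def trend_mean(coeffs, duration):
    """Evaluate one state's polynomial trend over its duration.

    coeffs   : array of shape (P + 1, D); row p holds the coefficient of
               (t - tau)**p for each of the D feature dimensions
               (hypothetical layout chosen for this sketch).
    duration : number of frames spent in the state.
    Returns an array of shape (duration, D) holding the mean trajectory.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    # Time measured from state entry: 0, 1, ..., duration - 1.
    t = np.arange(duration, dtype=float)
    # Column p of `powers` is t**p, so powers @ coeffs sums the polynomial terms.
    powers = np.vander(t, N=coeffs.shape[0], increasing=True)
    return powers @ coeffs


def generate_trajectory(state_coeffs, durations):
    """Concatenate per-state trends into one feature trajectory."""
    segments = [trend_mean(c, d) for c, d in zip(state_coeffs, durations)]
    return np.concatenate(segments, axis=0)


# Two illustrative states with one feature dimension:
# a linear trend (intercept 1, slope 2) followed by a constant state.
states = [np.array([[1.0], [2.0]]), np.array([[0.0]])]
trajectory = generate_trajectory(states, durations=[3, 2])
```

In this sketch a state with only a zeroth-order coefficient reduces to the constant mean of a standard HMM state, which is why the trended model can be viewed as a generalisation of it.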
