Mixed excitation for HMM-based speech synthesis

This paper describes improvements on the excitation model of an HMM-based text-to-speech system. In our previous work, natural sounding speech can be synthesized from trained HMMs. However, it has a typical quality of “vocoded speech” since the system uses a traditional excitation model with either a periodic impulse train or white noise. In this paper, in order to reduce the synthetic quality, a mixed excitation model used in MELP is incorporated into the system. Excitation parameters used in mixed excitation are modeled by HMMs, and generated from HMMs by a parameter generation algorithm in the synthesis phase. The result of a listening test shows that the mixed excitation model significantly improves quality of synthesized speech as compared with the traditional excitation model.

[1]  Satoshi Imai,et al.  Cepstral analysis synthesis on the mel frequency scale , 1983, ICASSP.

[2]  Keiichi Tokuda,et al.  An adaptive algorithm for mel-cepstral analysis of speech , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Keiichi Tokuda,et al.  Hidden Markov models based on multi-space probability distribution for pitch pattern modeling , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[4]  Keiichi Tokuda,et al.  Speaker adaptation for HMM-based speech synthesis system using MLLR , 1998, SSW.

[5]  Keiichi Tokuda,et al.  Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.

[6]  Keiichi Tokuda,et al.  An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features , 1995, EUROSPEECH.

[7]  Soo Ngee Koh,et al.  Mixed excitation linear prediction coding of wideband speech at 8 kbps , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[8]  Keiichi Tokuda,et al.  Speaker interpolation in HMM-based speech synthesis system , 1997, EUROSPEECH.

[9]  Keiichi Tokuda,et al.  Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[10]  Naofumi Aoki,et al.  Development of a rule-based speech synthesis system for the Japanese language using a MELP vocoder , 2000, 2000 10th European Signal Processing Conference.

[11]  Thomas P. Barnwell,et al.  MCCREE AND BARNWELL MIXED EXCITAmON LPC VOCODER MODEL LPC SYNTHESIS FILTER 243 SYNTHESIZED SPEECH-PERIODIC PULSE TRAIN-1 PERIODIC POSITION JITTER PULSE 4 , 2004 .

[12]  Koichi Shinoda,et al.  Speaker adaptation with autonomous model complexity control by MDL principle , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.