The NICT/ATR speech synthesis system for the Blizzard Challenge 2008

This paper describes the development of the NICT/ATR speech synthesizer for the Blizzard Challenge 2008 and discusses the official results. The submitted system is based on hidden Markov model (HMM) speech synthesis and uses an improved excitation approach based on residual modeling to remove artifacts caused by the parametric way in which speech is synthesized. Although development time was limited, the results show that the system achieves good performance in terms of naturalness and intelligibility.

Index Terms: speech synthesis, statistical parametric speech synthesis, Blizzard Challenge.
