HMM-based Korean speech synthesis system for hand-held devices

A speech interface may be the first choice of user interface for robots and for hand-held devices such as personal digital assistants (PDAs) and portable multimedia players (PMPs). However, such devices are limited in memory space and computation power. Hidden Markov model (HMM)-based speech synthesis is presently considered well suited to embedded systems. In this paper, our HMM-based Korean speech synthesis system is described. Statistical HMMs for Korean speech units are trained on a hand-labeled speech database that includes contextual information about phoneme, word phrase, and multilevel break strength. Mel-cepstrum and line spectrum pair (LSP) representations are compared for spectrum modeling, and two-band excitation based on the harmonic-plus-noise speech model is used as the mixed excitation source. The resulting small-footprint Korean synthesis system produces considerably high-quality speech with fairly good prosody.
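The two-band excitation idea can be illustrated with a minimal sketch: below a voicing cutoff frequency the excitation is a periodic pulse train, and above it white noise, so voiced harmonics and aspiration noise coexist in one source signal. The function name, parameters, and the FFT-based band split below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def two_band_excitation(f0, cutoff_hz, n_samples, fs=16000, seed=0):
    """Illustrative two-band mixed excitation (not the paper's code):
    periodic pulses below cutoff_hz, white noise above it."""
    rng = np.random.default_rng(seed)

    # Periodic pulse train at the fundamental frequency f0.
    pulses = np.zeros(n_samples)
    period = int(round(fs / f0))
    pulses[::period] = 1.0

    # White Gaussian noise for the upper band.
    noise = rng.standard_normal(n_samples)

    # Split the spectrum at the voicing cutoff: keep the pulse-train
    # spectrum in the low band, the noise spectrum in the high band.
    P = np.fft.rfft(pulses)
    N = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    mixed = np.where(freqs <= cutoff_hz, P, N)

    # Back to the time domain; irfft enforces a real-valued signal.
    return np.fft.irfft(mixed, n=n_samples)

exc = two_band_excitation(f0=120.0, cutoff_hz=3000.0, n_samples=1600)
```

In a full synthesizer the cutoff frequency would itself be a modeled parameter, varying per frame with the voicing strength of the speech.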
