HMM-Based Korean Speech Synthesizer with Two-Band Mixed Excitation Model for Embedded Applications

A speech interface may be the first choice of user interface for robots and hand-held devices such as personal digital assistants (PDAs) and portable multimedia players (PMPs). However, such devices have limited memory and computational power. Hidden Markov model (HMM)-based speech synthesis is currently considered well suited to these embedded systems, because the synthesizer stores only compact statistical model parameters rather than a large speech database. This thesis describes an HMM-based Korean speech synthesis system, a comparison of spectral parameters, and a proposed two-band excitation model for HMM-based speech synthesis. First, the development and evaluation of an HMM-based Korean speech synthesis system are presented. Statistical HMMs for Korean speech units are trained on a hand-labeled speech database includ-
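The abstract names a two-band excitation model, but this excerpt does not define it. The following is a minimal sketch that assumes the common reading of "two-band mixed excitation". Within a frame, a single cutoff frequency splits the spectrum: a periodic pulse train drives the band below the cutoff, and white noise drives the band above it. An unvoiced frame is noise over the whole band. The function names, the 16 kHz sampling rate, the 63-tap Hamming-windowed FIR filters, and the sqrt(period) pulse amplitude are all illustrative assumptions and are not taken from the thesis.

```python
import math
import random


def windowed_sinc_lowpass(cutoff_hz, fs_hz, num_taps=63):
    """Hamming-windowed sinc FIR low-pass filter (odd length, linear phase)."""
    assert num_taps % 2 == 1, "use an odd number of taps so the filter has a center tap"
    fc = cutoff_hz / fs_hz  # normalized cutoff in cycles per sample
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)  # normalize so the DC gain is exactly 1
    return [t / gain for t in taps]


def complementary_highpass(lowpass):
    """High-pass = delta - low-pass, so the two bands sum to a pure delay."""
    mid = len(lowpass) // 2
    return [(1.0 if n == mid else 0.0) - h for n, h in enumerate(lowpass)]


def fir_filter(x, taps):
    """Direct-form FIR convolution; output has the same length as the input."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


def two_band_excitation(f0_hz, cutoff_hz, num_samples, fs_hz=16000, seed=0):
    """Hypothetical two-band excitation for one frame.

    Pulses below cutoff_hz, white noise above it; f0_hz <= 0 means unvoiced.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    if f0_hz <= 0:
        return noise  # unvoiced frame: noise over the whole band
    period = fs_hz / f0_hz
    # Pulse amplitude sqrt(period) gives average power 1, matching the unit-variance noise
    pulses = [0.0] * num_samples
    t = 0.0
    while int(t) < num_samples:
        pulses[int(t)] = math.sqrt(period)
        t += period
    lp = windowed_sinc_lowpass(cutoff_hz, fs_hz)
    hp = complementary_highpass(lp)
    low_band = fir_filter(pulses, lp)    # periodic (voiced) component
    high_band = fir_filter(noise, hp)    # aperiodic (noise) component
    return [a + b for a, b in zip(low_band, high_band)]
```

For example, `two_band_excitation(f0_hz=120.0, cutoff_hz=3000.0, num_samples=400)` yields one 25 ms excitation frame at 16 kHz. In a full synthesizer, this frame would then drive a spectral synthesis filter. The sketch uses a complementary high-pass filter (delta minus low-pass) so that the two bands sum to a flat response, with no spectral hole at the cutoff. It treats each frame independently and ignores filter state and pitch-mark continuity across frame boundaries.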
