Comparison of formant enhancement methods for HMM-based speech synthesis

Hidden Markov model (HMM) based speech synthesis tends to over-smooth the spectral envelope of speech, which makes the synthetic speech sound muffled. One way to compensate for the over-smoothing is to enhance the formants of the spectral model. This paper compares the performance of different formant enhancement methods and studies formant enhancement applied prior to HMM training, so that the over-smoothing is compensated for preemptively. A new method for enhancing the formants of an all-pole model is also introduced. Experiments indicate that formant enhancement prior to HMM training improves the quality of synthetic speech by providing sharper formants, and that the new formant enhancement method performs similarly to the existing method.
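To make the idea of sharpening the formants of an all-pole model concrete, the following is a minimal sketch of one generic approach: scaling the pole radii of 1/A(z) toward the unit circle, which narrows the formant bandwidths. It is not the new method introduced in this paper, nor necessarily any of the compared methods. The function names, the factor `gamma`, and the two-resonance toy model are assumptions made purely for illustration.

```python
import numpy as np

def sharpen_all_pole(a, gamma=1.03):
    """Scale the pole radii of 1/A(z) by gamma (hypothetical helper).

    a     : coefficients [1, a1, ..., ap] of A(z) = sum_k a_k z^-k
    gamma : factor > 1 moves poles toward the unit circle (sharper formants)
    """
    a = np.asarray(a, dtype=float)
    # A(z/gamma) has coefficients a_k * gamma^k, so each pole p becomes gamma * p.
    scaled = a * gamma ** np.arange(len(a))
    # Refuse a result whose poles reach or cross the unit circle (unstable filter).
    if np.max(np.abs(np.roots(scaled))) >= 1.0:
        raise ValueError("gamma moves a pole onto or outside the unit circle")
    return scaled

def envelope_db(a, n_fft=512):
    """Log-magnitude spectrum of 1/A(z) in dB on the frequency range [0, pi]."""
    return -20.0 * np.log10(np.abs(np.fft.rfft(a, n_fft)))

# Toy all-pole model with two resonances (illustrative values only).
poles = [0.95 * np.exp(1j * 0.3 * np.pi), 0.90 * np.exp(1j * 0.6 * np.pi)]
a = np.real(np.poly(poles + [np.conj(p) for p in poles]))

a_sharp = sharpen_all_pole(a, gamma=1.03)
env, env_sharp = envelope_db(a), envelope_db(a_sharp)
```

Comparing `env_sharp` with `env` shows higher and narrower spectral peaks at the resonance frequencies. A fixed `gamma` is the simplest possible choice; a practical system would more likely adapt the amount of enhancement per frame and keep a safety margin for filter stability.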
