Analysis of HMM-Based Lombard Speech Synthesis

Humans modify their voice in interfering noise to maintain the intelligibility of their speech; this is called the Lombard effect. This ability, however, has not been extensively modeled in speech synthesis. Here we compare several methods of synthesizing speech in noise using a physiologically based statistical speech synthesis system (GlottHMM). The results show that, in a realistic street-noise situation, listeners judge the synthetic Lombard speech to be appropriate for the situation and as intelligible as natural Lombard speech. Of the model types compared, the one using adaptation and extrapolation performed best.
