On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech

In this paper, we study the role of a recently proposed feature enhancement technique in building HMM-based synthetic voices from reverberant speech data. The technique combines the advantages of missing data imputation and non-negative matrix factorization (NMF) based methods for cleaning up reverberant features. Adapting a clean average voice with noisy data generally yields a better voice than building a speaker-dependent voice from the same noisy data. We show that the feature enhancement technique further improves the spectral match between the voice adapted with enhanced features and a clean speaker-dependent voice.
