Observation-model error compensation for enhanced spectral envelope transformation in voice conversion

We present a strategy for enhancing signal quality and naturalness in probabilistic spectral envelope transformation for voice conversion. The error made by the probabilistic mixture in modeling the observed envelope features generally translates into an averaging of information in the spectral domain, which results in over-smoothed spectra. Moreover, a transformation based on poorly modeled features cannot be considered reliable. Our strategy consists of a novel definition of the spectral transformation that compensates for the effects of both over-smoothing and poor modeling. The results of an experimental evaluation show that the perceived naturalness of the converted speech was enhanced.
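The averaging effect mentioned above can be illustrated with a toy, one-dimensional sketch. This is not the transformation proposed in this work: it assumes a simplified mean-only mixture mapping (the converted value is the posterior-weighted average of target component means), and every name and parameter value here (`posteriors`, `convert`, the two-component means, the shared variance) is hypothetical and chosen only for illustration.

```python
import math
import random

def posteriors(x, means, var, weights):
    """Posterior probability of each Gaussian component given a scalar x."""
    likes = [w * math.exp(-0.5 * (x - m) ** 2 / var)
             for m, w in zip(means, weights)]
    total = sum(likes)
    return [l / total for l in likes]

def convert(x, src_means, tgt_means, var, weights):
    """Assumed mean-only mapping: posterior-weighted average of target means."""
    post = posteriors(x, src_means, var, weights)
    return sum(p * m for p, m in zip(post, tgt_means))

def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical toy setup: two components, shared variance, equal weights.
rng = random.Random(0)
src_means = [-1.0, 1.0]
tgt_means = [-1.0, 1.0]
var = 0.25
weights = [0.5, 0.5]

# Draw paired source/target samples from the same component.
source, target = [], []
for _ in range(2000):
    k = rng.randrange(2)
    source.append(rng.gauss(src_means[k], math.sqrt(var)))
    target.append(rng.gauss(tgt_means[k], math.sqrt(var)))

converted = [convert(x, src_means, tgt_means, var, weights) for x in source]
var_target = variance(target)
var_converted = variance(converted)

# Converted values are convex combinations of the target means, so they can
# never leave the interval spanned by those means; the within-component
# spread of the real target data is lost.
print("target variance:   ", var_target)
print("converted variance:", var_converted)
```

In this sketch the converted values stay between the two target means, so their variance is smaller than that of the target samples. It is a scalar analogue of the spectral over-smoothing that the proposed strategy aims to compensate.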
