Objective evaluation of the Dynamic Model Selection method for spectral voice conversion

Spectral voice conversion is usually performed with a single model, selected to represent a tradeoff between goodness of fit and complexity. Recently, we proposed a new method for spectral voice conversion, called Dynamic Model Selection (DMS), in which the model topology is assumed to change over time depending on the source acoustic features. In this method, a set of models of increasing complexity is considered during the conversion of a source speech signal into a target speech signal. During conversion, the best model in the set is dynamically selected according to the acoustic features of each source frame. In this paper, we present an objective evaluation demonstrating that this new method improves the conversion by reducing the transformation error compared to methods based on a single model.

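To illustrate the per-frame selection idea described above, the following is a minimal sketch, not the authors' implementation: it assumes the candidate models are joint-density GMMs with diagonal covariances trained on aligned source/target spectral features, and that the model chosen for each frame is the one whose source marginal assigns that frame the highest likelihood. The names JointGMMConverter and dms_convert are hypothetical.

```python
# Minimal sketch of per-frame dynamic model selection among conversion models
# of increasing complexity. Assumptions (not taken from the paper): candidate
# models are diagonal-covariance joint-density GMMs, and selection uses the
# source-marginal log-likelihood of each frame.
import numpy as np
from sklearn.mixture import GaussianMixture


class JointGMMConverter:
    """Joint-density GMM conversion model with diagonal covariances."""

    def __init__(self, n_components, dim_x):
        self.dim_x = dim_x
        self.gmm = GaussianMixture(n_components=n_components,
                                   covariance_type="diag", random_state=0)

    def fit(self, x, y):
        # Train on stacked source/target features (frames aligned beforehand).
        self.gmm.fit(np.hstack([x, y]))
        return self

    def _source_marginal(self, x_frame):
        # With diagonal covariances, the source marginal is the diagonal GMM
        # restricted to the first dim_x dimensions.
        mu = self.gmm.means_[:, :self.dim_x]
        var = self.gmm.covariances_[:, :self.dim_x]
        diff = x_frame[None, :] - mu
        comp_ll = -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=1)
        total_ll = np.log(np.sum(self.gmm.weights_ * np.exp(comp_ll)) + 1e-300)
        return total_ll, comp_ll

    def source_loglik(self, x_frame):
        return self._source_marginal(x_frame)[0]

    def convert(self, x_frame):
        # E[y | x] = sum_k p(k | x) * mu_y_k (cross-covariances vanish under
        # the diagonal-covariance assumption).
        _, comp_ll = self._source_marginal(x_frame)
        resp = self.gmm.weights_ * np.exp(comp_ll)
        resp /= resp.sum() + 1e-300
        return resp @ self.gmm.means_[:, self.dim_x:]


def dms_convert(x_frames, models):
    """Convert each source frame with the model selected for that frame."""
    converted = []
    for x in x_frames:
        best = max(models, key=lambda m: m.source_loglik(x))
        converted.append(best.convert(x))
    return np.vstack(converted)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_train = rng.normal(size=(500, 24))                              # source spectral features
    y_train = 0.8 * x_train + rng.normal(scale=0.1, size=(500, 24))   # aligned target features
    # Candidate models of increasing complexity (number of components).
    models = [JointGMMConverter(k, dim_x=24).fit(x_train, y_train) for k in (2, 4, 8)]
    y_hat = dms_convert(x_train[:10], models)
    print(y_hat.shape)
```

In this sketch the selection criterion is purely frame-wise likelihood; other criteria (e.g. penalized likelihood) could be swapped into source_loglik without changing the surrounding loop.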