Dynamic model selection for spectral voice conversion

Statistical methods for voice conversion are usually based on a single model, selected to balance goodness of fit against complexity. In this paper we assume that the best model may change over time, depending on the source acoustic features. We present a new method for spectral voice conversion called Dynamic Model Selection (DMS). In DMS, a set of potential best models of increasing complexity, including Gaussian mixture models and mixtures of probabilistic principal component analyzers, is considered during the conversion of a source speech signal into a target speech signal. This set is built during the learning phase according to the Bayesian information criterion (BIC). During the conversion, the best model is dynamically selected from the set according to the acoustic features of each source frame. Subjective tests show that the method improves the conversion in terms of both proximity to the target and quality.
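The two stages described above, building a model set with BIC and then picking a model per frame, can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. For brevity, it uses diagonal-covariance Gaussian mixtures in place of the mixtures of probabilistic principal component analyzers. It computes BIC as -2 log L + k ln N, so lower is better. It also applies two hypothetical rules: the model set keeps the `keep` lowest-BIC candidates, and each frame is assigned to the model that gives it the highest log-likelihood. All helper names are illustrative, and the conversion (regression) step itself is omitted.

```python
import numpy as np


def diag_gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood log p(x_t) under a diagonal-covariance GMM.

    X: (T, D) source frames; weights: (M,); means, variances: (M, D).
    """
    X = np.atleast_2d(X)
    # log N(x_t | mu_m, diag(var_m)) for every frame t and component m -> (T, M)
    diff = X[:, None, :] - means[None, :, :]
    log_norm = -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)[None, :]
                       + (diff ** 2 / variances[None, :, :]).sum(axis=2))
    log_joint = np.log(weights)[None, :] + log_norm
    # Log-sum-exp over components for numerical stability
    mx = log_joint.max(axis=1, keepdims=True)
    return (mx + np.log(np.exp(log_joint - mx).sum(axis=1, keepdims=True))).ravel()


def n_free_params(M, D):
    # (M - 1) mixture weights + M*D means + M*D diagonal variances
    return (M - 1) + 2 * M * D


def bic(X, model):
    # BIC = -2 log L + k ln N (lower is better) on the training frames X
    T, D = X.shape
    M = len(model["weights"])
    loglik = diag_gmm_loglik(X, **model).sum()
    return -2.0 * loglik + n_free_params(M, D) * np.log(T)


def build_model_set(X, models, keep=2):
    # Hypothetical rule (assumption): keep the `keep` lowest-BIC candidates,
    # then order them by increasing complexity (number of components)
    ranked = sorted(models, key=lambda m: bic(X, m))[:keep]
    return sorted(ranked, key=lambda m: len(m["weights"]))


def select_per_frame(X, models):
    # Hypothetical rule (assumption): for each source frame, choose the index
    # of the model in the set that gives that frame the highest log-likelihood
    scores = np.stack([diag_gmm_loglik(X, **m) for m in models], axis=1)
    return scores.argmax(axis=1)
```

With two well-separated clusters of frames, for instance, a two-component model obtains a lower BIC than a single Gaussian, and `select_per_frame` assigns most frames to it. In a complete system, each selected model would then drive its own conversion function for that frame. The likelihood-based rule is only one plausible reading of "according to the acoustic features of each source frame".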
