Alleviating the one-to-many mapping problem in voice conversion with context-dependent modeling

This paper addresses the "one-to-many" mapping problem in Voice Conversion (VC) by exploring source-to-target mappings in GMM-based spectral transformation. Specifically, we examine differences using source-o nly versus joint source/target information in the classificati on stage of transformation, effectively illustrating a "one-to- many effect" in the traditional acoustically-based GMM. We propose combating this effect by using phonetic information in the GMM learning and classification. We then show the success of our proposed context-dependent modeling with transformation results using an objective error cri terion. Finally, we discuss implications of our work in ada pting current approaches to VC. Index Terms : Voice conversion, GMM, Spectral mapping.

[1]  M. Sambur,et al.  Selection of acoustic features for speaker identification , 1975 .

[2]  Satoshi Nakamura,et al.  Voice conversion through vector quantization , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[3]  Bayya Yegnanarayana,et al.  Transformation of formants for voice conversion using artificial neural networks , 1995, Speech Commun..

[4]  Yannis Stylianou,et al.  On the transformation of the speech spectrum for voice conversion , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[5]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[6]  Alexander Kain,et al.  Spectral voice conversion for text-to-speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[7]  Taoufik En-Najjary Conversion de voix pour la synthèse de la parole , 2005 .

[8]  Athanasios Mouchtaris,et al.  Conditional Vector Quantization for Voice Conversion , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[9]  Takao Kobayashi,et al.  Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm , 2009, IEEE Transactions on Audio, Speech, and Language Processing.