Performance of new voice conversion systems based on GMM models and applied to Arabic language

Voice conversion (VC) modifies a source speaker's voice so that it sounds like the voice of a target speaker. In this paper, we evaluate the performance of a conversion system based on Gaussian mixture models (GMMs), applied to the Arabic language, that exploits both the pitch dynamics and the spectrum. We study three approaches for obtaining the global conversion function of the pitch and the overall spectrum, all using the joint probability model. In the first approach, the pitch and the spectrum are converted jointly. In the second approach, the pitch is obtained by linear conversion. In the third approach, we use the relationship between the pitch and the spectrum. For the conversion of noise, we use a new technique that models the noise of the voiced or unvoiced frames with GMMs. We use the harmonic plus noise model (HNM) for analysis and synthesis, and a regularized discrete cepstrum to estimate the spectrum of the speech signal.
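
As an illustration of what a conversion function built on a joint probability model can look like, the sketch below implements the standard joint-density GMM regression: each mixture component contributes its conditional mean of the target given the source, weighted by the posterior probability of that component. The abstract does not give the exact formulation used in the paper, so this is an assumption based on the common textbook form. The function name `gmm_convert` and all parameter names are hypothetical, and the GMM parameters are assumed to have been trained beforehand (for example with EM on time-aligned source and target feature vectors).

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Convert one source feature vector with a joint-density GMM.

    Hypothetical parameter layout (M components, source dim dx, target dim dy):
      weights: (M,)       mixture weights
      mu_x:    (M, dx)    source means
      mu_y:    (M, dy)    target means
      cov_xx:  (M, dx, dx) source covariances
      cov_yx:  (M, dy, dx) target/source cross-covariances
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mu_x = np.asarray(mu_x, dtype=float)
    mu_y = np.asarray(mu_y, dtype=float)
    cov_xx = np.asarray(cov_xx, dtype=float)
    cov_yx = np.asarray(cov_yx, dtype=float)

    n_comp = weights.shape[0]
    log_post = np.empty(n_comp)
    cond_means = np.empty_like(mu_y)

    for i in range(n_comp):
        diff = x - mu_x[i]
        # cov_xx^{-1} (x - mu_x), computed without forming the inverse
        solved = np.linalg.solve(cov_xx[i], diff)
        _, logdet = np.linalg.slogdet(cov_xx[i])
        # Log of weight times Gaussian density of x under component i
        log_post[i] = np.log(weights[i]) - 0.5 * (
            diff @ solved + logdet + x.size * np.log(2.0 * np.pi)
        )
        # Conditional mean of the target given x for component i
        cond_means[i] = mu_y[i] + cov_yx[i] @ solved

    # Normalize the posteriors in a numerically stable way
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted sum of the per-component conditional means
    return post @ cond_means
```

In a joint pitch and spectrum setup, `x` and the converted output would each stack the cepstral coefficients and a pitch value into one vector; with a single component the function reduces to an ordinary linear regression from source to target.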
