Voice Conversion based on GMM and Artificial Neural Network

Voice conversion (VC) transforms the voice of a source speaker so that it is perceived as having been uttered by a target speaker. In this paper, a novel VC method combining a Gaussian Mixture Model (GMM) and an Artificial Neural Network (ANN) is proposed. To overcome the over-smoothing problem of the GMM-based mapping method, we propose to convert the basic spectral envelope with the GMM method and the residual envelope with the ANN method. Compared with the traditional GMM-based method, the proposed method effectively improves the quality and naturalness of the converted speech. Experimental results from both objective tests and listening tests show the superiority of the new method.
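The split between a GMM-converted basic envelope and an ANN-converted residual envelope can be illustrated with a minimal sketch. The abstract does not specify how the envelope is decomposed, what the network looks like, or how the two converted parts are recombined, so everything below is an assumption made for illustration: the GMM branch uses the standard joint-density regression form, the ANN branch is a single-hidden-layer network with `tanh` units, and the two outputs are simply added. The function names `gmm_convert`, `ann_convert`, and `hybrid_convert` are hypothetical, and all parameters are assumed to be already trained.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map one source frame x with a joint-density GMM (regression form).

    The output is a posterior-weighted sum of per-component linear
    regressions; this averaging is a common explanation for over-smoothing.
    """
    n_mix = len(weights)
    log_post = np.empty(n_mix)
    for m in range(n_mix):
        d = x - mu_x[m]
        _, logdet = np.linalg.slogdet(cov_xx[m])
        maha = d @ np.linalg.solve(cov_xx[m], d)
        # The (2*pi)^(-D/2) factor is identical for all components and cancels.
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + maha)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    y = np.zeros_like(mu_y[0], dtype=float)
    for m in range(n_mix):
        d = x - mu_x[m]
        y += post[m] * (mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], d))
    return y

def ann_convert(r, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network (assumed architecture)."""
    hidden = np.tanh(W1 @ r + b1)
    return W2 @ hidden + b2

def hybrid_convert(basic, residual, gmm, ann):
    """Assumed additive recombination of the two converted parts."""
    return gmm_convert(basic, **gmm) + ann_convert(residual, **ann)

# Toy parameters, chosen only so the result is easy to check by hand.
gmm = dict(
    weights=np.array([1.0]),
    mu_x=np.zeros((1, 2)),
    mu_y=np.ones((1, 2)),
    cov_xx=np.eye(2)[None],
    cov_yx=2.0 * np.eye(2)[None],
)
ann = dict(
    W1=np.zeros((3, 2)), b1=np.zeros(3),
    W2=np.zeros((2, 3)), b2=np.array([0.5, -0.5]),
)
converted = hybrid_convert(np.array([1.0, 1.0]), np.array([0.2, -0.1]), gmm, ann)
```

In this sketch the GMM branch carries the coarse spectral shape while the network is free to model detail that the weighted averaging would otherwise flatten; whether the paper combines the branches in exactly this way is not stated in the abstract.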
