Voice Conversion Using GMM with Enhanced Global Variance

The goal of voice conversion is to transform a sentence said by one speaker, to sound as if another speaker had said it. The classical conversion based on a Gaussian Mixture Model and several other schemes suggested since, produce muffled sounding outputs, due to excessive smoothing of the spectral envelopes. To reduce the muffling effect, enhancement of the Global Variance (GV) of the spectral features was recently suggested. We propose a different approach for GV enhancement, based on the classical conversion formalized as a GV-constrained minimization. Listening tests show that an improvement in quality is achieved by the proposed approach.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Satoshi Nakamura,et al.  Voice conversion through vector quantization , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[3]  Gene H. Golub,et al.  Matrix computations (3rd ed.) , 1996 .

[4]  O. Cappé,et al.  Regularization techniques for discrete cepstrum estimation , 1996, IEEE Signal Processing Letters.

[5]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[6]  Alexander Kain,et al.  Spectral voice conversion for text-to-speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[7]  K. Shikano,et al.  Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[8]  Alexander Kain,et al.  Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[9]  Yannis Stylianou,et al.  Applying the harmonic plus noise model in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process..

[10]  Taoufik En-Najjary,et al.  A voice conversion method based on joint pitch and spectral envelope transformation , 2004, INTERSPEECH.

[11]  Bohn Stafleu van Loghum,et al.  Online … , 2002, LOG IN.

[12]  Tomoki Toda,et al.  Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Daniel Erro,et al.  Voice Conversion Based on Weighted Frequency Warping , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.