GMM-PCA based speaker-timbre conversion on full-quality speech

This work studies the GMM-based approach to full-quality speaker timbre conversion. In general, high-quality voice conversion requires accurate spectral envelope estimates, which result in high-dimensional feature vectors and a relatively high computational cost. To achieve low-dimensional processing, accurate envelope estimates of the speakers are mel-frequency scaled and projected onto the space defined by a subset of the principal components. The GMM-based feature conversion is then performed in the reduced space. Our experimental findings confirm that this strategy is beneficial, most notably for the quality of the converted speech, while significantly reducing the computational cost.
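The pipeline summarized above (PCA projection of envelope vectors, conversion in the reduced space, reconstruction) can be sketched roughly as follows. This is a minimal illustration under several assumptions not stated in the abstract: the inputs are synthetic vectors standing in for time-aligned, already mel-scaled envelopes; the conversion uses a joint-density GMM regression of the kind commonly used in voice conversion; and a single Gaussian fitted in closed form stands in for an EM-trained mixture. All function names, dimensions, and component counts are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, basis); basis rows are the leading principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal and ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, basis):
    return (X - mean) @ basis.T

def reconstruct(Z, mean, basis):
    return Z @ basis + mean

def gaussian_logpdf(X, mu, cov):
    """Log density of a full-covariance Gaussian, evaluated row-wise."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, (X - mu).T)
    maha = np.sum(sol ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def gmm_convert(X, weights, means, covs):
    """Joint-density GMM regression: expected target given source.

    means has shape (K, 2d) and covs (K, 2d, 2d); the first d entries of
    each joint vector are the source part, the rest the target part.
    """
    d = X.shape[1]
    log_r = np.stack(
        [np.log(w) + gaussian_logpdf(X, m[:d], c[:d, :d])
         for w, m, c in zip(weights, means, covs)], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)           # posterior per component
    Y = np.zeros((X.shape[0], means.shape[1] - d))
    for k, (m, c) in enumerate(zip(means, covs)):
        A = c[d:, :d] @ np.linalg.inv(c[:d, :d])  # Sigma_yx Sigma_xx^-1
        Y += r[:, [k]] * (m[d:] + (X - m[:d]) @ A.T)
    return Y

# Synthetic stand-in for time-aligned source/target envelope frames.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
A_src = rng.normal(size=(5, 40))
A_tgt = rng.normal(size=(5, 40))
src = latent @ A_src + 0.01 * rng.normal(size=(500, 40))
tgt = latent @ A_tgt + 0.01 * rng.normal(size=(500, 40))

# Reduce each speaker's 40-dimensional vectors to 5 principal components.
mean_src, basis_src = fit_pca(src, 5)
mean_tgt, basis_tgt = fit_pca(tgt, 5)
Z_src = project(src, mean_src, basis_src)
Z_tgt = project(tgt, mean_tgt, basis_tgt)

# Assumption: one Gaussian fitted in closed form replaces EM training.
joint = np.hstack([Z_src, Z_tgt])
weights = np.array([1.0])
means = joint.mean(axis=0)[None, :]
covs = (np.cov(joint, rowvar=False) + 1e-6 * np.eye(10))[None]

# Convert in the reduced space, then map back to the full dimension.
Z_hat = gmm_convert(Z_src, weights, means, covs)
tgt_hat = reconstruct(Z_hat, mean_tgt, basis_tgt)
mse = np.mean((tgt_hat - tgt) ** 2)
```

In this sketch the mixture operates on 10-dimensional joint vectors rather than 80-dimensional ones, which illustrates where a computational saving could come from; the actual dimensions and mixture sizes used in the work are not given here.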
