Weighted frequency warping for voice conversion

This paper presents a new voice conversion method called Weighted Frequency Warping (WFW), which combines the well known GMM approach and the frequency warping approach. The harmonic plus stochastic model has been used to analyze, modify and synthesize the speech signal. Special phase manipulation procedures have been designed to allow the system to work in pitch-asynchronous mode. The experiments show that the proposed technique reaches a high degree of similarity between the converted and target speakers, and the naturalness and quality of the resynthesized speech is much higher than those of classical GMM-based systems.

[1]  Hermann Ney,et al.  A study on residual prediction techniques for voice conversion , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[2]  H. Ney,et al.  VTLN-based voice conversion , 2003, Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (IEEE Cat. No.03EX795).

[3]  Jia Liu,et al.  Voice conversion with smoothed GMM and MAP adaptation , 2003, INTERSPEECH.

[4]  Alexander Kain,et al.  High-resolution voice transformation , 2001 .

[5]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[6]  Eric Moulines,et al.  Voice transformation using PSOLA technique , 1991, Speech Commun..

[7]  Daniel Erro,et al.  A Pitch-Asynchronous Simple Method for Speech Synthesis by Diphone Concatenation using the Deterministic plus Stochastic Model , 2005 .

[8]  Antonio Bonafonte,et al.  Including dynamic and phonetic information in voice conversion systems , 2004, INTERSPEECH.

[9]  Yannis Stylianou,et al.  Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification , 1996 .

[10]  K. Shikano,et al.  Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[11]  Hui Ye,et al.  Quality-enhanced voice morphing using maximum likelihood transformations , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Levent M. Arslan,et al.  Speaker Transformation Algorithm using Segmental Codebooks (STASC) , 1999, Speech Commun..

[13]  Satoshi Nakamura,et al.  Voice conversion through vector quantization , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[14]  Amro El-Jaroudi,et al.  Discrete all-pole modeling , 1991, IEEE Trans. Signal Process..