论文信息 - One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices

One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices

This paper describes two flexible frameworks of voice conversion (VC), i.e., one-to-many VC and many-to-one VC. One-to-many VC realizes the conversion from a user's voice as a source to arbitrary target speakers' ones and many-to-one VC realizes the conversion vice versa. We apply eigenvoice conversion (EVC) to both VC frameworks. Using multiple parallel data sets consisting of utterance-pairs of the user and multiple pre-stored speakers, an eigenvoice Gaussian mixture model (EV-GMM) is trained in advance. Unsupervised adaptation of the EV-GMM is available to construct the conversion model for arbitrary target speakers in one-to-many VC or arbitrary source speakers in many-to-one VC using only a small amount of their speech data. Results of various experimental evaluations demonstrate the effectiveness of the proposed VC frameworks.

Tomoki Toda | Yamato Ohtani | Kiyohiro Shikano

[1] Keiichi Tokuda,et al. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[2] Eric Moulines,et al. Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[3] Yoshinori Sagisaka,et al. Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks , 1995, Speech Commun..

[4] Keiichi Tokuda,et al. Eigenvoices for HMM-based speech synthesis , 2002, INTERSPEECH.

[5] Roland Kuhn,et al. Rapid speaker adaptation in eigenvoice space , 2000, IEEE Trans. Speech Audio Process..

[6] John B. Shoven,et al. I , Edinburgh Medical and Surgical Journal.

[7] Alexander Kain,et al. Spectral voice conversion for text-to-speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[8] Athanasios Mouchtaris,et al. Non-parallel training for voice conversion by maximum likelihood constrained adaptation , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..

[10] C. Zheng,et al. ; 0 ; , 1951 .

[11] Tomoki Toda,et al. Eigenvoice conversion based on Gaussian mixture model , 2006, INTERSPEECH.

[12] Yoshinori Sagisaka,et al. Acoustic characteristics of speaker individuality: Control and conversion , 1995, Speech Commun..