Frame alignment method for cross-lingual voice conversion

Most of the existing voice conversion methods calculate the optimal transformation function from a given set of paired acoustic vectors of the source and target speakers. The alignment of the phonetically equivalent source and target frames is problematic when the training corpus available is not parallel, although this is the most realistic situation. The alignment task is even more difficult in cross-lingual applications because the phoneme sets may be different in the involved languages. In this paper, a new iterative alignment method based on acoustic distances is proposed. The method is shown to be suitable for text-independent and cross-lingual voice conversion, and the conversion scores obtained in our evaluation experiments are not far from the performance achieved by using parallel training corpora.

[1]  Hui Ye,et al.  Voice conversion for unknown speakers , 2004, INTERSPEECH.

[2]  Daniel Erro,et al.  Weighted frequency warping for voice conversion , 2007, INTERSPEECH.

[3]  Keiichi Tokuda,et al.  Speaker adaptation for HMM-based speech synthesis system using MLLR , 1998, SSW.

[4]  Levent M. Arslan,et al.  Speaker Transformation Algorithm using Segmental Codebooks (STASC) , 1999, Speech Commun..

[5]  Satoshi Nakamura,et al.  Voice conversion through vector quantization , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[6]  Hermann Ney,et al.  A first step towards text-independent voice conversion , 2004, INTERSPEECH.

[7]  H. Ney,et al.  VTLN-based voice conversion , 2003, Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (IEEE Cat. No.03EX795).

[8]  Keiichi Tokuda,et al.  Voice characteristics conversion for HMM-based speech synthesis system , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Chung-Hsien Wu,et al.  Map-based adaptation for speech conversion using adaptation data selection and non-parallel training , 2006, INTERSPEECH.

[10]  Hermann Ney,et al.  Text-independent cross-language voice conversion , 2006, INTERSPEECH.

[11]  Alexander Kain,et al.  High-resolution voice transformation , 2001 .

[12]  Athanasios Mouchtaris,et al.  Nonparallel training for voice conversion based on a parameter adaptation approach , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Taoufik En-Najjary Conversion de voix pour la synthèse de la parole , 2005 .

[14]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[15]  Daniel Erro,et al.  Voice Conversion of Non-aligned Data using Unit Selection , 2006 .

[16]  Hui Ye,et al.  Quality-enhanced voice morphing using maximum likelihood transformations , 2006, IEEE Transactions on Audio, Speech, and Language Processing.