An Automatic Segmentation and Mapping Approach for Voice Conversion Parameter Training

In many applications of voice conversion (VC), corresponding training utterances of the source and target speakers are not available. In this paper, we present an automatic phonetic class segmentation and mapping approach based on dynamic frequency warping (DFW). After locating corresponding classes of the source and target speakers, we can apply conventional parameter training methods for VC. As an example, we use this approach to estimate the parameters of warping functions for vocal tract length normalization (VTLN), which serves as a simple VC technique.
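The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that each phonetic class is represented by one average log-magnitude spectrum. It treats DFW as dynamic time warping along the frequency axis, restricted to a hypothetical band of ±6 bins so that distant spectral peaks cannot be aligned. It pairs each source class with the target class at the smallest DFW distance. Finally, it assumes the piecewise-linear VTLN warping function common in the VTLN literature, with the warping factor chosen by grid search. All function names and parameter values (`band`, `knee`, the grid) are hypothetical.

```python
import math


def dfw_distance(src, tgt, band=6):
    """Banded DTW along the frequency axis between two log-magnitude spectra.
    Returns the accumulated absolute-difference cost of the best monotonic path."""
    n, m = len(src), len(tgt)
    inf = math.inf
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - band), min(m, i + band + 1)):
            d = abs(src[i] - tgt[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best = min(
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            cost[i][j] = d + best
    return cost[n - 1][m - 1]


def map_classes(src_classes, tgt_classes, band=6):
    """Pair each source class index with the target class index of minimal DFW distance."""
    return {
        s: min(range(len(tgt_classes)),
               key=lambda t: dfw_distance(src_classes[s], tgt_classes[t], band))
        for s in range(len(src_classes))
    }


def piecewise_linear_warp(omega, alpha, knee=0.875):
    """Assumed piecewise-linear VTLN warping of normalized frequency omega in [0, 1]
    (1 = Nyquist); slope alpha below the knee, and w(0) = 0, w(1) = 1."""
    omega0 = knee if alpha <= 1.0 else knee / alpha
    if omega <= omega0:
        return alpha * omega
    return alpha * omega0 + (1.0 - alpha * omega0) * (omega - omega0) / (1.0 - omega0)


def warp_spectrum(spec, alpha):
    """Output bin k takes the linearly interpolated input value at w(k / (n - 1))."""
    n = len(spec)
    out = []
    for k in range(n):
        x = piecewise_linear_warp(k / (n - 1), alpha) * (n - 1)
        x = min(max(x, 0.0), n - 1.0)  # guard against floating-point overshoot
        i = min(int(x), n - 2)
        frac = x - i
        out.append((1.0 - frac) * spec[i] + frac * spec[i + 1])
    return out


def estimate_alpha(pairs, grid=None):
    """Grid search for the warping factor that minimizes the summed squared
    log-spectral error over mapped (source, target) class pairs."""
    if grid is None:
        grid = [0.80 + 0.01 * i for i in range(41)]  # 0.80 ... 1.20

    def total_error(alpha):
        return sum(
            sum((a - b) ** 2 for a, b in zip(warp_spectrum(src, alpha), tgt))
            for src, tgt in pairs
        )

    return min(grid, key=total_error)


def bump(n, center, width=3.0, height=20.0):
    """Synthetic single-peak log spectrum, used only for the demo below."""
    return [height * math.exp(-((k - center) / width) ** 2) for k in range(n)]


if __name__ == "__main__":
    # Two synthetic source classes; the targets are the same classes in swapped
    # order, warped with a factor of 1.1.
    src = [bump(64, 10), bump(64, 40)]
    tgt = [warp_spectrum(src[1], 1.1), warp_spectrum(src[0], 1.1)]
    mapping = map_classes(src, tgt)
    alpha = estimate_alpha([(src[s], tgt[t]) for s, t in mapping.items()])
    print(mapping, round(alpha, 3))  # should recover the swap and a factor near 1.1
```

The band constraint is a deliberate design choice in this sketch. An unconstrained DTW can stretch any single peak onto any other peak, which would make every class look close to every other class and defeat the mapping step.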
