Eigenvoice-Based Approach to Voice Conversion and Voice Quality Control

This paper reviews our proposed approach to voice conversion (VC) and voice quality control based on an eigenvoice technique. VC is a technique for modifying nonlinguistic information, such as speaker individuality, while keeping linguistic information unchanged. In the traditional VC framework, a conversion model for a given source and target speaker pair must be trained in advance using a parallel data set consisting of utterance pairs from these two speakers. To make VC technologies more practical, we have developed a new VC paradigm that flexibly builds the conversion model for an arbitrary speaker pair by effectively using speech samples of many other speakers. In this paper, we give an overview of eigenvoice conversion (EVC), one of our proposed VC techniques.
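The general eigenvoice idea behind using "speech samples of many other speakers" can be illustrated with a small sketch. Each prestored speaker's model means are stacked into one supervector, PCA over these supervectors gives a bias vector and a few eigenvectors (eigenvoices), and a new speaker's means are then written as mu_m(w) = bias_m + B_m w. As a result, only the low-dimensional weight vector w has to be estimated from the new speaker's data. This is a minimal sketch under loudly stated assumptions, not the paper's implementation: the function names are hypothetical, the prestored supervectors are taken as given, mixture posteriors are assumed fixed, and covariances are assumed to be identity matrices.

```python
import numpy as np


def build_eigenvoices(supervectors, num_eigen):
    """Return (bias, basis) from prestored-speaker supervectors via PCA.

    supervectors: shape (S, M*D); row s concatenates the M mean vectors
    (each of dimension D) of prestored speaker s's GMM.
    """
    bias = supervectors.mean(axis=0)          # mean supervector, shape (M*D,)
    centered = supervectors - bias
    # Rows of vt are the principal directions of the centered supervectors.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_eigen].T                  # eigenvoices, shape (M*D, J)
    return bias, basis


def estimate_weights(frames, posteriors, bias, basis, dim):
    """Estimate the weight vector w from a new speaker's adaptation frames.

    Maximizes the likelihood of the frames under per-mixture means
    bias_m + B_m @ w, ASSUMING identity covariances and fixed mixture
    posteriors (a simplification for illustration).
    frames: shape (T, D); posteriors: shape (T, M).
    """
    num_eigen = basis.shape[1]
    lhs = np.zeros((num_eigen, num_eigen))
    rhs = np.zeros(num_eigen)
    for m in range(posteriors.shape[1]):
        rows = slice(m * dim, (m + 1) * dim)
        eig_m = basis[rows]                   # (D, J) eigenvoice block for mixture m
        bias_m = bias[rows]                   # (D,) bias mean for mixture m
        gamma = posteriors[:, m]              # (T,) occupancy of mixture m
        occ = gamma.sum()                     # zeroth-order statistic
        first = gamma @ frames                # first-order statistic, shape (D,)
        lhs += occ * (eig_m.T @ eig_m)
        rhs += eig_m.T @ (first - occ * bias_m)
    return np.linalg.solve(lhs, rhs)


def adapted_means(weights, bias, basis, dim):
    """Per-mixture mean vectors bias_m + B_m @ w, one row per mixture."""
    return (bias + basis @ weights).reshape(-1, dim)


# Toy usage with random "prestored speakers" (synthetic data, not from the paper).
rng = np.random.default_rng(0)
num_mix, dim, num_speakers, num_eigen = 4, 3, 20, 2
prestored = rng.normal(size=(num_speakers, num_mix * dim))
bias, basis = build_eigenvoices(prestored, num_eigen)

w_true = np.array([1.5, -0.7])
means = adapted_means(w_true, bias, basis, dim)
# Five frames per mixture, each sitting exactly at its mixture mean.
frames = np.repeat(means, 5, axis=0)
posteriors = np.repeat(np.eye(num_mix), 5, axis=0)
w_hat = estimate_weights(frames, posteriors, bias, basis, dim)
print(w_hat)  # should match w_true up to floating-point error
```

Because only J weights are estimated instead of every mean vector, a small amount of a new speaker's speech can be enough to build that speaker's model. A fuller version would presumably use the model's actual covariances and re-estimate the posteriors iteratively (EM). That refinement is omitted here.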

Tomoki Toda
