Complex Cepstrum Based Voice Conversion Using Radial Basis Function

A complex cepstrum vocoder is used to convert the speaker-specific characteristics of the source speaker's speech into those of the target speaker. Low-time and high-time liftering split the computed cepstrum into vocal tract and source excitation parameters. The resulting mixed-phase vocal tract and source excitation parameters, which have finite impulse responses, preserve the phase properties of the resynthesized speech frame. The radial basis function is explored to capture the nonlinear mapping that modifies the real and imaginary components of the complex cepstrum based vocal tract and source excitation of the speech signal. For comparison, the state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) are used to represent the vocal tract and the source excitation of the speech frame, respectively. In this baseline, a radial basis function captures the nonlinear relation between the Mel cepstrum envelopes of the source and target speakers, a mean and standard deviation approach modifies the fundamental frequency (F0), and the Mel log spectral approximation filter reconstructs the speech signal from the modified Mel cepstrum envelope and fundamental frequency. The proposed complex cepstrum based model is compared with the state-of-the-art Mel cepstrum envelope based voice conversion model using objective and subjective evaluations. The evaluation measures indicate that the proposed complex cepstrum based voice conversion system approximates the converted speech signal more accurately than the Mel cepstrum envelope based model.
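
The mean and standard deviation approach to fundamental frequency modification mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the common linear transform, in which each voiced source value is shifted and scaled to match the target speaker's statistics; many systems apply the same transform to log F0 instead. The function name `convert_f0`, the convention that unvoiced frames are coded as 0, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def convert_f0(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Map source F0 values (Hz) onto the target speaker's F0 statistics.

    Assumed transform (not taken from the paper):
        f0_conv = tgt_mean + (tgt_std / src_std) * (f0_src - src_mean)
    """
    f0_src = np.asarray(f0_src, dtype=float)
    voiced = f0_src > 0  # assumption: unvoiced frames are coded as 0
    f0_conv = np.zeros_like(f0_src)  # unvoiced frames stay at 0
    f0_conv[voiced] = tgt_mean + (tgt_std / src_std) * (f0_src[voiced] - src_mean)
    return f0_conv


# Hypothetical per-frame F0 contour and speaker statistics.
f0 = np.array([0.0, 110.0, 120.0, 130.0, 0.0])
converted = convert_f0(f0, src_mean=120.0, src_std=10.0,
                       tgt_mean=220.0, tgt_std=20.0)
print(converted)
```

In practice the means and standard deviations would be estimated from the voiced frames of each speaker's training data; here they are simply passed in as arguments.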
