Line Spectral Pairs Based Voice Conversion using Radial Basis Function

Voice conversion (VC) is a technique that morphs the speaker-dependent acoustic cues of a source speaker into those of a target speaker. Speaker-dependent acoustic cues are characterized at different levels, such as the shape of the vocal tract and the glottal excitation. In this paper, the vocal tract parameters and the glottal excitation are characterized using Line Spectral Pairs (LSP) and the pitch residual, respectively. The strong generalization ability of the Radial Basis Function (RBF) is used to map the acoustic cues, namely the LSP and the pitch residual, of the source speaker to those of the target speaker. Subjective and objective measures are used to compare the performance of the RBF-based system with that of a state-of-the-art Gaussian mixture model (GMM) based voice conversion system. The objective measures and simulation results indicate that the RBF transformation model performs better than the GMM model. The subjective evaluations indicate that the proposed algorithm maintains target voice individuality, naturalness, and speech quality.
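The abstract does not give implementation details, so the following is only a minimal sketch of how an RBF mapping between time-aligned source and target feature vectors might look. Everything in it is an assumption made for illustration: the function names (`rbf_design`, `fit_rbf`, `convert`), the Gaussian basis with a single fixed width, the random choice of centers from the training frames, the least-squares fit of the output weights, and the synthetic LSP-like data. It is not the authors' implementation, and the pitch residual mapping is not shown.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian RBF activations plus a bias column (assumed basis choice)."""
    # Squared Euclidean distance between every frame and every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])

def fit_rbf(src, tgt, n_centers=20, width=1.0, seed=0):
    """Fit output weights by linear least squares (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: centers are a random subset of the source training frames.
    idx = rng.choice(len(src), size=n_centers, replace=False)
    centers = src[idx]
    phi = rbf_design(src, centers, width)
    weights, *_ = np.linalg.lstsq(phi, tgt, rcond=None)
    return centers, weights

def convert(src, centers, weights, width=1.0):
    """Map source feature vectors to predicted target feature vectors."""
    return rbf_design(src, centers, width) @ weights

# Synthetic stand-in for aligned 10th-order LSP frames: ascending values in (0, pi).
rng = np.random.default_rng(1)
src = np.sort(rng.uniform(0.0, np.pi, size=(200, 10)), axis=1)
# Made-up smooth, monotonic "target speaker" warp, used only to get paired data.
tgt = 0.9 * src + 0.1 * np.sin(src)

centers, weights = fit_rbf(src, tgt)
converted = convert(src, centers, weights)

# Training error of the RBF map versus always predicting the mean target frame.
mse_rbf = np.mean((converted - tgt) ** 2)
mse_mean = np.mean((tgt - tgt.mean(axis=0)) ** 2)
print(f"RBF training MSE: {mse_rbf:.5f}, mean-frame baseline MSE: {mse_mean:.5f}")
```

In a real system the paired frames would come from parallel utterances that have been time-aligned, and the converted LSP vectors would need to be checked for ascending order before synthesis, since an out-of-order LSP set does not correspond to a stable filter.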
