Voice conversion system using SVM for vocal tract modification and codebook based model for pitch contour modification

The basic idea of this paper is to design an alternative voice conversion technique using support vector machine (SVM) as a regression tool that, converts the voice of a source speaker to specific standard target speaker. A nonlinear mapping function between the parameters for the acoustic features of the two speakers has been captured in our work. The vocal tract characteristics have been represented by the line spectral frequencies (LSFs). The kernel induced feature space using radial basis function network type SVM with Gaussian basis function have been used in our work. The codebook based technique has been used to modify the intonation characteristic (pitch contour). Mapping of the pitch contour has been achieved at the word level by associating the codebooks derived from the pitch contours of the source and the target speakers. The speech signals for the desired target speaker have been synthesized using the transformed LSFs along with the modified pitch contour and evaluated using both the subjective and the listening tests. The results signify that the proposed model improves the voice conversion performance in terms of capturing the speakerpsilas identity. However, the performance can further be improved by suitably modifying various user defined parameters used in regression analysis and using more training LSF vectors in the training stage.

[1]  K.-S. Lee,et al.  Statistical Approach for Voice Personality Transformation , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[2]  Bayya Yegnanarayana,et al.  Prosody modification using instants of significant excitation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Yoshinori Sagisaka,et al.  Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks , 1995, Speech Commun..

[4]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[5]  Zeynep Inanoglu Transforming Pitch in a Voice Conversion Framework , 2003 .

[6]  Robert M. Gray,et al.  An Algorithm for Vector Quantizer Design , 1980, IEEE Trans. Commun..

[7]  Alexander Kain,et al.  Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[8]  Adrian G. Bors,et al.  Introduction of the Radial Basis Function (RBF) Networks , 2000 .

[9]  Yannis Stylianou,et al.  On the transformation of the speech spectrum for voice conversion , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[10]  Yoshinori Sagisaka,et al.  Acoustic characteristics of speaker individuality: Control and conversion , 1995, Speech Commun..

[11]  B. Yegnanarayana,et al.  Voice conversion: Factors responsible for quality , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Helenca Duxans Barrobes Voice conversion applied to text-to-speech systems , 2006 .

[13]  Hisao Kuwabara A pitch-synchronous analysis/synthesis system to independently modify formant frequencies and bandwidths for voiced speech , 1984, Speech Commun..

[14]  Shashidhar G. Koolagudi,et al.  Voice Transformation by Mapping the Features at Syllable Level , 2007, PReMI.

[15]  John H. L. Hansen,et al.  Speaker-specific pitch contour modeling and modification , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[16]  G. White,et al.  Speech recognition experiments with linear predication, bandpass filtering, and dynamic programming , 1976 .

[17]  Oytun Türk,et al.  NEW METHODS FOR VOICE CONVERSION , 2003 .

[18]  Bernhard Schölkopf,et al.  Comparing support vector machines with Gaussian kernels to radial basis function classifiers , 1997, IEEE Trans. Signal Process..

[19]  Christopher J. C. Burges,et al.  A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.

[20]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[21]  Eric Moulines,et al.  Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..

[22]  Satoshi Nakamura,et al.  Voice conversion through vector quantization , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.