Comparative study of a voice conversion framework with Line Spectral Frequencies and Mel-Frequency Cepstral Coefficients as features using artificial neural networks

This paper formulates the spectral mapping function for voice conversion using feed-forward neural networks trained on Line Spectral Frequency (LSF) and Mel-Frequency Cepstral Coefficient (MFCC) features, and compares the two representations to determine which better addresses the spectral mapping problem. The experiments are confined to the transformation of the spectral and excitation (glottal) components of speech. LSF and MFCC features represent the spectrum and serve as the input predictor variables to the neural networks. Within the voice conversion framework, the artificial neural network maps the spectral characteristics of a source speaker onto those of a target speaker so as to obtain a convincing voice conversion model. Temporal alignment between the utterances of the source and target speakers is achieved with Dynamic Time Warping (dynamic programming), and excitation mapping is accomplished with a residual selection method. The performance of the resulting voice conversion systems is assessed with subjective and objective measures that validate the conversion system design.
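As a rough illustration of the pipeline described above, the following Python sketch extracts MFCC frames from a parallel source/target utterance pair, aligns them with Dynamic Time Warping, and trains a feed-forward network to map source frames to target frames. The file names, frame settings, and network configuration are illustrative assumptions rather than values from the paper, which additionally covers LSF features and residual selection for the excitation.

    # Minimal sketch of the MFCC + feed-forward mapping stage (assumed settings).
    import numpy as np
    import librosa
    from sklearn.neural_network import MLPRegressor

    def mfcc_frames(path, n_mfcc=13, sr=16000):
        """Load an utterance and return its MFCC frames as (frames, n_mfcc)."""
        y, sr = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

    # Parallel utterance from the source and target speakers (hypothetical paths).
    src = mfcc_frames("source_utt.wav")
    tgt = mfcc_frames("target_utt.wav")

    # Temporal alignment of the two MFCC sequences via Dynamic Time Warping.
    _, warp_path = librosa.sequence.dtw(X=src.T, Y=tgt.T)
    warp_path = warp_path[::-1]              # librosa returns the path end-to-start
    src_aligned = src[warp_path[:, 0]]
    tgt_aligned = tgt[warp_path[:, 1]]

    # Feed-forward network mapping source spectral frames to target frames.
    mapper = MLPRegressor(hidden_layer_sizes=(50, 50), activation="tanh",
                          max_iter=2000, random_state=0)
    mapper.fit(src_aligned, tgt_aligned)

    # At conversion time, the predicted target-like frames would be combined with
    # an excitation signal (e.g. selected residuals) and resynthesised to speech.
    converted = mapper.predict(src_aligned)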
