Nonlinear speech model based on support vector machine and wavelet transform

To improve the naturalness of reconstructed speech, nonlinear speech models have received increasing attention in recent years. This paper first presents a nonlinear speech model for speech synthesis based on the support vector machine (SVM). After the speech signal is embedded into phase space, the nonlinear map in the model is obtained by support vector regression. Experiments show that for some speech segments the system not only reconstructs the speech perfectly but also preserves the jitter and shimmer of the original signal. For other segments, however, the output of the system differs considerably from the original. The reason is that sub-bands at different frequencies in the original signal cannot all be described well by an SVM-based autoregressive model trained with a single set of training parameters. A multi-band model is therefore proposed. The original speech is decomposed into several bands by wavelet packet decomposition, and a nonlinear dynamical model based on SVM is constructed for each sub-band signal. Experiments show that this improves the stability of the system.
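The data preparation described in the abstract can be sketched as follows: split the signal into sub-bands with a wavelet packet tree, then embed each sub-band into phase space so that every delay vector is paired with the sample that follows it. This is only an illustrative sketch, not the authors' implementation. The Haar filter, the two-level tree, the embedding dimension, the delay, and the toy test signal are all assumptions chosen for brevity, and the function names are hypothetical. The support vector regression step itself is left out; each (vectors, targets) pair would be passed to an SVR trainer with its own parameters.

```python
import math

def delay_embed(signal, dim, delay):
    """Phase-space embedding: pair each delay vector with the next sample.

    Each vector holds `dim` samples spaced `delay` apart, newest first;
    the target is the sample that immediately follows the newest one.
    """
    span = (dim - 1) * delay
    vectors, targets = [], []
    for n in range(span, len(signal) - 1):
        vectors.append([signal[n - k * delay] for k in range(dim)])
        targets.append(signal[n + 1])
    return vectors, targets

def haar_split(signal):
    """One orthonormal Haar analysis step: (low-pass half, high-pass half)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) * s for i in pairs]
    high = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return low, high

def wavelet_packet(signal, levels):
    """Full wavelet packet tree: every band is split again at every level.

    Returns 2**levels bands in natural (tree) order, not frequency order.
    """
    bands = [list(signal)]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

# Toy stand-in for a voiced segment (assumption: two sinusoids).
signal = [math.sin(2 * math.pi * n / 32) + 0.3 * math.sin(2 * math.pi * n / 8)
          for n in range(256)]

# One regression data set per sub-band; each would train its own model.
bands = wavelet_packet(signal, levels=2)
training_sets = [delay_embed(band, dim=4, delay=1) for band in bands]
```

Training one regressor per entry of `training_sets` mirrors the multi-band idea: each sub-band gets its own model and its own training parameters instead of one model for the full-band signal.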
