Decomposition of speech into voiced and unvoiced components based on a state-space signal model

We present a novel method for decomposing speech into voiced and unvoiced components. After demodulating the variations in spectral envelope, energy, and pitch, the method applies a bank of Kalman filters to separate the harmonic and non-harmonic components of the signal. This approach relies on a state-space representation of the composite signal and provides a way to accurately estimate the harmonic component without the large delay required by a linear-phase comb filter. However, it also requires prior knowledge of the variance of the unvoiced component and of the state transition parameters. We also introduce a method, based on a variant of the expectation-maximization (EM) algorithm, to determine these parameters accurately. Modifications for dealing with unvoiced segments and voicing onset are also described.
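To make the state-space idea concrete, the following is a minimal illustrative sketch and not the method of the paper. It tracks a single harmonic with one Kalman filter, whereas the abstract describes a bank of filters applied after demodulation. The function name `kalman_harmonic`, the rotation-matrix state model, and the fixed values of the process noise `q`, the unvoiced-component variance `r`, and the frequency `omega` are all assumptions chosen for the example. In the paper these quantities are estimated with an EM variant rather than set by hand.

```python
import math
import random


def kalman_harmonic(y, omega, q=1e-4, r=0.1):
    """Track one sinusoid of known frequency omega (rad/sample) in white noise.

    Assumed model (illustrative only):
      state x = [in-phase, quadrature], x[n+1] = F x[n] + w, F = rotation by omega
      observation y[n] = x[n][0] + v, with var(v) = r and cov(w) = q * I
    Returns the filtered estimate of the harmonic component.
    """
    c, s = math.cos(omega), math.sin(omega)
    x0, x1 = 0.0, 0.0               # state estimate
    p00, p01, p11 = 1.0, 0.0, 1.0   # symmetric 2x2 error covariance
    harmonic = []
    for obs in y:
        # Predict: rotate the state, then P <- F P F^T + q I.
        x0, x1 = c * x0 - s * x1, s * x0 + c * x1
        a00, a01 = c * p00 - s * p01, c * p01 - s * p11
        a10, a11 = s * p00 + c * p01, s * p01 + c * p11
        p00, p01, p11 = (a00 * c - a01 * s + q,
                         a00 * s + a01 * c,
                         a10 * s + a11 * c + q)
        # Update with observation row H = [1, 0].
        innovation = obs - x0
        innovation_var = p00 + r
        k0, k1 = p00 / innovation_var, p01 / innovation_var
        x0 += k0 * innovation
        x1 += k1 * innovation
        # P <- (I - K H) P, written out for the 2x2 symmetric case.
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        harmonic.append(x0)
    return harmonic


# Synthetic stand-in for a voiced frame: one sinusoid plus white noise.
random.seed(0)
omega = 2 * math.pi * 0.05
clean = [math.sin(omega * n) for n in range(400)]
noisy = [v + random.gauss(0.0, math.sqrt(0.1)) for v in clean]

voiced = kalman_harmonic(noisy, omega, q=1e-4, r=0.1)
unvoiced = [obs - est for obs, est in zip(noisy, voiced)]  # residual component
```

Because the filter is causal and recursive, each output sample is available as soon as the corresponding input sample arrives, which illustrates the low-delay property contrasted in the abstract with a linear-phase comb filter. The sketch also shows why the noise variance and transition parameters must be known in advance: `q`, `r`, and `omega` directly set the filter gain.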
