Analysis/synthesis of speech based on an adaptive quasi-harmonic plus noise model

Decomposing speech into a deterministic part and a stochastic part is a common modeling approach. Usually, the deterministic part of voiced speech is modeled as a sum of time-varying sinusoids, while the stochastic part is modeled as modulated noise. Estimating the sinusoidal parameters assumes that speech is locally stationary. However, this assumption does not hold, which leads to biased amplitude and phase estimates. In this paper, we develop a speech analysis/synthesis scheme that can handle locally nonstationary frames. To this end, the deterministic part is modeled with an adaptive quasi-harmonic model, while the stochastic part is modeled as time-modulated and frequency-modulated noise. Results show that the reconstructed signal is almost indistinguishable from the original.
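The abstract does not state the estimation equations. What follows is a minimal sketch of the basic, non-adaptive quasi-harmonic idea that the adaptive model builds on. It assumes a complex-valued (e.g. analytic) frame and a Hamming window. Each component is fit as (a_k + t·b_k)·exp(j2πf_k t), with t centered on the frame. If f_k is off by η_k Hz, then to first order b_k ≈ j2πη_k·a_k. This gives the correction η_k = (Re a_k · Im b_k − Im a_k · Re b_k) / (2π|a_k|²), which is applied iteratively. The function names `qhm_fit` and `qhm_refine`, the frame length and the test signal are illustrative assumptions. The adaptive quasi-harmonic model itself goes further by adapting the basis to frequency tracks, and that step is not shown here.

```python
import numpy as np


def qhm_fit(x, fs, freqs):
    """Hypothetical helper: one weighted least-squares quasi-harmonic fit.

    Assumed model: x[n] ~ sum_k (a_k + t_n * b_k) * exp(j*2*pi*f_k*t_n),
    with t_n in seconds and t_n = 0 at the frame center.
    Returns (a, b, eta), where eta is the first-order frequency mismatch in Hz.
    """
    x = np.asarray(x, dtype=complex)
    freqs = np.asarray(freqs, dtype=float)
    n = len(x)
    t = (np.arange(n) - (n - 1) / 2) / fs          # time axis centered on the frame
    w = np.hamming(n)                              # strictly positive analysis window
    E = np.exp(2j * np.pi * np.outer(t, freqs))    # n x K complex exponentials
    basis = np.hstack([E, t[:, None] * E])         # n x 2K basis: [E, t*E]
    coef, *_ = np.linalg.lstsq(w[:, None] * basis, w * x, rcond=None)
    k = len(freqs)
    a, b = coef[:k], coef[k:]
    # b_k ~ j*2*pi*eta_k*a_k, hence eta_k = Im(b_k * conj(a_k)) / (2*pi*|a_k|^2)
    eta = (a.real * b.imag - a.imag * b.real) / (2 * np.pi * np.abs(a) ** 2)
    return a, b, eta


def qhm_refine(x, fs, freqs, n_iter=5):
    """Hypothetical helper: iterate f_k <- f_k + eta_k, then refit the amplitudes."""
    freqs = np.asarray(freqs, dtype=float)
    for _ in range(n_iter):
        _, _, eta = qhm_fit(x, fs, freqs)
        freqs = freqs + eta
    a, _, _ = qhm_fit(x, fs, freqs)
    return freqs, a


# Example: a 30 ms frame at 16 kHz holding three harmonics of 203 Hz,
# analyzed starting from a mistaken f0 of 200 Hz.
fs = 16000
idx = np.arange(480)
t = (idx - (idx.size - 1) / 2) / fs
true_f = np.array([203.0, 406.0, 609.0])
amps = np.array([1.0, 0.5, 0.25])
x = (amps * np.exp(2j * np.pi * np.outer(t, true_f))).sum(axis=1)
f_hat, a_hat = qhm_refine(x, fs, [200.0, 400.0, 600.0], n_iter=10)
print(np.round(f_hat, 3), np.round(np.abs(a_hat), 3))
```

All components are fit jointly, so leakage between neighboring harmonics is absorbed by the least-squares solve rather than treated as error. For real-valued speech, an analytic frame (for example, one obtained with a Hilbert transform) would be one way to meet the complex-input assumption.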
