AM-FM estimation for speech based on a time-varying sinusoidal model

In this paper we present a method, based on a time-varying sinusoidal model, for robust and accurate estimation of amplitude and frequency modulations (AM-FM) in speech. The suggested approach has two main steps. First, speech is represented by a sinusoidal model with time-varying amplitudes. Specifically, the model uses a first-order time polynomial with complex coefficients to capture the instantaneous amplitude and frequency (phase) components. Next, the model parameters are updated using the previously estimated instantaneous phase information. The result is an iterative scheme for AM-FM decomposition of speech. It was validated on synthetic AM-FM signals and tested on the reconstruction of voiced speech signals, with the signal-to-error reconstruction ratio (SERR) as the measure. Compared to the standard sinusoidal representation, the suggested approach was found to improve the corresponding SERR by 47%, resulting in over 30 dB of SERR.
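
To make the two steps concrete, the following is a minimal sketch in Python (NumPy), not the authors' implementation. It assumes that each sinusoid takes the form (a_k + t b_k) exp(j 2 pi f_k t), that the coefficients are found by unweighted least squares on a single frame, and that SERR is defined as 20 log10(std(x) / std(x - x_hat)). The frequency correction Im(b_k / a_k) / (2 pi) comes from a first-order Taylor argument and stands in for the phase-based update mentioned above, whose exact form the abstract does not give. All function names, the test signal, and the frame settings are hypothetical.

```python
import numpy as np

def fit_stationary(x, t, freqs):
    """Least-squares fit of constant complex amplitudes (standard sinusoidal model)."""
    E = np.exp(2j * np.pi * np.outer(t, freqs))
    a, *_ = np.linalg.lstsq(E, x.astype(complex), rcond=None)
    return a, (E @ a).real

def fit_time_varying(x, t, freqs):
    """Least-squares fit of (a_k + t*b_k) * exp(j*2*pi*f_k*t) for every f_k."""
    E = np.exp(2j * np.pi * np.outer(t, freqs))
    M = np.hstack([E, t[:, None] * E])      # columns: E, then t*E
    c, *_ = np.linalg.lstsq(M, x.astype(complex), rcond=None)
    k = len(freqs)
    return c[:k], c[k:], (M @ c).real

def serr_db(x, x_hat):
    """Signal-to-error reconstruction ratio in dB (assumed definition)."""
    return 20.0 * np.log10(np.std(x) / np.std(x - x_hat))

# Hypothetical test frame: 20 ms at 16 kHz, centered on t = 0.
fs = 16000.0
t = np.arange(-160, 161) / fs
f_true, f_guess = 200.0, 190.0              # true vs. initially assumed frequency
x = (1.0 + 0.3 * t / t[-1]) * np.cos(2 * np.pi * f_true * t + 0.4)

# A real signal needs both the positive and the negative frequency.
freqs = np.array([f_guess, -f_guess])

# Baseline: stationary sinusoidal model at the assumed frequency.
_, x_stat = fit_stationary(x, t, freqs)
serr_stat = serr_db(x, x_stat)

# Step 1: time-varying model at the same assumed frequency.
a, b, x_tv = fit_time_varying(x, t, freqs)
serr_tv = serr_db(x, x_tv)

# Step 2 (assumed update rule): correct the frequency from b/a, then refit.
eta = np.imag(b[0] / a[0]) / (2 * np.pi)    # estimated frequency mismatch in Hz
f_new = f_guess + eta
_, _, x_refit = fit_time_varying(x, t, np.array([f_new, -f_new]))
serr_refit = serr_db(x, x_refit)

print(f"SERR stationary:   {serr_stat:.1f} dB")
print(f"SERR time-varying: {serr_tv:.1f} dB")
print(f"SERR after refit:  {serr_refit:.1f} dB (f = {f_new:.2f} Hz)")
```

In this toy setting the time-varying fit cannot do worse than the stationary one, because the stationary model is the special case b_k = 0. The refit illustrates why iterating on the frequency estimate helps when the initial frequencies are slightly off.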
