Adaptive AM–FM Signal Decomposition With Application to Speech Analysis

In this paper, we present an iterative method for accurately estimating the amplitude and frequency modulations (AM-FM) of time-varying, multi-component, quasi-periodic signals such as voiced speech. We start from the deterministic plus noise representation of speech initially suggested by Laroche et al. (“HNM: A simple, efficient harmonic plus noise model for speech,” Proc. WASPAA, Oct. 1993, pp. 169-172) and focus on its deterministic part. We analyze the properties of this model and show that it is equivalent to a time-varying quasi-harmonic representation of voiced speech. Next, we show how this representation can be used to estimate amplitude and frequency modulations, and we give the conditions under which the estimation is valid. Finally, we suggest an adaptive algorithm for nonparametric estimation of the AM-FM components of voiced speech. The estimated amplitude and frequency components yield a high-resolution time-frequency representation. The approach was evaluated on synthetic AM-FM signals. In addition, speech signals were reconstructed from the estimated AM-FM information, giving a high signal-to-reconstruction error ratio (around 30 dB).
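
To make the evaluation quantities concrete, the following minimal sketch synthesizes a single-component AM-FM test signal and computes a signal-to-reconstruction error ratio. It is an illustration under stated assumptions, not the method of the paper: the function names (`synth_am_fm`, `srer_db`), the modulation parameters, and the "reconstruction" (the input scaled by 0.97) are all hypothetical, and the ratio is assumed to be defined as 20·log10 of the standard deviation of the signal over the standard deviation of the reconstruction error.

```python
import math


def synth_am_fm(n_samples, fs, f0=200.0, am_depth=0.3, am_rate=5.0,
                fm_dev=20.0, fm_rate=3.0):
    """Return a single-component AM-FM signal a(t) * cos(phi(t)).

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    signal = []
    phase = 0.0
    for n in range(n_samples):
        t = n / fs
        # Slowly varying amplitude (AM) and instantaneous frequency (FM).
        amp = 1.0 + am_depth * math.cos(2.0 * math.pi * am_rate * t)
        freq = f0 + fm_dev * math.sin(2.0 * math.pi * fm_rate * t)
        signal.append(amp * math.cos(phase))
        # Integrate the instantaneous frequency to obtain the phase.
        phase += 2.0 * math.pi * freq / fs
    return signal


def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def srer_db(x, x_hat):
    """Signal-to-reconstruction error ratio in dB (assumed definition)."""
    residual = [a - b for a, b in zip(x, x_hat)]
    return 20.0 * math.log10(std(x) / std(residual))


# Hypothetical stand-in for a reconstruction: the input with a 3% gain error.
x = synth_am_fm(n_samples=8000, fs=8000.0)
x_hat = [0.97 * v for v in x]
print(f"SRER: {srer_db(x, x_hat):.2f} dB")
```

With a 3% gain error the residual is 0.03 times the signal, so the ratio equals 20·log10(1/0.03), roughly 30.5 dB. This gives a sense of scale for the figure quoted in the abstract: on this assumed definition, about 30 dB corresponds to a residual near 3% of the signal's standard deviation.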
