Cross‐spectral methods with applications to speech processing

Cross‐spectral methods were first presented in the mid‐1980s as a means of accurately estimating stable signal parameters such as modem baud rate and carrier frequency. Because these parameters are stable, large amounts of data can be integrated to estimate them accurately even under degraded conditions. Biological signals such as speech are not stationary, so classical analysis methods, including conventional cross‐spectral methods, are poorly suited to them. Presented here are methods that take advantage of the structure of speech and the phase properties of the Fourier transform. They are based on the cross‐spectral methods of the 1980s, but provide good accuracy and resolution for nonstationary signals such as speech. In addition, they provide a simple way to exploit signal structure, such as the harmonic properties of speech, which result from the quasi‐periodic pitch excitation. Specific problems addressed are ac...
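To illustrate how the phase of the Fourier transform can carry a frequency estimate, the following is a minimal sketch of a one‐sample‐lag cross‐spectral estimator for a stationary tone. It is not taken from the paper: the function name `cross_spectral_frequency`, the Hann window, the frame length `n_fft`, and the test tone are all assumptions chosen for illustration, and the paper's methods for nonstationary speech may differ.

```python
import numpy as np

def cross_spectral_frequency(x, fs, n_fft=256):
    """Estimate the dominant frequency (Hz) from the phase of a
    cross-spectrum between two frames offset by one sample.

    Hypothetical helper for illustration; assumes len(x) > n_fft and a
    single dominant, roughly stationary sinusoid in the frame.
    """
    window = np.hanning(n_fft)
    # Two windowed frames, the second delayed by exactly one sample.
    frame0 = x[:n_fft] * window
    frame1 = x[1:n_fft + 1] * window
    X0 = np.fft.rfft(frame0)
    X1 = np.fft.rfft(frame1)
    # For a component exp(j*w*n), X1 = exp(j*w) * X0, so the phase of
    # the cross-spectrum is w (radians per sample) at that component.
    cross = X1 * np.conj(X0)
    k = np.argmax(np.abs(cross))  # bin holding the strongest component
    return fs * np.angle(cross[k]) / (2 * np.pi)

# Assumed test signal: a tone that falls between DFT bins.
fs = 8000.0
t = np.arange(512) / fs
x = np.cos(2 * np.pi * 1234.5 * t)
f_est = cross_spectral_frequency(x, fs)
```

In this sketch the estimate comes from the phase of the cross‐spectrum rather than from the location of the magnitude peak, so it is not limited to the bin spacing `fs / n_fft`. The one‐sample lag keeps the phase unambiguous for any frequency below `fs / 2`.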
