Applications of positive time-frequency distributions to speech processing

Much of our current knowledge and intuition about speech derives from analyses that assume short-time stationarity (e.g., the speech spectrogram). Such methods are, by their very nature, incapable of revealing the true nonstationary nature of speech. A careful consideration of the theory of time-frequency distributions (TFDs), however, allows the construction of methods that reveal far more of the nonstationarities of speech, thereby highlighting what conventional approaches miss. We apply two iterative methods for generating positive TFDs to speech analysis. Both methods make use of multiple sources of information (e.g., multiple spectrograms) to yield a high-resolution estimate of the joint time-frequency energy density of speech. These TFDs preserve plosive events and formant harmonic structure simultaneously. They also resolve rapidly time-varying formants and reveal harmonic structure independent of sweep rate, a result quite different from that seen with conventional speech spectrograms. The speech features observed in these distributions demonstrate that conventional sliding-window techniques lose or distort much of the rich nonstationary structure of speech. Examples for synthetic formants and real speech are provided. The differences between joint distributions and conditional distributions are also illustrated.
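As a rough illustration of why several spectrograms carry complementary information, the sketch below computes a short-window (wideband) and a long-window (narrowband) spectrogram of a synthetic chirp on a shared time-frequency grid and merges them with a pointwise geometric mean. This is not either of the two iterative methods applied in the paper; the geometric mean is only a simple stand-in for combining sources, and the sampling rate, chirp parameters, window lengths, and the helper name `spectrogram` are all assumptions chosen for the example.

```python
import numpy as np

def spectrogram(x, win_len, hop, n_fft):
    """Squared-magnitude STFT with a Hann window.

    Frames are centred on multiples of `hop`, so spectrograms computed with
    different `win_len` but equal `hop` and `n_fft` share one grid.
    """
    window = np.hanning(win_len)
    pad = win_len // 2
    xp = np.pad(x, (pad, pad))                     # zero-pad both ends
    n_frames = 1 + (len(x) - 1) // hop
    frames = np.stack([xp[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

# Assumed test signal: a linear chirp from 500 Hz to 3000 Hz over 0.5 s.
fs = 8000
t = np.arange(int(0.5 * fs)) / fs
f0, f1 = 500.0, 3000.0
sweep = (f1 - f0) / t[-1]
x = np.cos(2 * np.pi * (f0 * t + 0.5 * sweep * t ** 2))

hop, n_fft = 16, 1024
wide = spectrogram(x, win_len=64, hop=hop, n_fft=n_fft)     # good time resolution
narrow = spectrogram(x, win_len=512, hop=hop, n_fft=n_fft)  # good frequency resolution

# Pointwise geometric mean (illustrative stand-in, not the paper's method).
eps = 1e-12                                        # avoids log(0)
combined = np.exp(0.5 * (np.log(wide + eps) + np.log(narrow + eps)))
combined /= combined.sum()                         # normalise to unit total energy
```

The combined array is nonnegative by construction and is large only where both spectrograms agree that energy is present, which is the intuition behind drawing on more than one window length. Unlike a proper positive TFD, it is not constrained to match the time and frequency marginals of the signal.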
