Speech analysis using instantaneous frequency deviation

In this paper, we aim to derive a phase spectrum representation computed via the short-time Fourier transform. Specifically, we seek a narrow-band speech representation that employs 20-40 ms analysis windows and is as physically meaningful as the magnitude spectrum. To this end, we concentrate on the instantaneous frequency (IF) derived from the phase spectrum. We introduce the IF deviation spectrum and show that it exhibits pitch and formant structure similar to that of the magnitude spectrum. Lastly, we demonstrate the advantages of the proposed IF deviation spectrum over the IF distribution spectrum proposed earlier in the literature.
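The abstract does not define the IF deviation, so the following is only a minimal sketch under stated assumptions: the deviation at each frequency bin is taken to be the IF minus the bin's centre frequency, and the IF is estimated from the phase difference between two short-time Fourier transform frames, as in a phase vocoder. This estimator is swapped in for illustration and may differ from the authors' formulation. The function names, the Hann window, and the one-sample hop are all hypothetical choices.

```python
import cmath
import math


def hann(n_len):
    """Periodic Hann window of length n_len (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / n_len) for m in range(n_len)]


def dft_frame(x, start, window):
    """DFT (non-negative frequency bins) of one windowed frame of x."""
    n_len = len(window)
    frame = [x[start + m] * window[m] for m in range(n_len)]
    return [
        sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n_len) for m in range(n_len))
        for k in range(n_len // 2 + 1)
    ]


def if_deviation_hz(x, start, fs, n_len, hop=1):
    """Per-bin IF minus bin centre frequency, in Hz (assumed definition)."""
    window = hann(n_len)
    prev = dft_frame(x, start, window)
    curr = dft_frame(x, start + hop, window)
    deviations = []
    for k in range(n_len // 2 + 1):
        # Phase advance actually observed between the two frames at bin k.
        observed = cmath.phase(curr[k]) - cmath.phase(prev[k])
        # Phase advance expected if the component sat exactly at the bin centre.
        expected = 2 * math.pi * k * hop / n_len
        # Wrap the difference to the principal interval [-pi, pi).
        wrapped = (observed - expected + math.pi) % (2 * math.pi) - math.pi
        # Convert from radians per hop to Hz.
        deviations.append(wrapped * fs / (2 * math.pi * hop))
    return deviations


# Illustrative input: a 1010 Hz tone sampled at 8 kHz, 32 ms (256-sample) window.
fs = 8000
n_len = 256
tone = [math.cos(2 * math.pi * 1010.0 * n / fs) for n in range(n_len + 1)]
dev = if_deviation_hz(tone, 0, fs, n_len)
print(round(dev[32], 1))  # bin 32 is centred on 1000 Hz
```

For this synthetic tone, the value at bin 32 should be close to +10 Hz, since the tone lies 10 Hz above that bin's centre. Bins far from any signal component carry little energy, so their phase, and hence their deviation, is not meaningful without further processing.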
