Instantaneous frequency and bandwidth estimation using filterbank arrays

Accurate estimation of the instantaneous frequency of speech resonances is a hard problem, mainly due to phase discontinuities in the speech signal associated with excitation instants. We review a variety of approaches for enhanced frequency and bandwidth estimation in the time domain and propose a new, cognitively motivated approach using filterbank arrays. We show that by filtering speech resonances with filters of different center frequencies, bandwidths, and shapes, the ambiguity in instantaneous frequency estimation associated with amplitude envelope minima and phase discontinuities can be significantly reduced. The novel estimators are shown to perform well on synthetic speech signals with frequency and bandwidth micro-modulations (i.e., modulations within a pitch period), as well as on real speech signals. When applied to frequency and bandwidth modulation index estimation, filterbank arrays are shown to reduce the estimation error variance by 85% and 70%, respectively.
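The abstract does not spell out the estimator, so the following is only a minimal sketch of the general filterbank-array idea under stated assumptions, not the authors' method. It assumes Gaussian bandpass filters applied in the frequency domain (the paper also varies filter shape, which this sketch does not), takes each filter's instantaneous frequency from the phase increment of its analytic output, and combines the array with an envelope-squared weighted average. That combination rule is an assumption, chosen so that a filter sitting near an amplitude envelope minimum, where its phase is least reliable, contributes little at that instant. The function names `analytic_filterbank`, `instantaneous_frequency`, and `filterbank_array_if`, and all filter parameters, are hypothetical.

```python
import numpy as np


def analytic_filterbank(x, fs, centers, bandwidths):
    """Complex (analytic) output of each Gaussian bandpass filter in the array.

    Assumption: every filter has a Gaussian magnitude response centred at fc
    with standard deviation bw (both in Hz). Filtering is done by FFT, so it is
    circular; x should be (close to) periodic over its length.
    """
    x = np.asarray(x, dtype=float)
    X = np.fft.fft(x)
    f = np.fft.fftfreq(len(x), d=1.0 / fs)  # frequency of each FFT bin, Hz
    bank = []
    for fc, bw in zip(centers, bandwidths):
        H = np.exp(-0.5 * ((f - fc) / bw) ** 2)
        H[f <= 0] = 0.0  # keep positive frequencies only -> analytic output
        bank.append(np.fft.ifft(2.0 * X * H))
    return np.array(bank)  # shape: (number of filters, len(x))


def instantaneous_frequency(z, fs):
    """IF in Hz between consecutive samples, plus a reliability weight.

    p[n] = z[n+1] * conj(z[n]) has angle equal to the wrapped phase increment
    and magnitude |z[n+1]| * |z[n]|, roughly the squared amplitude envelope.
    """
    p = z[..., 1:] * np.conj(z[..., :-1])
    return np.angle(p) * fs / (2.0 * np.pi), np.abs(p)


def filterbank_array_if(x, fs, centers, bandwidths):
    """Envelope-weighted average of the per-filter IF tracks.

    A filter whose envelope is near a minimum at some instant (where its phase,
    and hence its IF, is least reliable) gets little weight at that instant.
    """
    z = analytic_filterbank(x, fs, centers, bandwidths)
    f_inst, w = instantaneous_frequency(z, fs)
    total = np.maximum(w.sum(axis=0), np.finfo(float).tiny)  # avoid 0/0
    return (w * f_inst).sum(axis=0) / total


if __name__ == "__main__":
    fs, n = 16000, 1600  # 0.1 s at 16 kHz
    t = np.arange(n) / fs
    # Synthetic resonance: 1000 Hz carrier, 50 Hz deviation, 100 Hz FM rate
    x = np.cos(2 * np.pi * 1000 * t + 0.5 * np.sin(2 * np.pi * 100 * t))
    # Array: three centre frequencies x two bandwidths (hypothetical values)
    pairs = [(fc, bw) for fc in (900, 1000, 1100) for bw in (200, 400)]
    centers, bandwidths = zip(*pairs)
    est = filterbank_array_if(x, fs, centers, bandwidths)
    # Each estimate sits between two samples, so compare at the midpoints
    t_mid = t[:-1] + 0.5 / fs
    true_if = 1000 + 50 * np.cos(2 * np.pi * 100 * t_mid)
    print("max |IF error| (Hz):", np.max(np.abs(est - true_if)))
```

The demo signal is a single synthetic resonance whose frequency swings 50 Hz around 1000 Hz once every 10 ms, a crude stand-in for a frequency micro-modulation. Because FFT filtering is circular, the 0.1 s signal is chosen to hold an integer number of carrier and modulation periods. A real speech resonance would first need to be isolated, and the filter centers and bandwidths would have to track it; neither step is shown here.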
