Synchrony capture filterbank (SCFB): Auditory-inspired signal processing for frequency tracking

The mammalian auditory system is a more robust and versatile sound analyzer than any artificial system developed to date. Nature found a simple yet elegant solution for the hearing mechanism. Incorporating key aspects of the functional organization of the mammalian auditory system into artificial signal-processing systems may drastically simplify problems of auditory representation and scene analysis, so that capabilities for acoustic signal separation, detection, classification, recognition and identification can be greatly improved. The objective of this thesis is to mimic the functionality of the mammalian peripheral auditory system in a digital computer by developing a synchrony capture filterbank (SCFB) algorithm. The thesis is primarily inspired by two aspects of the peripheral auditory system: (1) synchrony capture, a phenomenon observed in the auditory nerve in which the discharges in a given frequency region of the cochlea synchronize preferentially to the single dominant frequency component in that region, so that a strong dominant component suppresses any weaker interfering tones; and (2) the spatial arrangement of the mammalian cochlea. The SCFB algorithm is used to track the frequency components of a speech signal and to extract the pitch, or fundamental frequency, of quasi-periodic sounds. To emulate synchrony capture, the proposed filterbank is designed as a two-step process comprising a coarse and a fine analysis. The first stage is a broad filter, which is followed by a bank of three adaptively tunable, narrower bandpass filters; this arrangement resembles the basilar membrane and the three rows of outer hair cells in the inner ear.
This filterbank attempts to emulate synchrony-capture-like behavior using these adaptive filters by creating a competition among frequency components for the different channels, one that not only accurately reflects their relative magnitudes but is also invariant with respect to absolute signal amplitude. The bandpass filters are tuned by a voltage-controlled oscillator (VCO) whose frequency is steered by a frequency discriminator loop (FDL). The resulting filterbank is used to process synthetic signals and speech, and it is shown that the VCOs can track the individual low-frequency harmonics and the strongest harmonic present in each formant region. Finally, the SCFB outputs are used to compute the fundamental frequency, or pitch, f0 of quasi-periodic sounds present in the signal. Currently, autocorrelation-based models are widely used for pitch extraction. Although there is overwhelming neurophysiological evidence for autocorrelation-like representations of sounds in the temporal firing patterns of neurons in the auditory nerve and brainstem, how the central auditory system makes use of these representations is still not well understood. Although neuronal populations that carry out a binaural cross-correlation operation have long been identified in the auditory brainstem, no obvious analogous neural time-delay architectures for monaural autocorrelation have yet been found. This motivates the search for an alternative signal-processing strategy. An approach based on the SCFB is proposed as a possible alternative to autocorrelation computation. The outputs of the SCFB are adaptively phase-aligned with respect to a common time reference and added to compute a summary phase-aligned function (SPAF), from which f0 can then be extracted. Results show that the component frequencies and f0 are faithfully tracked.
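
The idea of steering a tunable filter with a frequency discriminator loop can be illustrated with a small sketch. The code below is not the thesis implementation. It assumes a complex (analytic) input, two one-pole complex resonators placed symmetrically below and above the current center frequency, a normalized difference of their output magnitudes as the discriminator, and a simple accumulator standing in for the VCO. The function name `track_tone` and the parameters `offset_hz`, `pole` and `mu` are hypothetical and chosen only for illustration.

```python
import numpy as np

def track_tone(x, fs, fc0, offset_hz=100.0, pole=0.98, mu=0.5):
    """Steer a center frequency toward the dominant tone in complex signal x.

    Illustrative assumption, not the thesis design: two one-pole complex
    resonators sit offset_hz below and above the current center frequency,
    and the normalized difference of their output magnitudes serves as the
    frequency discriminator.
    """
    fc = float(fc0)              # current center frequency in Hz (VCO stand-in)
    y_lo = 0j                    # state of the lower lateral resonator
    y_hi = 0j                    # state of the upper lateral resonator
    track = np.empty(len(x))
    for n, sample in enumerate(x):
        # Retune both lateral resonators around the current center frequency.
        w_lo = 2.0 * np.pi * (fc - offset_hz) / fs
        w_hi = 2.0 * np.pi * (fc + offset_hz) / fs
        y_lo = pole * np.exp(1j * w_lo) * y_lo + (1.0 - pole) * sample
        y_hi = pole * np.exp(1j * w_hi) * y_hi + (1.0 - pole) * sample
        e_lo, e_hi = abs(y_lo), abs(y_hi)
        # Dividing by the sum makes the error independent of input amplitude.
        err = (e_hi - e_lo) / (e_hi + e_lo + 1e-12)
        # Move the center frequency toward the side with the larger response.
        fc += mu * err
        track[n] = fc
    return track

# Usage: a 1000 Hz complex tone, with the loop started 100 Hz too low.
fs = 16000.0
n = np.arange(8000)
tone = np.exp(1j * 2.0 * np.pi * 1000.0 * n / fs)
estimate = track_tone(tone, fs, fc0=900.0)
```

Because the error is a ratio of magnitudes, scaling the input leaves the frequency track essentially unchanged, which loosely mirrors the amplitude invariance described above. For a real-valued signal, the analytic signal (for example from `scipy.signal.hilbert`) could be passed in instead.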
