Synchrony capture filterbank: auditory-inspired signal processing for tracking individual frequency components in speech.

A processing scheme for speech signals is proposed that emulates synchrony capture in the auditory nerve. Stimulus-locked spike timing is important for representing stimulus periodicity, low-frequency spectrum, and spatial location. In synchrony capture, the dominant single-frequency component in each frequency region impresses its time structure on the temporal firing patterns of auditory nerve fibers with nearby characteristic frequencies (CFs). At low frequencies, for voiced sounds, synchrony capture divides the nerve into discrete CF territories, each associated with an individual harmonic. The proposed adaptive synchrony capture filterbank (SCFB) consists of a fixed array of traditional, passive linear (gammatone) filters cascaded with a bank of adaptively tunable bandpass filter triplets. Differences between the output envelopes of each triplet steer the triplet's center frequencies via voltage-controlled oscillators (VCOs). The SCFB exhibits some cochlea-like responses, such as two-tone suppression and distortion products, and possesses many desirable properties for processing speech, music, and natural sounds. Strong signal components dominate relatively larger numbers of filter channels, thereby yielding robust encodings of relative component intensities. The VCOs precisely lock onto the harmonics most important for formant tracking, pitch perception, and sound separation.
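
The envelope-difference steering described above can be illustrated with a minimal sketch. This is not the authors' implementation: the gammatone front end is omitted, one-pole complex resonators stand in for the bandpass triplet, a direct update of the center frequency stands in for the VCO, and the function name `track_tone` and the parameters `offset_hz`, `bw_hz`, and `mu` are hypothetical choices made for illustration.

```python
import numpy as np

def track_tone(x, fs, fc0, offset_hz=50.0, bw_hz=50.0, mu=0.2):
    """Steer a triplet of one-pole complex resonators toward a dominant tone.

    Illustrative assumptions (not from the source): resonators replace the
    bandpass triplet, and the center frequency is updated directly instead
    of through a VCO. Returns the frequency track (Hz) and the output of
    the center filter.
    """
    a = np.exp(-2.0 * np.pi * bw_hz / fs)        # pole radius; sets filter bandwidth
    offsets = np.array([-offset_hz, 0.0, offset_hz])
    y = np.zeros(3, dtype=complex)               # states: lower, center, upper filter
    fc = float(fc0)
    track = np.empty(len(x))
    out = np.empty(len(x), dtype=complex)
    for n, xn in enumerate(x):
        w = 2.0 * np.pi * (fc + offsets) / fs    # current center frequencies (rad/sample)
        y = a * np.exp(1j * w) * y + (1.0 - a) * xn
        e_lo, e_hi = abs(y[0]), abs(y[2])        # envelopes of the two outer filters
        # Normalized envelope difference: positive when the tone lies above fc.
        err = (e_hi - e_lo) / (e_hi + e_lo + 1e-12)
        fc += mu * err                           # steer the whole triplet
        track[n] = fc
        out[n] = y[1]
    return track, out

# Usage: a 1000 Hz tone, with the triplet initially centered at 900 Hz.
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
x = np.cos(2.0 * np.pi * 1000.0 * t)
track, out = track_tone(x, fs, fc0=900.0)
print(track[-1000:].mean())
```

With these settings the track is expected to settle close to the tone frequency. Normalizing the envelope difference by the envelope sum makes the steering signal largely independent of input level, which is one simple way to keep the loop's behavior similar for weak and strong components.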
