Novel Spectro-Temporal Codes and Computations for Auditory Signal Representation and Separation

Abstract: Over the past three years, we have developed algorithms that emulate the phenomenon of synchrony capture in the auditory nerve. Synchrony capture means that the dominant component in a given frequency band preferentially drives the auditory nerve fibers innervating the entire corresponding frequency region of the cochlea. Our algorithm, called the synchrony capture filterbank (SCFB), consists of a bank of broadly tuned filters (not unlike the basilar membrane) in cascade with narrower filters (not unlike outer hair cells) that adaptively lock onto locally dominant frequency components to produce synchrony capture. This local behavior enables a robust encoding of the running power spectrum based on the relative numbers of channels recruited by different frequency components. The filterbank precisely tracks individual time-varying frequency components, such as low harmonics and formant frequencies in speech, in the midst of noise and auditory clutter. This precise tracking can in turn be used to enhance the separation of concurrent periodic sounds. We envision that the project will result in improved front ends that can enhance voices in noise and better separate sounds.
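
The capture behavior described in the abstract can be illustrated with a toy example. The sketch below is not the SCFB itself: it assumes a single narrow complex one-pole resonator whose center frequency is nudged toward the instantaneous frequency of its own output (a simple frequency-locked loop). The function names `track_dominant` and `two_tones`, the pole radius `r`, the step size `mu`, and the test frequencies are all hypothetical choices made for illustration.

```python
import cmath
import math


def track_dominant(x, fs, f_init, r=0.98, mu=0.005):
    """Return the per-sample center frequency (Hz) of one adaptive channel.

    Illustrative assumption: a complex one-pole resonator plus a
    first-order frequency-locked loop, not the authors' filter design.
    """
    w_c = 2.0 * math.pi * f_init / fs  # center frequency in rad/sample
    y_prev = 0j
    track = []
    for sample in x:
        # Narrow resonator centered at the current estimate w_c.
        y = (1.0 - r) * sample + r * cmath.exp(1j * w_c) * y_prev
        # Phase advance between consecutive outputs approximates the
        # instantaneous frequency of whatever dominates the passband.
        w_inst = cmath.phase(y * y_prev.conjugate())
        # Move the center frequency a small step toward that measurement.
        w_c += mu * (w_inst - w_c)
        y_prev = y
        track.append(w_c * fs / (2.0 * math.pi))
    return track


def two_tones(fs, n, f1, a1, f2, a2):
    """Sum of two cosines with frequencies f1, f2 (Hz) and amplitudes a1, a2."""
    return [
        a1 * math.cos(2.0 * math.pi * f1 * k / fs)
        + a2 * math.cos(2.0 * math.pi * f2 * k / fs)
        for k in range(n)
    ]


fs = 16000
# Strong tone at 1000 Hz, weak tone at 1300 Hz; the channel starts at
# 1200 Hz, i.e. closer to the weak tone.
x = two_tones(fs, 8000, 1000.0, 1.0, 1300.0, 0.3)
track = track_dominant(x, fs, f_init=1200.0)
f_final = sum(track[-1600:]) / 1600  # mean over the last 0.1 s
print(f"channel settled near {f_final:.1f} Hz")
```

In this toy setting the channel drifts away from the nearer weak tone and settles close to the stronger one, which is the qualitative behavior the abstract calls synchrony capture. A filterbank in the spirit of the abstract would run many such channels with different starting frequencies and count how many settle on each component.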
