Auditory-inspired pitch extraction using a Synchrony Capture Filterbank and phase alignment

How harmonic sounds produce strong, low pitches at their fundamental frequencies (f0s) has been of theoretical and practical interest to scientists and engineers for many decades. Currently, the best auditory models for f0 pitch, e.g. [1], are based on bandpass filtering (cochlear mechanics), half-wave rectification and low-pass filtering (hair cell transduction and synaptic transmission), channel autocorrelations (all-order interspike interval statistics) aggregated into a summary autocorrelation, and an analysis that determines the most prevalent interspike intervals. As a possible alternative to autocorrelation computations, we propose a model that uses an adaptive Synchrony Capture Filterbank (SCFB), in which groups of filters (channels) in a filterbank neighborhood are driven exclusively (captured) by the dominant frequency components closest to them. The channel outputs are then adaptively phase aligned with respect to a common time reference to compute a Summary Phase Aligned Function (SPAF), aggregated across all channels, from which f0 can be easily extracted.
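For orientation, the autocorrelation-based baseline described above (bandpass filtering, half-wave rectification, low-pass filtering, per-channel autocorrelation, summary autocorrelation, peak picking) can be sketched roughly as follows. This is a minimal illustration, not the model of [1] and not the proposed SCFB/SPAF method: the function name `summary_autocorrelation_f0`, the Gaussian spectral weighting standing in for cochlear filters, the 1 ms moving-average low-pass, the channel center frequencies, and the 80 to 400 Hz search range are all assumptions chosen for brevity.

```python
import numpy as np

def summary_autocorrelation_f0(x, fs, centers=None, fmin=80.0, fmax=400.0):
    """Estimate f0 (Hz) from a summary autocorrelation. Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if centers is None:
        # Hypothetical channel center frequencies (Hz), log-spaced.
        centers = np.geomspace(200.0, 3000.0, 16)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spectrum = np.fft.rfft(x)
    summary = np.zeros(n)
    for fc in centers:
        # Bandpass filtering: a Gaussian spectral weight is an assumed
        # stand-in for a cochlear filter (bandwidth proportional to fc).
        weight = np.exp(-0.5 * ((freqs - fc) / (0.25 * fc)) ** 2)
        band = np.fft.irfft(spectrum * weight, n)
        # Half-wave rectification.
        rect = np.maximum(band, 0.0)
        # Low-pass filtering: assumed moving average of about 1 ms.
        k = max(1, int(fs * 0.001))
        smooth = np.convolve(rect, np.ones(k) / k, mode="same")
        smooth -= smooth.mean()
        # Channel autocorrelation via FFT, zero-padded to avoid wrap-around.
        power = np.abs(np.fft.rfft(smooth, 2 * n)) ** 2
        summary += np.fft.irfft(power)[:n]
    # Pick the lag with the largest summary value in the allowed pitch range.
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(summary[lo:hi + 1]))
    return fs / lag
```

A call such as `summary_autocorrelation_f0(x, 16000)` on a short harmonic complex should return an estimate near its fundamental, provided the signal is longer than the longest lag searched (`fs / fmin` samples).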

[1] J. L. Goldstein, et al. A central spectrum model: a synthesis of auditory-nerve timing and place cues in monaural communication of frequency spectrum, 1983, The Journal of the Acoustical Society of America.

[2] The Oxford handbook of auditory science, 2015.

[3] B. Delgutte, et al. Speech coding in the auditory nerve: I. Vowel-like sounds, 1984, The Journal of the Acoustical Society of America.

[4] Vijay Kumar Peddinti, et al. Synchrony capture filterbank: auditory-inspired signal processing for tracking individual frequency components in speech, 2013, The Journal of the Acoustical Society of America.

[5] B. Delgutte, et al. Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch, 1996, Journal of Neurophysiology.

[6] A. Oxenham, et al. The psychophysics of pitch, 2005.

[7] R. Meddis, et al. Modeling the identification of concurrent vowels with different fundamental frequencies, 1992, The Journal of the Acoustical Society of America.

[8] Guy J. Brown, et al. Separation of speech from interfering sounds based on oscillatory correlation, 1999, IEEE Trans. Neural Networks.

[9] Ray Meddis, et al. Virtual pitch and phase sensitivity of a computer model of the auditory periphery, 1991.

[10] J. L. Goldstein. An optimum processor theory for the central formation of the pitch of complex tones, 1973, The Journal of the Acoustical Society of America.

[11] A. Oxenham. Pitch Perception, 2012, The Journal of Neuroscience.

[12] Peter A. Cariani, et al. Neural timing nets, 2001, Neural Networks.

[13] Hideki Kawahara, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.

[14] J. Licklider, et al. A duplex theory of pitch perception, 1951, Experientia.

[15] R. Meddis, et al. A unitary model of pitch perception, 1997, The Journal of the Acoustical Society of America.

[16] P. Cariani, et al. A temporal model for pitch multiplicity and tonal consonance, 2004.

[17] B. Delgutte, et al. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience, 1996, Journal of Neurophysiology.