Using a Cascade of Asymmetric Resonators with Fast-Acting Compression as a Cochlear Model for Machine-Hearing Applications

Every day, machines process many thousands of hours of audio signals through a realistic cochlear model. They extract features, inform classifiers and recommenders, and identify copyrighted material. The machine-hearing approach to such tasks has taken root in recent years because hearing-based approaches perform better than more conventional sound-analysis approaches. We use a bio-mimetic “cascade of asymmetric resonators with fast-acting compression” (CARFAC), an efficient sound analyzer that incorporates the hearing research community’s findings on nonlinear auditory filter models and cochlear wave mechanics. The CARFAC is based on a pole–zero filter cascade (PZFC) model of auditory filtering, combined with a multi-time-scale coupled automatic-gain-control (AGC) network. It uses simple nonlinear extensions of conventional digital filter stages and runs fast due to its low complexity. The PZFC plus AGC network, the CARFAC, mimics features of auditory physiology such as masking, the compressive traveling-wave response, and the stability of zero-crossing times with signal level. Its output “neural activity pattern” is converted to a “stabilized auditory image” to capture pitch, melody, and other temporal and spectral features of the sound.
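As a rough illustration of the architecture described above, the Python sketch below cascades two-pole, two-zero (biquad) resonator stages, with a simple per-channel level smoother standing in for the coupled multi-time-scale AGC network. It is a minimal toy under stated assumptions, not the published CARFAC implementation; the function names, channel frequencies, damping values, and time constants are illustrative placeholders.

```python
import numpy as np

def make_stage_coeffs(f_pole, f_zero, damping, fs):
    """Biquad (two-pole, two-zero) coefficients for one asymmetric resonator stage."""
    theta_p = 2.0 * np.pi * f_pole / fs
    theta_z = 2.0 * np.pi * f_zero / fs
    r = 1.0 - damping * theta_p                               # pole/zero radius (small-angle approximation)
    b = np.array([1.0, -2.0 * r * np.cos(theta_z), r * r])    # numerator (zeros)
    a = np.array([1.0, -2.0 * r * np.cos(theta_p), r * r])    # denominator (poles)
    return b, a

def carfac_like_cascade(x, fs=22050, n_stages=8):
    """Pass a signal through a toy pole-zero filter cascade with per-stage compression."""
    freqs = np.geomspace(6000.0, 200.0, n_stages)   # high-to-low channel frequencies, as in a cochlear cascade
    coeffs = [make_stage_coeffs(f, 1.4 * f, 0.2, fs) for f in freqs]
    states = [np.zeros(2) for _ in range(n_stages)] # biquad delay states
    agc_state = np.zeros(n_stages)                  # smoothed level estimate per channel
    agc_coeff = 0.002                               # AGC smoothing constant (placeholder)
    outputs = np.zeros((n_stages, len(x)))
    for n in range(len(x)):
        sample = x[n]
        for k in range(n_stages):
            b, a = coeffs[k]
            w = states[k]
            # Direct-form II transposed biquad update.
            out = b[0] * sample + w[0]
            w[0] = b[1] * sample - a[1] * out + w[1]
            w[1] = b[2] * sample - a[2] * out
            # "Fast-acting compression" stand-in: divide by a smoothed level estimate.
            agc_state[k] += agc_coeff * (abs(out) - agc_state[k])
            out /= 1.0 + agc_state[k]
            outputs[k, n] = out
            sample = out                            # each stage feeds the next, like a traveling wave
    return outputs

if __name__ == "__main__":
    fs = 22050
    t = np.arange(int(0.05 * fs)) / fs
    tone = np.sin(2.0 * np.pi * 1000.0 * t)         # short 1 kHz test tone
    nap_like = carfac_like_cascade(tone, fs=fs)
    print(nap_like.shape)                           # (n_stages, number of samples)
```

The per-sample loop keeps the stage-to-stage coupling explicit, which is the point of the cascade structure; a practical implementation would vectorize across samples or channels for speed.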
