A realtime analysis/synthesis Gammatone filterbank

Gammatone filterbanks are widely used in computational auditory models to model the peripheral filtering function of the cochlea. However, their high computational complexity and processing time limit their use in portable acoustic applications. To address this issue, an efficient realtime digital implementation of the Gammatone filterbank is proposed. The decomposed signal can be resynthesized directly by summing the subband outputs. We systematically evaluate the Gammatone filterbank in terms of perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), signal-to-noise ratio (SNR), and computational complexity. Evaluations and comparisons show that the proposed method performs well while requiring lower computational complexity.
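To make the analysis/synthesis idea concrete, the following is a minimal sketch of a textbook Gammatone filterbank with resynthesis by plain summation. It is not the efficient implementation proposed in the paper: it uses direct FIR convolution with truncated impulse responses, and the filter order (4), the bandwidth factor (1.019), the ERB formula, the unit-energy normalization, the geometric channel spacing, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg-Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def gammatone_ir(fc_hz, fs_hz, duration_s=0.05, order=4, b=1.019):
    """Truncated Gammatone impulse response, normalized to unit energy.

    g(t) = t^(order-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)
    The values of order and b are common textbook defaults (assumption).
    """
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fc_hz) * t)
    ir = envelope * np.cos(2.0 * np.pi * fc_hz * t)
    return ir / np.sqrt(np.sum(ir ** 2))


def analyze(signal, center_freqs_hz, fs_hz):
    """Split a signal into subbands, one row per Gammatone channel."""
    return np.stack([
        np.convolve(signal, gammatone_ir(fc, fs_hz))[:len(signal)]
        for fc in center_freqs_hz
    ])


def synthesize(channels):
    """Resynthesize by summing the subband signals sample by sample."""
    return channels.sum(axis=0)


# Hypothetical usage: a 1 kHz tone through 16 geometrically spaced channels.
fs = 16000
time = np.arange(fs // 2) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * time)
center_freqs = np.geomspace(100.0, 4000.0, 16)  # spacing is an assumption
channels = analyze(tone, center_freqs, fs)
resynth = synthesize(channels)
```

In this naive form the summed output only approximates the input, because each channel has a different gain and delay; a practical design would typically add per-channel gain and phase compensation and replace the FIR convolution with a cheaper recursive structure.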
