The ERBlet transform: An auditory-based time-frequency representation with perfect reconstruction

This paper describes a method for obtaining a perceptually motivated and perfectly invertible time-frequency representation of a sound signal. Building on frame theory and the recently introduced non-stationary Gabor transform, a linear representation whose resolution evolves across frequency is formulated and implemented as a non-uniform filterbank. To match the time-frequency resolution of the human auditory system, the transform uses Gaussian windows equidistantly spaced on the psychoacoustic ERB (equivalent rectangular bandwidth) frequency scale. Both the resolution and the redundancy of the transform are adjustable. Simulations showed that perfect reconstruction can be achieved with fast iterative methods and preconditioning, even with only one filter per ERB and a very low redundancy of 1.08. A comparison with a linear gammatone filterbank showed that the ERBlet transform approximates the auditory time-frequency resolution well.
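The construction outlined above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The ERB-number and ERB-bandwidth formulas follow Glasberg and Moore (1990). The Gaussian width rule (width times the ERB at the centre frequency), the decimation rule (channel rate roughly twice the Gaussian width), the default parameters, and the function names `erblet_design`, `frame_diagonal` and `painless_duals` are all hypothetical. The sketch places centre frequencies equidistantly on the ERB scale, evaluates Gaussian frequency responses G_k on an L-point DFT grid, and computes the redundancy sum_k 1/a_k. It also computes the diagonal sum_k |G_k(w)|^2 / a_k of the frame operator in the DFT basis.

```python
import math

# ERB-number scale and ERB bandwidth after Glasberg & Moore (1990).
def hz_to_erb(f):
    return math.copysign(9.2645 * math.log(1.0 + 0.00437 * abs(f)), f)

def erb_to_hz(e):
    return math.copysign((math.exp(abs(e) / 9.2645) - 1.0) / 0.00437, e)

def erb_bandwidth(f):
    return 24.7 + abs(f) / 9.265

def erblet_design(fs=16000.0, L=2048, bins_per_erb=1, width=1.0):
    """Hypothetical ERB-spaced Gaussian filterbank on an L-point DFT grid.

    Returns (centers_hz, decimations, responses), one entry per channel.
    Channels are mirrored to negative frequencies so real signals are covered.
    """
    J = int(math.floor(bins_per_erb * hz_to_erb(fs / 2)))
    # Centre frequencies equidistant on the ERB scale (step 1/bins_per_erb).
    centers = [erb_to_hz(j / bins_per_erb) for j in range(-J, J + 1)]
    # Signed frequency of each DFT bin, in [-fs/2, fs/2).
    freqs = [(n if n < L // 2 else n - L) * fs / L for n in range(L)]
    decims, responses = [], []
    for fc in centers:
        gamma = width * erb_bandwidth(fc)          # assumed Gaussian width (Hz)
        a = max(1, int(fs / (2.0 * gamma)))        # assumed decimation rule
        resp = []
        for f in freqs:
            d = (f - fc + fs / 2) % fs - fs / 2    # circular frequency distance
            resp.append(math.exp(-math.pi * (d / gamma) ** 2))
        decims.append(a)
        responses.append(resp)
    return centers, decims, responses

def frame_diagonal(decims, responses):
    """Diagonal of the frame operator in the DFT basis: sum_k |G_k(w)|^2 / a_k.

    It equals the full frame operator only when no aliasing occurs (painless case).
    """
    L = len(responses[0])
    return [sum(r[n] ** 2 / a for r, a in zip(responses, decims)) for n in range(L)]

def painless_duals(decims, responses, diag):
    """Dual responses G_k / diag; exact duals only in the painless case."""
    return [[g / s for g, s in zip(r, diag)] for r in responses]

if __name__ == "__main__":
    centers, decims, responses = erblet_design()
    diag = frame_diagonal(decims, responses)
    redundancy = sum(1.0 / a for a in decims)  # coefficients per input sample
    duals = painless_duals(decims, responses, diag)
    print(len(centers), round(redundancy, 3), min(diag) > 0)
```

When each response fits inside its channel bandwidth fs/a_k, the aliasing terms vanish and this diagonal is the whole frame operator, so dividing by it yields the canonical dual filters. Gaussians have unbounded support, though, so at low redundancy the aliasing terms remain, and the dual has to be approached iteratively. In that setting the same diagonal is one plausible choice of preconditioner. With the assumed defaults the redundancy comes out a little above 2, well above the 1.08 quoted in the abstract, because the decimation rule used here is deliberately conservative.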
