Sparse gammatone signal model optimized for English speech does not match the human auditory filters

Evidence that neurosensory systems use sparse signal representations, together with the improved performance of signal processing algorithms that use sparse signal models, has raised interest in sparse signal coding in recent years. For natural audio signals such as speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models for the human auditory filters. Thus far, practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we apply an accelerated version of the matching pursuit algorithm for gammatone dictionaries, which allows real-time and large data set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently in terms of sparseness than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters. This suggests that signal processing applications should derive the parameters individually for each signal class rather than use psychometrically derived parameters. For brain research, it means that care should be taken when directly transferring findings of optimality from technical to biological systems.
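
To make the idea of a sparse gammatone signal model concrete, the following is a minimal sketch of plain (not accelerated) matching pursuit over a small dictionary of shifted gammatone atoms. It is not the algorithm used in the paper. The function names `gammatone_atom` and `matching_pursuit`, the filter order of 4, the atom duration, and the ERB-based default bandwidth are all illustrative assumptions; the ERB default in particular is the psychometric choice that the abstract argues against reusing blindly.

```python
import numpy as np

def gammatone_atom(fc, fs, duration=0.05, order=4, bandwidth=None):
    """Unit-norm gammatone atom t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(round(duration * fs))) / fs
    if bandwidth is None:
        # Assumption: bandwidth tied to the ERB scale (psychometric default).
        bandwidth = 1.019 * (24.7 + 0.108 * fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: returns [(atom index, shift, coefficient)] and the residual.

    Every atom must be unit norm and shorter than the signal.
    """
    residual = np.asarray(signal, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        best = None
        for k, atom in enumerate(atoms):
            # Inner product of the residual with the atom at every time shift.
            corr = np.correlate(residual, atom, mode="valid")
            shift = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[shift]) > abs(best[2]):
                best = (k, shift, float(corr[shift]))
        k, shift, coef = best
        # Remove the selected atom's contribution from the residual.
        residual[shift:shift + len(atoms[k])] -= coef * atoms[k]
        picks.append((k, shift, coef))
    return picks, residual

if __name__ == "__main__":
    fs = 16000
    atoms = [gammatone_atom(fc, fs) for fc in (500.0, 1000.0, 2000.0)]
    rng = np.random.default_rng(0)
    x = rng.standard_normal(4000)
    picks, residual = matching_pursuit(x, atoms, n_iter=20)
    print(len(picks), np.linalg.norm(residual) <= np.linalg.norm(x))
```

Each iteration here searches every atom at every shift, which is what makes the naive version costly for long signals and large dictionaries. In a setup like this, the bandwidth and order would be the parameters to fit per signal class rather than fix from auditory data.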

[1]  Roy D. Patterson,et al.  Auditory images: How complex sounds are represented in the auditory system , 2000 .

[2]  Jonathan G. Fiscus,et al.  DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (TIMIT), NIST , 1993 .

[3]  Guy J. Brown,et al.  Computational auditory scene analysis , 1994, Comput. Speech Lang..

[4]  A. Bruckstein,et al.  K-SVD: An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[5]  Eliathamby Ambikairajah,et al.  Wideband speech and audio coding using gammatone filter banks , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[6]  Terrence J Sejnowski,et al.  Communication in Neuronal Networks , 2003, Science.

[7]  T Dau,et al.  A quantitative model of the "effective" signal processing in the auditory system. I. Model structure. , 1996, The Journal of the Acoustical Society of America.

[8]  Edward A. Lee,et al.  Adaptive Signal Models: Theory, Algorithms, and Audio Applications , 1998 .

[9]  M. Elad,et al.  K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[10]  Avideh Zakhor,et al.  Very low bit-rate video coding based on matching pursuits , 1997, IEEE Trans. Circuits Syst. Video Technol..

[11]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[12]  Henrique S. Malvar A modulated complex lapped transform and its applications to audio processing , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[13]  D. Donoho,et al.  Basis pursuit , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.

[14]  Gernot Kubin,et al.  Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach , 2005, EURASIP J. Adv. Signal Process..

[15]  Michael Elad,et al.  Stable recovery of sparse overcomplete representations in the presence of noise , 2006, IEEE Transactions on Information Theory.

[16]  Patrik O. Hoyer,et al.  Non-negative sparse coding , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[17]  Birger Kollmeier,et al.  PEMO-Q—A New Method for Objective Audio Quality Assessment Using a Model of Auditory Perception , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  渡辺馨 Objective measurement method of audio quality in accordance with ITU-R Recommendation BS. 1387 , 2001 .

[19]  Mike E. Davies,et al.  Sparse audio representations using the MCLT , 2006, Signal Process..

[20]  Michael S. Lewicki,et al.  Efficient auditory coding , 2006, Nature.

[21]  Sacha Krstulovic,et al.  Mptk: Matching Pursuit Made Tractable , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[22]  Huan Zhou,et al.  An adaptive tree-based progressive audio compression scheme , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[23]  B. Moore,et al.  Auditory filter shapes at low center frequencies. , 1990, The Journal of the Acoustical Society of America.

[24]  I. Toshio An optimal auditory filter , 1995, Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Accoustics.

[25]  James R. Hopgood,et al.  The effect of sensor placement in blind source separation , 2001, Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575).

[26]  Akihiko Sugiyama,et al.  MPEG-4 natural audio coding , 2000, Signal Process. Image Commun..

[27]  Bhaskar D. Rao,et al.  Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm , 1997, IEEE Trans. Signal Process..

[28]  G. Manley Cochlear mechanisms from a phylogenetic viewpoint. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[29]  B. Moore Frequency Selectivity in Hearing , 1987 .

[30]  S. Geneva,et al.  Sound Quality Assessment Material: Recordings for Subjective Tests , 1988 .

[31]  RECOMMENDATION ITU-R BS.1387-1 - Method for objective measurements of perceived audio quality , 2002 .

[32]  Jorg Kliewer,et al.  The complex-valued continuous wavelet transform as a preprocessor for auditory scene analysis , 1998 .

[33]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[34]  Zhifeng Zhang,et al.  Adaptive Nonlinear Approximations , 1994 .

[35]  Powen Ru,et al.  Multiresolution spectrotemporal analysis of complex sounds. , 2005, The Journal of the Acoustical Society of America.

[36]  E. Zwicker,et al.  Subdivision of the audible frequency range into critical bands , 1961 .

[37]  Yaakov Tsaig,et al.  Recent advances in sparsity-driven signal recovery , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[38]  Rémi Gribonval,et al.  Fast matching pursuit with a multiscale dictionary of Gaussian chirps , 2001, IEEE Trans. Signal Process..

[39]  Vivek K. Goyal,et al.  Quantized Overcomplete Expansions in R^N: Analysis, Synthesis, and Algorithms , 1998, IEEE Trans. Inf. Theory.

[40]  Roy D. Patterson,et al.  Dynamic, Compressive Gammachirp Auditory Filterbank for Perceptual Signal Processing , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[41]  A. Spanias,et al.  Perceptual coding of digital audio , 2000, Proceedings of the IEEE.

[42]  T. Dau,et al.  A quantitative model of the "effective" signal processing in the auditory system. II. Simulations and measurements. , 1996, The Journal of the Acoustical Society of America.

[43]  Dennis Gabor,et al.  Theory of communication , 1946 .

[44]  Rémi Gribonval Approximations non-linéaires pour l'analyse de signaux sonores , 1999 .

[45]  Stéphane Mallat,et al.  Matching pursuits with time-frequency dictionaries , 1993, IEEE Trans. Signal Process..

[46]  Bruno A Olshausen,et al.  Sparse coding of sensory inputs , 2004, Current Opinion in Neurobiology.

[47]  A. W. M. van den Enden,et al.  Discrete Time Signal Processing , 1989 .

[48]  Michael S. Lewicki,et al.  Efficient Coding of Time-Relative Structure Using Spikes , 2005, Neural Computation.

[49]  T J Sejnowski,et al.  Learning the higher-order structure of a natural sound. , 1996, Network.

[50]  Michael S. Lewicki,et al.  Efficient coding of natural sounds , 2002, Nature Neuroscience.

[51]  S. Laughlin,et al.  An Energy Budget for Signaling in the Grey Matter of the Brain , 2001, Journal of cerebral blood flow and metabolism : official journal of the International Society of Cerebral Blood Flow and Metabolism.

[52]  Pascal Frossard,et al.  A posteriori quantization of progressive matching pursuit streams , 2004, IEEE Transactions on Signal Processing.