Efficient coding of natural sounds

The auditory system encodes sound by decomposing the amplitude signal arriving at the ear into multiple frequency bands whose center frequencies and bandwidths are approximately exponential functions of the distance from the stapes. This organization is thought to result from the adaptation of cochlear mechanisms to the animal's auditory environment. Here we report that several basic auditory nerve fiber tuning properties can be accounted for by adapting a population of filter shapes to encode natural sounds efficiently. The form of the code depends on sound class, resembling a Fourier transformation when optimized for animal vocalizations and a wavelet transformation when optimized for non-biological environmental sounds. Only for the combined set does the optimal code follow the scaling characteristics of physiological data. These results suggest that auditory nerve fibers encode a broad set of natural sounds in a manner consistent with information-theoretic principles.
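The contrast between a Fourier-like and a wavelet-like code can be illustrated by how filter bandwidth scales with center frequency. The sketch below is only an illustration under stated assumptions, not the method used in the work: the function names, the frequency range, the fixed bandwidth, and the quality factor Q are all hypothetical values chosen for the example. It assumes the standard textbook characterization in which a Fourier-like filter bank keeps bandwidth constant across frequency, while a wavelet-like bank keeps the ratio of center frequency to bandwidth (Q) constant.

```python
import math

def center_frequencies(n_filters, f_min, f_max):
    """Exponentially (log-uniformly) spaced center frequencies in Hz.
    The range and count are illustrative assumptions."""
    ratio = (f_max / f_min) ** (1.0 / (n_filters - 1))
    return [f_min * ratio ** k for k in range(n_filters)]

def fourier_like_bandwidths(freqs, bandwidth_hz):
    """Fourier-like bank: the same bandwidth at every center frequency."""
    return [bandwidth_hz for _ in freqs]

def wavelet_like_bandwidths(freqs, q):
    """Wavelet-like bank: bandwidth proportional to center frequency,
    so Q = center frequency / bandwidth stays constant."""
    return [f / q for f in freqs]

def quality_factors(freqs, bandwidths):
    """Q of each filter: center frequency divided by bandwidth."""
    return [f / b for f, b in zip(freqs, bandwidths)]

# Hypothetical values, chosen only to make the scaling visible.
freqs = center_frequencies(5, 100.0, 1600.0)
fourier_bw = fourier_like_bandwidths(freqs, 50.0)
wavelet_bw = wavelet_like_bandwidths(freqs, 4.0)

fourier_q = quality_factors(freqs, fourier_bw)  # grows with frequency
wavelet_q = quality_factors(freqs, wavelet_bw)  # constant across frequency

for f, qf, qw in zip(freqs, fourier_q, wavelet_q):
    print(f"{f:8.1f} Hz  Fourier-like Q = {qf:6.2f}  wavelet-like Q = {qw:5.2f}")
```

In this toy setting, a code whose Q rises with center frequency but more slowly than in the Fourier-like case would fall between the two extremes, which is the kind of intermediate scaling the abstract attributes to the combined sound set.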

[1]  L. Zadeh,et al.  Frequency Analysis of Variable Networks , 1950, Proceedings of the IRE.

[2]  R. Duffin,et al.  A class of nonharmonic Fourier series , 1952 .

[3]  J. Flanagan Models for Approximating Basilar Membrane Displacement , 1960 .

[4]  E. Zwicker,et al.  Subdivision of the audible frequency range into critical bands , 1961 .

[5]  D. D. Greenwood Critical Bandwidth and the Frequency Coordinates of the Basilar Membrane , 1961 .

[6]  I. Whitfield Discharge Patterns of Single Fibers in the Cat's Auditory Nerve , 1966 .

[7]  Alexander Joseph Book reviewDischarge patterns of single fibers in the cat's auditory nerve: Nelson Yuan-Sheng Kiang, with the assistance of Takeshi Watanabe, Eleanor C. Thomas and Louise F. Clark: Research Monograph no. 35. Cambridge, Mass., The M.I.T. Press, 1965 , 1967 .

[8]  E. D. Boer,et al.  On the Principle of Specific Coding , 1973 .

[9]  G. C. Tiao,et al.  Bayesian inference in statistical analysis , 1973 .

[10]  R. Voss,et al.  ‘1/fnoise’ in music and speech , 1975, Nature.

[11]  E. Evans Cochlear Nerve and Cochlear Nucleus , 1975 .

[12]  E. Owens,et al.  An Introduction to the Psychology of Hearing , 1997 .

[13]  E. de Boer,et al.  On cochlear encoding: Potentialities and limitations of the reverse‐correlation technique , 1978 .

[14]  E. de Boer,et al.  On cochlear encoding: potentialities and limitations of the reverse-correlation technique. , 1978, The Journal of the Acoustical Society of America.

[15]  S. Zahorian Principal‐Components Analysis for Low Redundancy Encoding of Speech Spectra , 1979 .

[16]  Andrew C. Sleigh,et al.  Physical and Biological Processing of Images , 1983 .

[17]  S. Laughlin,et al.  Matching Coding to Scenes to Enhance Efficiency , 1983 .

[18]  C. Loeffler,et al.  Optimal design of periodically time-varying and multirate digital filters , 1984 .

[19]  S. Biyiksiz,et al.  Multirate digital signal processing , 1985, Proceedings of the IEEE.

[20]  W. S. Rhode,et al.  Characteristics of tone-pip response patterns in relationship to spontaneous rate in cat auditory nerve fibers , 1985, Hearing Research.

[21]  M. F.,et al.  Bibliography , 1985, Experimental Gerontology.

[22]  B. Moore,et al.  Auditory Frequency Selectivity , 1986, Nato ASI Series.

[23]  I. Daubechies,et al.  PAINLESS NONORTHOGONAL EXPANSIONS , 1986 .

[24]  A. Wright,et al.  Hair cell distributions in the normal human cochlea. , 1987, Acta oto-laryngologica. Supplementum.

[25]  B. Moore Frequency Selectivity in Hearing , 1987 .

[26]  D J Field,et al.  Relations between the statistics of natural images and the response properties of cortical cells. , 1987, Journal of the Optical Society of America. A, Optics and image science.

[27]  S. Geneva,et al.  Sound Quality Assessment Material: Recordings for Subjective Tests , 1988 .

[28]  L. Carney,et al.  Temporal coding of resonances by low-frequency auditory nerve fibers: single-fiber responses and a population model. , 1988, Journal of neurophysiology.

[29]  Stéphane Mallat,et al.  A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[30]  P. Vaidyanathan,et al.  Non-uniform multirate filter banks: theory and design , 1989, IEEE International Symposium on Circuits and Systems,.

[31]  B. Moore An introduction to the psychology of hearing, 3rd ed. , 1989 .

[32]  Ingrid Daubechies,et al.  The wavelet transform, time-frequency localization and signal analysis , 1990, IEEE Trans. Inf. Theory.

[33]  R Linsker,et al.  Perceptual neural organization: some approaches based on network models and information theory. , 1990, Annual review of neuroscience.

[34]  B. Moore,et al.  Auditory filter shapes at low center frequencies. , 1990, The Journal of the Acoustical Society of America.

[35]  Adrian S. Lewis,et al.  A 64 Kb/s video codec using the 2-D wavelet transform , 1991, [1991] Proceedings. Data Compression Conference.

[36]  Ingrid Daubechies,et al.  Ten Lectures on Wavelets , 1992 .

[37]  Jerome M. Shapiro,et al.  Embedded image coding using zerotrees of wavelet coefficients , 1993, IEEE Trans. Signal Process..

[38]  D. Donoho,et al.  Basis pursuit , 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.

[39]  I. Djokovic,et al.  Results on Biorthogonal Filter Banks , 1994 .

[40]  Zhifeng Zhang,et al.  Adaptive Nonlinear Approximations , 1994 .

[41]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[42]  Richard J. Baker,et al.  Characterising auditory filter nonlinearity , 1994, Hearing Research.

[43]  David J. Field,et al.  What Is the Goal of Sensory Coding? , 1994, Neural Computation.

[44]  I. Toshio An optimal auditory filter , 1995, Proceedings of 1995 Workshop on Applications of Signal Processing to Audio and Accoustics.

[45]  Terrence J. Sejnowski,et al.  An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.

[46]  W. Bialek,et al.  Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents , 1995, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[47]  Andreas G. Andreou,et al.  A design framework for low power analog filter banks , 1995 .

[48]  Andrzej Cichocki,et al.  A New Learning Algorithm for Blind Signal Separation , 1995, NIPS.

[49]  C. J.,et al.  Maximum Likelihood and Covariant Algorithms for Independent Component Analysis , 1996 .

[50]  M. Vetterli,et al.  Overcomplete expansions and robustness , 1996 .

[51]  B. Moore,et al.  A revision of Zwicker's loudness model , 1996 .

[52]  Barak A. Pearlmutter,et al.  A Context-Sensitive Generalization of ICA , 1996 .

[53]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[54]  Edward H. Adelson,et al.  Noise removal via Bayesian wavelet coring , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[55]  William A. Pearlman,et al.  A new, fast, and efficient image codec based on set partitioning in hierarchical trees , 1996, IEEE Trans. Circuits Syst. Video Technol..

[56]  T J Sejnowski,et al.  Learning the higher-order structure of a natural sound. , 1996, Network.

[57]  T. Dau,et al.  A quantitative model of the "effective" signal processing in the auditory system. II. Simulations and measurements. , 1996, The Journal of the Acoustical Society of America.

[58]  T Dau,et al.  A quantitative model of the "effective" signal processing in the auditory system. I. Model structure. , 1996, The Journal of the Acoustical Society of America.

[59]  Helmut Bölcskei,et al.  Oversampled filter banks: optimal noise shaping, design freedom, and noise analysis , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[60]  Bhaskar D. Rao,et al.  Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm , 1997, IEEE Trans. Signal Process..

[61]  Avideh Zakhor,et al.  Very low bit-rate video coding based on matching pursuits , 1997, IEEE Trans. Circuits Syst. Video Technol..

[62]  K.M. Buckley,et al.  Single-snapshot DOA estimation and source number detection , 1997, IEEE Signal Processing Letters.

[63]  Eero P. Simoncelli Statistical models for images: compression, restoration and synthesis , 1997, Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers (Cat. No.97CB36136).

[64]  T. Irino,et al.  A time-domain, level-dependent auditory filter: The gammachirp , 1997 .

[65]  J. Cardoso Infomax and maximum likelihood for blind source separation , 1997, IEEE Signal Processing Letters.

[66]  Terrence J. Sejnowski,et al.  The “independent components” of natural scenes are edge filters , 1997, Vision Research.

[67]  Toshio Irino,et al.  A time-varying, analysis/synthesis auditory filterbank using the gammachirp , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[68]  Helmut Bölcskei,et al.  Frame-theoretic analysis of oversampled filter banks , 1998, IEEE Trans. Signal Process..

[69]  Vivek K. Goyal,et al.  Quantized Overcomplete Expansions in IRN: Analysis, Synthesis, and Algorithms , 1998, IEEE Trans. Inf. Theory.

[70]  B. Moore Cochlear Hearing Loss , 2019, The SAGE Encyclopedia of Human Communication Sciences and Disorders.

[71]  Jorg Kliewer,et al.  The complex-valued continuous wavelet transform as a preprocessor for auditory scene analysis , 1998 .

[72]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[73]  Methods for the subjective assessment of small impairments in audio systems , 2015 .

[74]  Terrence J. Sejnowski,et al.  Coding Time-Varying Signals Using Sparse, Shift-Invariant Representations , 1998, NIPS.

[75]  D. Ruderman,et al.  Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex , 1998, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[76]  S. Mallat A wavelet tour of signal processing , 1998 .

[77]  William A. Pearlman,et al.  An efficient, low-complexity audio coder delivering multiple levels of quality for interactive applications , 1998, 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175).

[78]  Edward A. Lee,et al.  Adaptive Signal Models: Theory, Algorithms, and Audio Applications , 1998 .

[79]  Gernot Kubin,et al.  On speech coding in a perceptual domain , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[80]  Rémi Gribonval Approximations non-linéaires pour l'analyse de signaux sonores , 1999 .

[81]  Bruno A. Olshausen,et al.  PROBABILISTIC FRAMEWORK FOR THE ADAPTATION AND COMPARISON OF IMAGE CODES , 1999 .

[82]  M. Brucke,et al.  Silicon cochlea: A digital VLSI implementation of a quantitative model of the auditory system , 1999 .

[83]  Henrique S. Malvar A modulated complex lapped transform and its applications to audio processing , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[84]  William Bialek,et al.  Adaptive Rescaling Maximizes Information Transmission , 2000, Neuron.

[85]  Bernd Edler,et al.  Audio coding using a psychoacoustic pre- and post-filter , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[86]  S.A. Ramprashad High quality embedded wideband speech coding using an inherently layered coding paradigm , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[87]  Terrence J. Sejnowski,et al.  Learning Overcomplete Representations , 2000, Neural Computation.

[88]  G. Manley Cochlear mechanisms from a phylogenetic viewpoint. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[89]  Akihiko Sugiyama,et al.  MPEG-4 natural audio coding , 2000, Signal Process. Image Commun..

[90]  Roy D. Patterson,et al.  Auditory images:How complex sounds are represented in the auditory system , 2000 .

[91]  A. Spanias,et al.  Perceptual coding of digital audio , 2000, Proceedings of the IEEE.

[92]  Chris Dunn Efficient Audio Coding with Fine-Grain Scalability , 2001 .

[93]  Eliathamby Ambikairajah,et al.  Wideband speech and audio coding using gammatone filter banks , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[94]  Eliathamby Ambikairajah,et al.  Auditory filter bank inversion , 2001, ISCAS 2001. The 2001 IEEE International Symposium on Circuits and Systems (Cat. No.01CH37196).

[95]  E. Lopez-Poveda,et al.  A human nonlinear cochlear filterbank. , 2001, The Journal of the Acoustical Society of America.

[96]  Rémi Gribonval,et al.  Fast matching pursuit with a multiscale dictionary of Gaussian chirps , 2001, IEEE Trans. Signal Process..

[97]  Eero P. Simoncelli,et al.  Natural signal statistics and sensory gain control , 2001, Nature Neuroscience.

[98]  Alfred Mertins,et al.  AUDIO CODING BASED ON THE MODULATED LAPPED TRANSFORM (MLT) AND SET PARTITIONING IN HIERARCHICAL TREES , 2001 .

[99]  S. Laughlin,et al.  An Energy Budget for Signaling in the Grey Matter of the Brain , 2001, Journal of cerebral blood flow and metabolism : official journal of the International Society of Cerebral Blood Flow and Metabolism.

[100]  Sugato Chakravarty,et al.  Method for the subjective assessment of intermedi-ate quality levels of coding systems , 2001 .

[101]  Patrik O. Hoyer,et al.  Non-negative sparse coding , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[102]  Robert Bregovic,et al.  Multirate Systems and Filter Banks , 2002 .

[103]  Zhen Liu,et al.  Quantifying the intra and inter subband correlations in the zerotree-based wavelet image coders , 2002, Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002..

[104]  Jin Li,et al.  Embedded audio coding (EAC) with implicit auditory masking , 2002, MULTIMEDIA '02.

[105]  Natasa Kovacevic,et al.  Algorithm 820: A flexible implementation of matching pursuit for Gabor functions on the interval , 2002, TOMS.

[106]  Alfred Mertins,et al.  FROM LOSSY TO LOSSLESS AUDIO CODING USING SPIHT , 2002 .

[107]  L. V. Immerseel,et al.  Digital implementation of linear gammatone filters: Comparison of design methods , 2003 .

[108]  Alfred Mertins,et al.  Scalable to lossless audio compression based on perceptual set partitioning in hierarchical trees (PSPIHT) , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[109]  Terrence J Sejnowski,et al.  Communication in Neuronal Networks , 2003, Science.

[110]  No Value,et al.  IEEE International Conference on Image Processing , 2003 .

[111]  Bruno A Olshausen,et al.  Sparse coding of sensory inputs , 2004, Current Opinion in Neurobiology.

[112]  Pascal Frossard,et al.  A posteriori quantization of progressive matching pursuit streams , 2004, IEEE Transactions on Signal Processing.

[113]  Fabrice Labeau,et al.  Discrete Time Signal Processing , 2004 .

[114]  Susanto Rahardja,et al.  MPEG-4 Scalable to Lossless Audio Coding , 2004 .

[115]  A. Aertsen,et al.  Spectro-temporal receptive fields of auditory neurons in the grassfrog , 1980, Biological Cybernetics.

[116]  Balázs Kövesi,et al.  A scalable speech and audio coding scheme with continuous bitrate flexibility , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.