Paper information - Sparse Representations for the Cocktail Party Problem

A striking feature of many sensory processing problems is that far more neurons appear to be engaged in the internal representation of a signal than in its transduction. For example, humans have ∼30,000 cochlear neurons, but at least 1000 times as many neurons in the auditory cortex. Such apparently redundant internal representations have sometimes been proposed as necessary to overcome neuronal noise. We instead posit that they directly subserve computations of interest. Here we show how sparse overcomplete linear representations can directly solve difficult acoustic signal processing problems. Our example is monaural source separation that relies solely on the cues provided by the differential filtering imposed on a source by its path from its origin to the cochlea [the head-related transfer function (HRTF)]. In contrast to much previous work, the HRTF is used here to separate auditory streams rather than to localize them in space. The experimentally testable predictions that arise from this model, including a novel method for estimating the optimal stimulus of a neuron using data from a multineuron recording experiment, are generic and apply to a wide range of sensory computations.
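The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the source dictionary `D` is random, the "HRTF" filters are short random FIR filters, and the sparse coefficients are found with iterative soft thresholding (ISTA) on a LASSO objective, used here as a simple stand-in for ℓ1 minimization. Each column of the overcomplete dictionary `Phi` is one source atom filtered by the path from one position, so a sparse code for the single-channel mixture assigns energy to (position, atom) pairs, and grouping coefficients by position yields one stream per source.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (shrinks each entry toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_objective(Phi, y, a, lam):
    """0.5 * ||y - Phi a||^2 + lam * ||a||_1."""
    r = y - Phi @ a
    return 0.5 * (r @ r) + lam * np.abs(a).sum()

def ista(Phi, y, lam, n_iter=2000):
    """Minimise the LASSO objective by iterative soft thresholding."""
    # Step size 1/L, where L is the squared spectral norm of Phi.
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - y)
        a = soft_threshold(a - step * grad, step * lam)
    return a

def filter_columns(D, h):
    """Convolve every dictionary atom with filter h, truncated to the atom length."""
    n = D.shape[0]
    return np.stack([np.convolve(D[:, j], h)[:n] for j in range(D.shape[1])], axis=1)

rng = np.random.default_rng(0)
n_samples, n_atoms, n_positions = 64, 64, 2
lam = 0.01

# Hypothetical source dictionary and hypothetical HRTF-like filters (random, for illustration).
D = rng.standard_normal((n_samples, n_atoms))
filters = [rng.standard_normal(8) for _ in range(n_positions)]

# Overcomplete dictionary: one block of filtered atoms per position (64 x 128).
Phi = np.hstack([filter_columns(D, h) for h in filters])
Phi /= np.linalg.norm(Phi, axis=0)

# Synthetic monaural mixture: one active atom at each of the two positions.
a_true = np.zeros(n_positions * n_atoms)
a_true[3] = 1.5
a_true[n_atoms + 10] = -2.0
y = Phi @ a_true

# Sparse code of the mixture, then one reconstructed stream per position.
a_hat = ista(Phi, y, lam)
per_position = a_hat.reshape(n_positions, n_atoms)
streams = [
    Phi[:, p * n_atoms:(p + 1) * n_atoms] @ per_position[p]
    for p in range(n_positions)
]
```

In this sketch the separation step is just a regrouping of the coefficients: `streams[p]` is the part of the reconstructed mixture attributed to position `p`. How well the true sources are recovered depends on how distinct the filtered dictionaries are, which a random toy setup does not guarantee.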
