Sparse Codes for Speech Predict Spectrotemporal Receptive Fields in the Inferior Colliculus

We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech, such as harmonic stacks, formants, onsets, and terminations, but we also find more exotic structures in the spectrogram representation of sound, such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the inferior colliculus (IC), as well as in auditory thalamus and cortex, and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in the IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds.
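To make the idea of "minimizing the number of active model neurons" concrete, the following is a minimal sketch of sparse inference on a toy input. It is not the method used in this work, which the abstract does not specify: it assumes a generic L1-penalized objective, 0.5 * ||x - sum_k a[k] d_k||^2 + lam * sum_k |a[k]|, solved with iterative shrinkage-thresholding (ISTA). The names `sparse_code`, `soft_threshold`, `lam`, and `step`, and the three-element toy dictionary, are all illustrative assumptions. In a real setting, x would be a spectrogram patch and the dictionary would be learned from speech.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal step for the L1 penalty)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x, dictionary, lam=0.1, step=0.5, n_iter=2000):
    """Infer activations a such that sum_k a[k] * dictionary[k] approximates x.

    Hypothetical helper: runs ISTA on the L1-penalized squared error.
    `step` must not exceed 1 / (largest eigenvalue of the dictionary's
    Gram matrix), otherwise the iteration can diverge.
    """
    n_units = len(dictionary)
    a = [0.0] * n_units
    for _ in range(n_iter):
        # Reconstruct the input from the current activations.
        recon = [sum(a[k] * dictionary[k][i] for k in range(n_units))
                 for i in range(len(x))]
        residual = [xi - ri for xi, ri in zip(x, recon)]
        # Gradient step on the squared error, followed by shrinkage.
        # All units are updated from the same residual.
        for k in range(n_units):
            drive = sum(d * r for d, r in zip(dictionary[k], residual))
            a[k] = soft_threshold(a[k] + step * drive, step * lam)
    return a


# Toy overcomplete dictionary: three unit-norm elements in two dimensions.
toy_dictionary = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
# The input is an exact multiple of the third element.
toy_input = [1.2, 1.6]
activations = sparse_code(toy_input, toy_dictionary)
print(activations)
```

In this toy case the input could be represented with the first two elements together or with the third element alone. The L1 penalty favors the single-element solution, which illustrates how a sparseness constraint selects among the many possible codes that an overcomplete dictionary allows.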
