A hierarchical sparse coding model predicts acoustic feature encoding in both auditory midbrain and cortex

The auditory pathway consists of multiple stages, from the cochlear nucleus to the auditory cortex. Neurons at different stages serve different functions and exhibit different response properties, and it is unclear whether these stages share a common encoding mechanism. We trained an unsupervised deep learning model consisting of alternating sparse coding and max pooling layers on cochleogram-filtered human speech. Evaluation of the response properties revealed that computing units in lower layers exhibited spectro-temporal receptive fields (STRFs) similar to those of inferior colliculus neurons measured in physiological experiments, including sensitivity to sound onset and termination, checkerboard patterns, and spectral motion. Units in upper layers tended to be tuned to phonetic features such as plosivity and nasality, resembling the results of field recordings in human auditory cortex. Varying the sparseness level of the units in each higher layer revealed a positive correlation between sparseness and the strength of phonetic feature encoding. The activities of units in the top layer, but not in other layers, correlated with the dynamics of the first two formants (F1, F2) of all phonemes, indicating that these units encode phoneme dynamics. These results suggest that the principles of sparse coding and max pooling may be universal in the human auditory pathway.
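
As a minimal sketch of the model class described above (not the authors' implementation), the code below builds a single sparse-coding stage followed by max pooling over time. It assumes cochleogram-filtered speech has already been cut into flat feature patches, uses scikit-learn's MiniBatchDictionaryLearning as the sparse-coding step, and treats the layer size, pooling width, and L1 penalty as illustrative placeholders; a deeper model would stack several such stages.

```python
# Hypothetical sketch of one sparse-coding + max-pooling stage, assuming
# cochleogram patches arrive as a (n_patches, n_frequencies * n_frames) array.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning


def sparse_coding_layer(patches, n_units=128, sparsity=1.0, seed=0):
    """Learn an overcomplete dictionary and return sparse unit activations."""
    learner = MiniBatchDictionaryLearning(
        n_components=n_units,           # number of model units in this layer
        alpha=sparsity,                 # L1 penalty controlling sparseness
        transform_algorithm="lasso_lars",
        random_state=seed,
    )
    codes = learner.fit_transform(patches)    # (n_patches, n_units)
    return codes, learner.components_         # activations and learned STRF-like filters


def max_pool(codes, pool_size=2):
    """Max-pool consecutive activation vectors along the patch (time) axis."""
    n = (codes.shape[0] // pool_size) * pool_size
    trimmed = codes[:n].reshape(-1, pool_size, codes.shape[1])
    return trimmed.max(axis=1)                 # (n_patches // pool_size, n_units)


if __name__ == "__main__":
    # Random stand-in for cochleogram patches of 32 channels x 8 time frames;
    # real input would come from a cochlear filter-bank front end.
    rng = np.random.default_rng(0)
    patches = rng.standard_normal((1000, 32 * 8))

    codes, filters = sparse_coding_layer(patches)
    pooled = max_pool(codes)
    print(codes.shape, pooled.shape)   # (1000, 128) (500, 128)
```

The output of max_pool would serve as the input patches for the next sparse-coding layer, mirroring the alternating structure of the hierarchy; inspecting the learned dictionary elements (filters) is what allows comparison with physiologically measured STRFs.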
