Spiking network optimized for word recognition in noise predicts auditory system hierarchy

The auditory neural code is resilient to acoustic variability and capable of recognizing sounds amongst competing sound sources, yet the transformations enabling noise-robust abilities are largely unknown. We report that a hierarchical spiking neural network (HSNN) optimized to maximize word recognition accuracy in noise and across multiple talkers predicts the organizational hierarchy of the ascending auditory pathway. Comparisons with data from auditory nerve, midbrain, thalamus, and cortex reveal that the optimal HSNN predicts several transformations of the ascending auditory pathway, including a sequential loss of temporal resolution and synchronization ability and an increase in sparseness and selectivity. The optimal organizational scheme enhances performance by selectively filtering out noise and fast temporal cues, such as voicing periodicity, that are not directly relevant to the word recognition task. An identical network arranged to enable high information transfer fails to predict auditory pathway organization and performs substantially worse. Furthermore, conventional single-layer linear and nonlinear receptive field networks that capture the overall feature extraction of the HSNN fail to achieve similar performance. The findings suggest that the auditory pathway hierarchy and its sequential nonlinear feature extraction computations enhance relevant cues while removing non-informative sources of noise, thus enhancing the representation of sounds in noise-impoverished conditions.
