Recognizing Sequences of Sequences

The brain's decoding of fast sensory streams is currently impossible to emulate, even approximately, with artificial agents. For example, robust speech recognition is relatively easy for humans but exceptionally difficult for artificial speech recognition systems. In this paper, we propose that recognition can be simplified by an internal model of how sensory input is generated, when recognition is formulated in a Bayesian framework. We show that a plausible candidate for such an internal, or generative, model is a hierarchy of ‘stable heteroclinic channels’. This model describes continuous dynamics in the environment as a hierarchy of sequences, in which slower sequences cause faster sequences. Under this model, online recognition corresponds to the dynamic decoding of causal sequences, yielding a representation of the environment with predictive power on several timescales. We illustrate the ensuing decoding, or recognition, scheme using synthetic sequences of syllables, where syllables are sequences of phonemes and phonemes are sequences of sound-wave modulations. By presenting anomalous stimuli, we find that the resulting recognition dynamics disclose inference at multiple timescales and are reminiscent of neuronal dynamics seen in the real brain.
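To make the idea of a stable heteroclinic channel concrete, the sketch below simulates a single level: three competing units arranged in a ring, where each unit transiently "wins" and then hands over to the next, producing a reproducible sequence (winnerless competition). It assumes a generalized Lotka–Volterra (May–Leonard type) form with hypothetical parameters `alpha`, `beta`, `eps` and `rate`; this is an illustrative stand-in, not the paper's exact equations. With `alpha < 1 < beta` and `alpha + beta > 2`, each saddle point hands over to the next unit, and the small constant `eps` keeps the trajectory off the boundary so dwell times stay bounded. A hierarchy in the paper's sense would stack such modules, with a slower module (smaller `rate`) setting the connectivity of a faster one; that coupling is not shown here.

```python
import numpy as np


def simulate_shc(alpha=0.5, beta=2.0, eps=1e-4, rate=1.0,
                 dt=0.01, steps=40000, x0=(0.9, 0.05, 0.05)):
    """Euler-integrate a 3-unit winnerless-competition ring (illustrative assumption).

    dx_i/dt = rate * (x_i * (1 - x_i - alpha * x_{i-1} - beta * x_{i+1}) + eps)

    At the saddle where unit i dominates, unit i+1 grows (rate 1 - alpha > 0)
    and unit i-1 decays (rate 1 - beta < 0), so activity travels 0 -> 1 -> 2 -> 0.
    """
    x = np.array(x0, dtype=float)
    trace = np.empty((steps, x.size))
    for t in range(steps):
        prev_unit = np.roll(x, 1)    # element i holds x_{i-1}
        next_unit = np.roll(x, -1)   # element i holds x_{i+1}
        dx = x * (1.0 - x - alpha * prev_unit - beta * next_unit) + eps
        x = x + dt * rate * dx
        trace[t] = x
    return trace


def winner_sequence(trace):
    """Return the sequence of dominant units, collapsing consecutive repeats."""
    winners = trace.argmax(axis=1)
    changed = np.concatenate(([True], winners[1:] != winners[:-1]))
    return winners[changed].tolist()


if __name__ == "__main__":
    seq = winner_sequence(simulate_shc())
    print(seq[:9])
```

Reading off only the identity of the current winner turns the continuous trajectory into a discrete sequence, which is the kind of causal sequence a recognition scheme would try to infer from noisy input. Lowering `rate` slows the whole sequence without changing its order, which is how a slower level in a hierarchy could be represented.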
