Frontiers in Neuroinformatics

In this paper, we suggest that perception could be modeled by assuming that sensory input is generated by a hierarchy of attractors in a dynamical system. We describe a mathematical model which exploits the temporal structure of rapid sensory dynamics to track the slower trajectories of their underlying causes. This model establishes a proof of concept that slowly changing neuronal states can encode the trajectories of faster sensory signals. We link this hierarchical account to recent developments in the perception of human action, in particular artificial speech recognition. We argue that these hierarchical models of dynamical systems are a plausible starting point for developing robust recognition schemes, because they capture critical temporal dependencies induced by deep hierarchical structure. We conclude by suggesting that a fruitful computational neuroscience approach may emerge from modeling perception as non-autonomous recognition dynamics enslaved by autonomous hierarchical dynamics in the sensorium.
