Hierarchical Models in the Brain

This paper describes a general model that subsumes many parametric models for continuous data. The model comprises hidden layers of state-space or dynamic causal models, arranged so that the output of one provides input to another. The ensuing hierarchy furnishes a model for many types of data, of arbitrary complexity. Special cases range from the general linear model for static data to generalised convolution models, with system noise, for nonlinear time-series analysis. Crucially, all of these models can be inverted using exactly the same scheme, namely dynamic expectation maximization (DEM). This means that a single model and optimisation scheme can be used to invert a wide range of models. We present the model and briefly review its inversion to disclose the relationships among apparently diverse generative models of empirical data. We then show that this inversion can be formulated as a simple neural network and may provide a useful metaphor for inference and learning in the brain.
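Two structural claims above can be illustrated with the simplest special case the abstract mentions, a static, linear, Gaussian hierarchy. The first claim is that each level's output is the input to the level below. The second is that inversion can be cast as a simple neural network. The sketch below is not dynamic expectation maximization. It is plain gradient descent on precision-weighted prediction errors for scalar causes, with no hidden states, no generalised coordinates, and fixed parameters and precisions, in the spirit of predictive coding. The function name `infer_states` and all of its arguments are hypothetical.

```python
from typing import List


def infer_states(y: float, thetas: List[float], precisions: List[float],
                 prior_mean: float, lr: float = 0.05, n_steps: int = 5000) -> List[float]:
    """Hypothetical sketch: invert a scalar linear hierarchy by prediction-error descent.

    Level i (1..L) predicts the level below: v[i-1] = thetas[i-1] * v[i] + noise,
    with v[0] = y (the data). The top cause has prior v[L] = prior_mean + noise.
    `precisions` has L + 1 entries: one per level, plus one for the top-level prior.
    This is NOT dynamic expectation maximization; it only finds the posterior mode
    of a static linear Gaussian model by gradient descent.
    """
    n_levels = len(thetas)
    assert len(precisions) == n_levels + 1, "need one precision per level plus the prior"
    mu = [0.0] * (n_levels + 1)  # mu[0] is clamped to the data; mu[1..L] are estimates
    mu[0] = y
    for _ in range(n_steps):
        # Error units: precision-weighted mismatch between each level and its top-down prediction
        eps = [0.0] * (n_levels + 2)
        for i in range(1, n_levels + 1):
            eps[i] = precisions[i - 1] * (mu[i - 1] - thetas[i - 1] * mu[i])
        eps[n_levels + 1] = precisions[n_levels] * (mu[n_levels] - prior_mean)
        # State units: pushed by the error they explain below, pulled back by the error at their own level
        for i in range(1, n_levels + 1):
            mu[i] += lr * (thetas[i - 1] * eps[i] - eps[i + 1])
    return mu[1:]
```

With one hidden level, the estimate settles at the familiar precision-weighted average (p0·θ·y + p1·η)/(p0·θ² + p1), where p0 and p1 are the two precisions and η is the prior mean. For example, `infer_states(3.0, [2.0], [1.0, 1.0], 1.0)` returns approximately `[1.4]`. The step size `lr` is an assumption chosen small enough for these example values to converge; larger gains or precisions would need a smaller step.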
