What is value—accumulated reward or evidence?

Why are you reading this abstract? In some sense, your answer will cast the exercise as valuable—but what is value? In what follows, we suggest that value is evidence or, more exactly, log Bayesian evidence. This implies that a sufficient explanation for valuable behavior is the accumulation of evidence for internal models of our world. This contrasts with normative models of optimal control and reinforcement learning, which assume the existence of a value function that explains behavior, where (somewhat tautologically) behavior maximizes value. In this paper, we consider an alternative formulation—active inference—that replaces policies in normative models with prior beliefs about the (future) states agents should occupy. This enables optimal behavior to be cast purely in terms of inference: where agents sample their sensorium to maximize the evidence for their generative model of hidden states in the world, and minimize their uncertainty about those states. Crucially, this formulation resolves the tautology inherent in normative models and allows one to consider how prior beliefs are themselves optimized in a hierarchical setting. We illustrate these points by showing that any optimal policy can be specified with prior beliefs in the context of Bayesian inference. We then show how these prior beliefs are themselves prescribed by an imperative to minimize uncertainty. This formulation explains the saccadic eye movements required to read this text and defines the value of the visual sensations you are soliciting.
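The claim that value is log Bayesian evidence can be made concrete with a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes a hypothetical two-state discrete generative model (the names `PRIOR`, `LIKELIHOOD`, `log_evidence`, `posterior`, and `free_energy` are invented here). It computes the log evidence ln p(o) for an observation and checks the standard variational identity that the free energy F(q) = E_q[ln q(s) - ln p(o, s)] is an upper bound on -ln p(o), with equality when q is the exact posterior.

```python
import math

# Hypothetical generative model: two hidden states, two possible observations.
PRIOR = [0.5, 0.5]            # p(s)
LIKELIHOOD = [[0.9, 0.1],     # p(o | s=0)
              [0.2, 0.8]]     # p(o | s=1)


def log_evidence(prior, likelihood, obs):
    """ln p(o) = ln sum_s p(o | s) p(s): the quantity the abstract calls value."""
    return math.log(sum(p * likelihood[s][obs] for s, p in enumerate(prior)))


def posterior(prior, likelihood, obs):
    """Exact posterior p(s | o) by Bayes' rule."""
    joint = [p * likelihood[s][obs] for s, p in enumerate(prior)]
    z = sum(joint)
    return [j / z for j in joint]


def free_energy(q, prior, likelihood, obs):
    """Variational free energy F(q) = E_q[ln q(s) - ln p(o, s)]."""
    total = 0.0
    for s, qs in enumerate(q):
        if qs > 0.0:  # terms with q(s) = 0 contribute nothing
            total += qs * (math.log(qs) - math.log(prior[s] * likelihood[s][obs]))
    return total


if __name__ == "__main__":
    obs = 0
    print("log evidence:", log_evidence(PRIOR, LIKELIHOOD, obs))
    print("F at exact posterior:",
          free_energy(posterior(PRIOR, LIKELIHOOD, obs), PRIOR, LIKELIHOOD, obs))
    print("F at uniform beliefs:",
          free_energy([0.5, 0.5], PRIOR, LIKELIHOOD, obs))
```

In this toy setting, improving beliefs about hidden states lowers the free energy toward -ln p(o), while choosing which observation to sample changes ln p(o) itself. That is one simple way to read the abstract's statement that agents sample their sensorium to maximize the evidence for their generative model.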

Karl J. Friston | P. Read Montague | Rick A. Adams

[110] D. Ballard,et al. Eye guidance in natural vision: reinterpreting salience. , 2011, Journal of vision.