Paper information - Divide et impera: subgoaling reduces the complexity of probabilistic inference and problem solving

It has long been recognized that humans (and possibly other animals) usually break problems down into smaller and more manageable problems using subgoals. Despite a general consensus that subgoaling helps problem solving, it remains unclear which mechanisms guide online subgoal selection when solving novel problems for which no predefined solutions are available. Under which conditions does subgoaling lead to optimal behaviour? When is subgoaling better than solving a problem from start to finish? What is the best number and sequence of subgoals for solving a given problem? How are these subgoals selected during online inference? Here, we present a computational account of subgoaling in problem solving. Following Occam's razor, we propose that good subgoals are those that permit planning solutions and controlling behaviour using fewer informational resources, thus yielding parsimony in inference and control. We implement this principle using approximate probabilistic inference: subgoals are selected using a sampling method that considers the descriptive complexity of the resulting sub-problems. We validate the proposed method using a standard reinforcement learning benchmark (the four-rooms scenario) and show that it requires fewer inferential steps and permits selecting more compact control programs than an equivalent procedure without subgoaling. Furthermore, we show that the proposed method offers a mechanistic explanation of the neuronal dynamics found in the prefrontal cortex of monkeys that solve planning problems. Our computational framework provides a novel integrative perspective on subgoaling and its adaptive advantages for planning, control and learning, such as lowering cognitive effort and working memory load.
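The idea of sampling subgoals according to the descriptive complexity of the resulting sub-problems can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify. It assumes a hypothetical 7x7 four-rooms grid, a naive code length of 2 bits per move (four possible moves), and a weighting proportional to 2^(-K), where K is the number of bits needed to spell out the shortest start-to-subgoal and subgoal-to-goal routes. The names `GRID`, `BITS_PER_ACTION`, `subgoal_weights` and `sample_subgoal` are invented for illustration.

```python
import random
from collections import deque

# Hypothetical 7x7 four-rooms layout: '#' is a wall, '.' is free space.
GRID = [
    "#######",
    "#..#..#",
    "#.....#",
    "##.#.##",
    "#..#..#",
    "#.....#",
    "#######",
]

# Assumed naive code length: log2(4 moves) = 2 bits per action.
BITS_PER_ACTION = 2


def free_cells(grid):
    """All (row, col) cells that are not walls."""
    return [(r, c) for r, row in enumerate(grid)
            for c, ch in enumerate(row) if ch != "#"]


def bfs_distances(grid, source):
    """Shortest-path length in steps from source to every reachable free cell."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            # The outer wall guarantees that neighbours stay inside the grid.
            if grid[nxt[0]][nxt[1]] != "#" and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


def subgoal_weights(grid, start, goal):
    """Return (costs in bits, normalised sampling weights) per candidate subgoal."""
    from_start = bfs_distances(grid, start)
    from_goal = bfs_distances(grid, goal)  # moves are reversible, so distances are symmetric
    costs = {
        s: BITS_PER_ACTION * (from_start[s] + from_goal[s])
        for s in free_cells(grid) if s not in (start, goal)
    }
    best = min(costs.values())
    # Weight proportional to 2^(-K); subtracting the minimum keeps the numbers well scaled.
    raw = {s: 2.0 ** -(k - best) for s, k in costs.items()}
    total = sum(raw.values())
    return costs, {s: w / total for s, w in raw.items()}


def sample_subgoal(weights, rng):
    """Draw one subgoal according to the normalised weights."""
    cells = list(weights)
    return rng.choices(cells, weights=[weights[s] for s in cells], k=1)[0]


if __name__ == "__main__":
    costs, weights = subgoal_weights(GRID, start=(1, 1), goal=(5, 5))
    print("doorway (3, 2):", costs[(3, 2)], "bits")
    print("corner  (1, 5):", costs[(1, 5)], "bits")
    print("sampled subgoal:", sample_subgoal(weights, random.Random(0)))
```

Under this crude proxy, any cell lying on a shortest route receives the highest weight, so doorways along the route are favoured over detours but not over other on-route cells. A complexity measure that also accounts for how compactly each sub-problem's controller can be described would be needed to single out doorways.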
