The Efficiency of Human Cognition Reflects Planned Information Processing

Planning is useful. It lets people take actions that have desirable long-term consequences. But planning is hard. It requires thinking about consequences, which consumes limited computational and cognitive resources. Thus, people should plan their actions, but they should also be smart about how they deploy the limited resources used for planning. Put another way, people should "plan their plans". Here, we formulate this aspect of planning as a meta-reasoning problem and formalize it in terms of a recursive Bellman objective that incorporates both task rewards and information-theoretic planning costs. Our account makes quantitative predictions about how people should plan and meta-plan as a function of the overall structure of a task, which we test in two experiments with human participants. We find that people's reaction times reflect a planned use of information processing, consistent with our account. This formulation of planning to plan provides new insight into the function of hierarchical planning, state abstraction, and cognitive control in both humans and machines.
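
The abstract does not spell out the exact form of the recursive Bellman objective, but a minimal sketch of one standard way to combine task rewards with information-theoretic planning costs is soft value iteration with a KL penalty on the policy relative to a default (uniform) policy. The function name, the trade-off parameter beta, and the tiny example MDP below are illustrative assumptions, not the authors' task or implementation.

```python
import numpy as np

def info_regularized_value_iteration(R, P, gamma=0.95, beta=5.0, iters=200):
    """Soft value iteration with an information-theoretic planning cost.

    R: (S, A) array of rewards; P: (S, A, S) transition probabilities.
    beta trades off task reward against the KL cost of deviating from a
    uniform default policy (an assumed, illustrative formulation).
    """
    S, A = R.shape
    V = np.zeros(S)
    prior = np.full(A, 1.0 / A)      # uniform default policy
    for _ in range(iters):
        Q = R + gamma * (P @ V)      # (S, A) state-action values
        # Free-energy Bellman backup:
        # V(s) = (1/beta) log sum_a prior(a) exp(beta * Q(s, a))
        V = (1.0 / beta) * np.log((prior * np.exp(beta * Q)).sum(axis=1))
    # The corresponding policy is a Boltzmann distribution over actions
    policy = prior * np.exp(beta * (Q - V[:, None]))
    policy /= policy.sum(axis=1, keepdims=True)
    return V, policy

# Tiny 2-state, 2-action illustration (hypothetical numbers)
R = np.array([[0.0, 1.0], [1.0, 0.0]])
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])
V, pi = info_regularized_value_iteration(R, P)
```

In this kind of formulation, small beta makes deliberation cheap but the policy nearly random, while large beta recovers standard reward-maximizing planning at a higher information cost; the qualitative trade-off, not the specific equations, is what the abstract's account of "planning to plan" appeals to.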
