Optimal planning to plan: People partially plan based on plan specificity

Planning requires simulating future choices and their consequences. This process is computationally costly, but it is also useful because it lets people make choices now that lead to desirable future outcomes. What is a rational way to balance the immediate computational costs of planning against its future benefits? Here, we argue that this balance involves planning to plan: adaptively deciding which actions to plan and when to plan them. To formalize this intuition, we develop the ideas of partial planning and information-theoretic simulation costs. Together, these allow us to define a novel Bellman objective that includes both environmental rewards and planning costs, which we solve using a gradient-based planning-to-plan algorithm. A key prediction of our account is that when the value of an immediate action depends on a more specific plan, the computational cost associated with that action will be higher. To test this qualitative prediction, we measure participants' response times as they solve a Gridworld task. We find evidence for our account of planning costs, indicating that people rationally plan to plan. Our formulation and results provide new insight into the meta-planning processes that support the scale and sophistication of human problem solving.
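
To make the idea of a Bellman objective that trades reward against an information-theoretic cost more concrete, the sketch below uses a simpler, related construction: value iteration in which each decision is penalized by its KL divergence from a uniform default policy. This is not the paper's partial-planning formulation or its gradient-based planning-to-plan algorithm. The corridor environment, the function name `soft_value_iteration`, and the parameters `beta` and `gamma` are all assumptions chosen for illustration.

```python
import math


def soft_value_iteration(n_states=6, beta=5.0, gamma=0.9, n_iters=500):
    """KL-regularized value iteration on a 1-D corridor (illustrative only).

    Assumed setup, not taken from the paper: states 0..n_states-1, the last
    state is a terminal goal worth reward 1, actions move left or right, and
    deviating from a uniform default policy costs KL(pi || prior) / beta.
    """
    goal = n_states - 1
    actions = (-1, +1)
    prior = {a: 0.5 for a in actions}  # uniform default policy
    V = [0.0] * n_states               # V[goal] stays 0 (terminal state)

    def q_values(s, values):
        q = {}
        for a in actions:
            nxt = min(max(s + a, 0), goal)        # walls clamp the move
            reward = 1.0 if nxt == goal else 0.0  # reward on reaching the goal
            q[a] = reward + gamma * values[nxt]
        return q

    # Soft Bellman backup: V(s) = (1/beta) * log sum_a prior(a) * exp(beta * Q(s, a))
    for _ in range(n_iters):
        new_V = [0.0] * n_states
        for s in range(goal):
            q = q_values(s, V)
            new_V[s] = math.log(
                sum(prior[a] * math.exp(beta * q[a]) for a in actions)
            ) / beta
        V = new_V

    # Optimal policy pi(a|s) is proportional to prior(a) * exp(beta * Q(s, a));
    # the per-state information cost is KL(pi || prior), in nats.
    policy, info_cost = [], []
    for s in range(goal):
        q = q_values(s, V)
        weights = {a: prior[a] * math.exp(beta * q[a]) for a in actions}
        z = sum(weights.values())
        pi = {a: w / z for a, w in weights.items()}
        policy.append(pi)
        info_cost.append(sum(p * math.log(p / prior[a]) for a, p in pi.items()))
    return V, policy, info_cost


V, policy, info_cost = soft_value_iteration()
for s, (pi, cost) in enumerate(zip(policy, info_cost)):
    print(f"state {s}: P(right) = {pi[+1]:.3f}, information cost = {cost:.4f} nats")
```

In this toy setting the information cost is larger in states where the two actions differ more in value, such as the state next to the goal, and smaller where either action is nearly as good. That loosely echoes the abstract's prediction that actions whose value depends on a more specific plan carry a higher computational cost, though only as an analogy under the assumptions above.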
