Probabilistic inference as a model of planned behavior

The problem of planning and goal-directed behavior has long been addressed in computer science, typically using classical concepts such as Bellman's optimality principle, dynamic programming, or Reinforcement Learning methods. But is this the only way to address the problem? Recently there has been growing interest in using probabilistic inference methods for decision making and planning. What makes such approaches promising is that they naturally extend to distributed state representations and cope efficiently with uncertainty. In sensor processing, inference methods typically compute a posterior over states conditioned on observations; applied in the context of action selection, they compute a posterior over actions conditioned on goals. In this paper we first introduce the idea of using inference for reasoning about actions on an intuitive level, drawing connections to the idea of internal simulation. We then survey previous work, including our own, that uses this approach to address (partially observable) Markov Decision Processes and stochastic optimal control problems.
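
To make the notion of a "posterior over actions conditioned on goals" concrete, here is a minimal sketch in a toy finite MDP. It is purely illustrative and not the method developed in the paper: the random transition table, the fixed horizon, and the uniform prior over actions are all assumptions introduced for the example. Conditioning on the event of reaching a hypothetical goal state and applying Bayes' rule yields a posterior over the first action.

```python
# Illustrative sketch of planning as inference in a toy MDP.
# All quantities (state/action counts, horizon, goal state) are hypothetical.
import numpy as np

n_states, n_actions, horizon = 4, 2, 5
goal_state = 3

# P[a][s, s'] = P(s' | s, a): one row-stochastic transition matrix per action.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

def prob_goal_given_first_action(s0, a0):
    """Likelihood P(s_T = goal | s_0, a_0), marginalizing later actions
    under a uniform action prior (the 'policy prior' of this sketch)."""
    belief = P[a0][s0]                      # distribution over s_1
    for _ in range(horizon - 1):
        belief = np.mean([belief @ P[a] for a in range(n_actions)], axis=0)
    return belief[goal_state]

# Posterior over the first action, conditioned on reaching the goal:
# P(a_0 | s_0, goal) ∝ P(goal | s_0, a_0) * P(a_0), with a uniform P(a_0).
s0 = 0
likelihood = np.array([prob_goal_given_first_action(s0, a)
                       for a in range(n_actions)])
posterior = likelihood / likelihood.sum()
print("P(a_0 | goal) =", posterior)
```

The same conditioning step is what message-passing or EM-based formulations exploit at larger scale; here it is done by brute-force forward marginalization only to keep the example self-contained.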
