Probabilistic inference as a model of planned behavior

The problem of planning and goal-directed behavior has long been addressed in computer science, typically using classical concepts such as Bellman's optimality principle, dynamic programming, or Reinforcement Learning methods. But is this the only way to address the problem? Recently there has been growing interest in using probabilistic inference methods for decision making and planning. What makes such approaches promising is that they naturally extend to distributed state representations and cope efficiently with uncertainty. In sensor processing, inference methods typically compute a posterior over states conditioned on observations; applied in the context of action selection, they compute a posterior over actions conditioned on goals. In this paper we first introduce the idea of using inference for reasoning about actions on an intuitive level, drawing connections to the idea of internal simulation. We then survey previous work, including our own, that uses this approach to address (partially observable) Markov Decision Processes and stochastic optimal control problems.
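
To make the notion of a "posterior over actions conditioned on goals" concrete, here is a minimal sketch in a toy finite MDP. It is purely illustrative and not the method developed in the paper: the random transition table, the fixed horizon, and the uniform prior over actions are all assumptions introduced for the example. Conditioning on the event of reaching a hypothetical goal state and applying Bayes' rule yields a posterior over the first action.

```python
# Illustrative sketch of planning as inference in a toy MDP.
# All quantities (state/action counts, horizon, goal state) are hypothetical.
import numpy as np

n_states, n_actions, horizon = 4, 2, 5
goal_state = 3

# P[a][s, s'] = P(s' | s, a): one row-stochastic transition matrix per action.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

def prob_goal_given_first_action(s0, a0):
    """Likelihood P(s_T = goal | s_0, a_0), marginalizing later actions
    under a uniform action prior (the 'policy prior' of this sketch)."""
    belief = P[a0][s0]                      # distribution over s_1
    for _ in range(horizon - 1):
        belief = np.mean([belief @ P[a] for a in range(n_actions)], axis=0)
    return belief[goal_state]

# Posterior over the first action, conditioned on reaching the goal:
# P(a_0 | s_0, goal) ∝ P(goal | s_0, a_0) * P(a_0), with a uniform P(a_0).
s0 = 0
likelihood = np.array([prob_goal_given_first_action(s0, a)
                       for a in range(n_actions)])
posterior = likelihood / likelihood.sum()
print("P(a_0 | goal) =", posterior)
```

The same conditioning step is what message-passing or EM-based formulations exploit at larger scale; here it is done by brute-force forward marginalization only to keep the example self-contained.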
