Approximate Inference and Stochastic Optimal Control

We propose a novel reformulation of the stochastic optimal control problem as an approximate inference problem, demonstrating that such an interpretation leads to new practical methods for the original problem. In particular, we characterise a novel class of iterative solutions to the stochastic optimal control problem based on a natural relaxation of the exact dual formulation. These theoretical insights are applied to the Reinforcement Learning problem, where they lead to new model-free, off-policy methods for discrete and continuous problems.
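To make the control-as-inference idea concrete, the following is a minimal, illustrative sketch (not the paper's algorithm): when optimal control is cast as inference, the Bellman backup becomes a "soft" log-sum-exp over actions, and the resulting policy is a Boltzmann distribution over the soft action values. The toy MDP, its transition matrix `P`, and rewards `R` below are invented for illustration.

```python
import numpy as np

# Toy 3-state, 2-action MDP (illustrative numbers only).
# P[a, s, s'] = probability of moving s -> s' under action a.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],  # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.1, 0.0]],  # action 1
])
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]])  # R[s, a]
gamma = 0.9

# Soft value iteration: the log-sum-exp backup that arises when
# control is treated as an inference problem (reward plus policy entropy).
V = np.zeros(3)
for _ in range(500):
    # Q[s, a] = R[s, a] + gamma * E_{s'}[V(s')]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V = np.log(np.exp(Q).sum(axis=1))  # soft maximum over actions

# The induced policy is Boltzmann in the soft Q-values; rows sum to 1.
pi = np.exp(Q - V[:, None])
```

The log-sum-exp backup replaces the hard max of standard dynamic programming; as the rewards are rescaled upwards it recovers the usual greedy solution, which is one way of viewing the "relaxation of the exact dual formulation" mentioned above.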
