[1] Ross D. Shachter,et al. Decision Making Using Probabilistic Inference Methods , 1992, UAI.
[2] Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[3] Marc Toussaint,et al. Robot trajectory optimization using approximate inference , 2009, ICML '09.
[4] H. Kappen. Path integrals and symmetry breaking for optimal control theory , 2005, physics/0505066.
[5] Emanuel Todorov,et al. Linearly-solvable Markov decision problems , 2006, NIPS.
[6] David Q. Mayne,et al. Differential dynamic programming , 1972, The Mathematical Gazette.
[7] Marc Toussaint,et al. Probabilistic inference for solving (PO)MDPs , 2006 .
[8] Martin A. Riedmiller,et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[9] R. Ivry,et al. The coordination of movement: optimal feedback control and beyond , 2010, Trends in Cognitive Sciences.
[10] Sanjoy K. Mitter,et al. A Variational Approach to Nonlinear Estimation , 2003, SIAM J. Control. Optim..
[11] R. Bellman. Dynamic programming. , 1957, Science.
[12] Takamitsu Matsubara,et al. Optimal Feedback Control for anthropomorphic manipulators , 2010, 2010 IEEE International Conference on Robotics and Automation.
[13] Robert F. Stengel,et al. Optimal Control and Estimation , 1994 .
[14] Tom Minka,et al. A family of algorithms for approximate Bayesian inference , 2001 .
[15] Hilbert J. Kappen,et al. Dynamic policy programming , 2010, J. Mach. Learn. Res..
[16] Andrew G. Barto,et al. Reinforcement learning , 1998 .
[17] Michael I. Jordan,et al. Optimal feedback control as a theory of motor coordination , 2002, Nature Neuroscience.
[18] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[19] Nasser M. Nasrabadi,et al. Pattern Recognition and Machine Learning , 2006, Technometrics.
[20] Gregory F. Cooper,et al. A Method for Using Belief Networks as Influence Diagrams , 2013, UAI 1988.
[21] Ross D. Shachter. Probabilistic Inference and Influence Diagrams , 1988, Oper. Res..
[22] Michael I. Jordan,et al. Reinforcement Learning by Probability Matching , 1995, NIPS 1995.
[23] Marc Toussaint,et al. Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference , 2010, 2010 IEEE International Conference on Robotics and Automation.
[24] Emanuel Todorov,et al. Efficient computation of optimal actions , 2009, Proceedings of the National Academy of Sciences.
[25] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[26] Weiwei Li,et al. An Iterative Optimal Control and Estimation Design for Nonlinear Stochastic System , 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[27] Stefan Schaal,et al. Policy Gradient Methods for Robotics , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[28] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[29] Geoffrey E. Hinton,et al. Using Expectation-Maximization for Reinforcement Learning , 1997, Neural Computation.
[30] Vicenç Gómez,et al. Optimal control as a graphical model inference problem , 2009, Machine Learning.
[31] Geoffrey E. Hinton,et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.
[32] D. Barber,et al. Solving deterministic policy (PO)MDPs using Expectation-Maximisation and Antifreeze , 2009 .
[33] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[34] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.