Approximate Newton Methods for Policy Search in Markov Decision Processes
[1] E. L. Lehmann, et al. Theory of point estimation, 1950.
[2] J. Kiefer, et al. Stochastic Estimation of the Maximum of a Regression Function, 1952.
[3] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[4] James M. Ortega, et al. Iterative solution of nonlinear equations in several variables, 2014, Computer science and applied mathematics.
[5] David Q. Mayne, et al. Differential dynamic programming, 1972, The Mathematical Gazette.
[6] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.
[7] Peter W. Glynn, et al. Proceedings of the 1986 Winter Simulation Conference, 1986.
[8] Robot modelling and control, 1990.
[9] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[10] Shun-ichi Amari, et al. Information geometry of Boltzmann machines, 1992, IEEE Trans. Neural Networks.
[11] Steven Douglas Whitehead, et al. Reinforcement learning for the adaptive control of perception and action, 1992.
[12] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, 1992.
[13] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[14] Robert F. Stengel, et al. Optimal Control and Estimation, 1994.
[15] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[16] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[17] Andrzej Cichocki, et al. A New Learning Algorithm for Blind Signal Separation, 1995, NIPS.
[18] Michael I. Jordan, et al. Mean Field Theory for Sigmoid Belief Networks, 1996, J. Artif. Intell. Res.
[19] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[20] Shun-ichi Amari, et al. Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient, 1996, NIPS.
[21] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[22] Geoffrey E. Hinton, et al. Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.
[23] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[24] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.
[25] J. Spall, et al. Model-free control of nonlinear stochastic systems with discrete-time measurements, 1998, IEEE Trans. Autom. Control.
[26] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[27] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[28] J. Tsitsiklis, et al. Actor-critic algorithms, 1999, NIPS.
[29] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[30] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[31] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[32] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 2001, IEEE Trans. Autom. Control.
[33] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[34] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[35] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[36] Jun Nakanishi, et al. Learning Attractor Landscapes for Learning Motor Primitives, 2002, NIPS.
[37] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[38] Jun Nakanishi, et al. Movement imitation with nonlinear dynamical systems in humanoid robots, 2002, Proceedings of the IEEE International Conference on Robotics and Automation.
[39] Jeff G. Schneider, et al. Covariant policy search, 2003, IJCAI.
[40] Nicole A. Lazar, et al. Statistical Analysis With Missing Data, 2003, Technometrics.
[41] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[42] Peter Stone, et al. Policy gradient reinforcement learning for fast quadrupedal locomotion, 2004, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '04).
[43] Emanuel Todorov, et al. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems, 2004, ICINCO.
[44] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[45] Sean R. Eddy. What is dynamic programming?, 2004, Nature Biotechnology.
[46] Nicol N. Schraudolph, et al. Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation, 2005, NIPS.
[47] H. Sebastian Seung, et al. Learning to Walk in 20 Minutes, 2005.
[48] Peter Sollich, et al. Theory of Neural Information Processing Systems, 2005.
[49] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[50] Pieter Abbeel, et al. An Application of Reinforcement Learning to Aerobatic Helicopter Flight, 2006, NIPS.
[51] Weiwei Li. Optimal control for biological movement systems, 2006.
[52] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[53] Mohammad Ghavamzadeh, et al. Bayesian Policy Gradient Algorithms, 2006, NIPS.
[54] Kevin Warwick, et al. Maintain order even on the hop - review of "Robot modelling and control" by M. Spong, S. Hutchinson and M. Vidyasagar, 2006.
[55] Weiwei Li, et al. An Iterative Optimal Control and Estimation Design for Nonlinear Stochastic System, 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[56] Jin Yu, et al. Natural Actor-Critic for Road Traffic Optimisation, 2006, NIPS.
[57] Marc Toussaint, et al. Probabilistic inference for solving (PO)MDPs, 2006.
[58] Dipti Srinivasan, et al. Neural Networks for Real-Time Traffic Signal Control, 2006, IEEE Transactions on Intelligent Transportation Systems.
[59] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[60] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[61] Stefan Schaal, et al. Dynamics systems vs. optimal control - a unifying view, 2007, Progress in Brain Research.
[62] Simon Günter, et al. A Stochastic Quasi-Newton Method for Online Convex Optimization, 2007, AISTATS.
[63] Junichiro Yoshimoto, et al. A New Natural Policy Gradient by Stationary Distribution Metric, 2008, ECML/PKDD.
[64] David Silver, et al. Achieving Master Level Play in 9 × 9 Computer Go, 2008, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence.
[65] Tom Schaul, et al. Natural Evolution Strategies, 2008, IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[66] Joel Veness, et al. Bootstrapping from Game Tree Search, 2009, NIPS.
[67] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[68] D. Barber, et al. Solving deterministic policy (PO)MDPs using Expectation-Maximisation and Antifreeze, 2009.
[69] Marc Toussaint, et al. Learning model-free robot control by a Monte Carlo EM algorithm, 2009, Auton. Robots.
[70] Nando de Freitas, et al. An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward, 2009, AISTATS.
[71] Yuval Tassa, et al. Iterative local dynamic programming, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[72] Jan Peters, et al. Policy Search for Motor Primitives in Robotics, 2008, NIPS.
[73] David Barber, et al. Variational methods for Reinforcement Learning, 2010, AISTATS.
[74] Marc Toussaint, et al. Expectation maximisation methods for solving (PO)MDPs and optimal control problems, 2011, Bayesian Time Series Models.
[75] TaeChoong Chung, et al. Hessian matrix distribution for Bayesian policy gradient reinforcement learning, 2011, Inf. Sci.
[76] Carl E. Rasmussen, et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, 2011, ICML.
[77] D. Bertsekas. Approximate policy iteration: a survey and some new methods, 2011.
[78] Marc Toussaint, et al. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.
[79] David Barber, et al. Bayesian reasoning and machine learning, 2012.
[80] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[81] Sergey Levine, et al. Guided Policy Search, 2013, ICML.
[82] Thomas Furmston. Applications of probabilistic inference to planning & reinforcement learning, 2013.
[83] P. Olver. Nonlinear Systems, 2013.
[84] Sergey Levine, et al. Variational Policy Search via Trajectory Optimization, 2013, NIPS.
[85] Bruno Scherrer, et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, 2013, NIPS.
[86] Philip Thomas. GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results, 2014, ICML.
[87] Guy Lever, et al. Deterministic Policy Gradient Algorithms, 2014, ICML.
[88] Yuval Tassa, et al. Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.
[89] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[90] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[91] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[92] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.