John Schulman | Sergey Levine | Philipp Moritz | Michael I. Jordan | Pieter Abbeel
[1] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[2] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[3] Nikolaus Hansen, et al. Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation, 1996, Proceedings of the IEEE International Conference on Evolutionary Computation.
[4] David K. Smith, et al. Dynamic Programming and Optimal Control, Volume 1, 1996.
[5] Stephen J. Wright, et al. Numerical Optimization (Springer Series in Operations Research and Financial Engineering), 2000.
[6] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[7] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[8] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[9] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[10] Jeff G. Schneider, et al. Covariant policy search, 2003, IJCAI.
[11] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[12] H. Sebastian Seung, et al. Stochastic policy gradient reinforcement learning on a simple 3D biped, 2004, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[13] D. Hunter, et al. A Tutorial on MM Algorithms, 2004.
[14] Fred W. Glover, et al. Simulation optimization: a review, new developments, and applications, 2005, Proceedings of the Winter Simulation Conference.
[15] Florentin Wörgötter, et al. Fast biped walking with a reflexive controller and real-time policy searching, 2005, NIPS.
[16] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[17] András Lörincz, et al. Learning Tetris Using the Noisy Cross-Entropy Method, 2006, Neural Computation.
[18] Stefan Schaal, et al. Reinforcement learning by reward-weighted regression for operational space control, 2007, ICML.
[19] Arkadi Nemirovski, et al. Efficient Methods in Convex Programming, 2007.
[20] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks (2008 Special Issue).
[21] K. Wampler, et al. Optimal gait and form for animal locomotion, 2009, SIGGRAPH.
[22] Yasemin Altun, et al. Relative Entropy Policy Search, 2010.
[23] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[24] Ilya Sutskever, et al. Training Deep and Recurrent Networks with Hessian-Free Optimization, 2012, Neural Networks: Tricks of the Trade.
[25] V. Climenhaga. Markov chains and mixing times, 2013.
[26] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.
[27] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[28] Bruno Scherrer, et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, 2013, NIPS.
[29] Daniele Calandriello, et al. Safe Policy Iteration, 2013, ICML.
[30] Honglak Lee, et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, 2014, NIPS.
[31] Sergey Levine, et al. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, 2014, NIPS.
[32] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[33] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.