Pieter Abbeel | David Held | Markus Wulfmeier | Michael Zhang | Carlos Florensa
[1] Ira Sheldon Pohl, et al. Bi-directional and heuristic search in path problems, 1969.
[2] Steven D. Whitehead, et al. Complexity and Cooperation in Q-Learning, 1991, ML.
[3] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[4] David K. Smith, et al. Dynamic Programming and Optimal Control. Volume 1, 1996.
[5] Rajeev Motwani, et al. Path planning in expansive configuration spaces, 1997, ICRA.
[6] Daniel E. Koditschek, et al. Sequential Composition of Dynamically Dexterous Robot Behaviors, 1999, Int. J. Robotics Res..
[7] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[8] Steven M. LaValle, et al. RRT-connect: An efficient approach to single-query path planning, 2000, ICRA.
[9] John Langford, et al. Approximately Optimal Approximate Reinforcement Learning, 2002, ICML.
[10] Jeff G. Schneider, et al. Policy Search by Dynamic Programming, 2003, NIPS.
[11] Minoru Asada, et al. Purposive behavior acquisition for a real robot by vision-based reinforcement learning, 1995, Machine Learning.
[12] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] András Lörincz, et al. Learning Tetris Using the Noisy Cross-Entropy Method, 2006, Neural Computation.
[15] Stefan Schaal, et al. Reinforcement learning of motor skills with policy gradients, 2008, Neural Networks.
[16] Jason Weston, et al. Curriculum learning, 2009, ICML '09.
[17] Ian R. Manchester, et al. LQR-trees: Feedback Motion Planning via Sums-of-Squares Verification, 2010, Int. J. Robotics Res..
[18] Daphne Koller, et al. Self-Paced Learning for Latent Variable Models, 2010, NIPS.
[20] Jürgen Schmidhuber, et al. Continually adding self-invented problems to the repertoire: First experiments with POWERPLAY, 2012, ICDL.
[21] Michiel van de Panne, et al. Curriculum Learning for Motor Skills, 2012, Canadian Conference on AI.
[22] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IROS.
[23] Pierre-Yves Oudeyer, et al. Active learning of inverse models with intrinsically motivated goal exploration in robots, 2013, Robotics Auton. Syst..
[24] Jürgen Schmidhuber, et al. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem, 2011, Front. Psychol..
[25] Jan Peters, et al. A Survey on Policy Search for Robotics, 2013, Found. Trends Robotics.
[26] Wojciech Zaremba, et al. Learning to Execute, 2014, ArXiv.
[27] Shiguang Shan, et al. Self-Paced Curriculum Learning, 2015, AAAI.
[28] Sergey Levine, et al. Trust Region Policy Optimization, 2015, ICML.
[29] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[30] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[31] Yuval Tassa, et al. Continuous control with deep reinforcement learning, 2015, ICLR.
[32] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[33] Pieter Abbeel, et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.
[34] Sergey Levine, et al. End-to-End Training of Deep Visuomotor Policies, 2015, J. Mach. Learn. Res..
[35] Sergey Levine, et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.
[36] Andrea Lockerd Thomaz, et al. Exploration from Demonstration for Interactive Reinforcement Learning, 2016, AAMAS.
[37] Balaraman Ravindran, et al. Online Multi-Task Learning Using Biased Sampling, 2017.
[38] Alex Graves, et al. Automated Curriculum Learning for Neural Networks, 2017, ICML.
[39] Wojciech Zaremba, et al. Domain randomization for transferring deep neural networks from simulation to the real world, 2017, IROS.
[40] Sham M. Kakade, et al. Towards Generalization and Simplicity in Continuous Control, 2017, NIPS.
[41] Yuval Tassa, et al. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation, 2017, ArXiv.
[42] Pieter Abbeel, et al. Automatic Goal Generation for Reinforcement Learning Agents, 2017, ICML.
[43] Ilya Kostrikov, et al. Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play, 2017, ICLR.