Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan | Xi Chen | Rein Houthooft | John Schulman | Pieter Abbeel
[1] Bernard Widrow,et al. Pattern Recognition and Adaptive Control , 1964, IEEE Transactions on Applications and Industry.
[2] E. Purcell. Life at Low Reynolds Number , 2008 .
[3] K. Furuta,et al. Computer control of a double inverted pendulum , 1978 .
[4] Seshashayee S. Murthy,et al. 3D balance in legged locomotion: modeling and simulation for the one-legged case (abstract only) , 1984, COMG.
[5] Seshashayee S. Murthy,et al. 3-D balance in legged locomotion: modeling and simulation for the one-legged case , 1986, Workshop on Motion.
[6] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[7] Jessica K. Hodgins,et al. Animation of dynamic legged locomotion , 1991, SIGGRAPH.
[8] John J. Godfrey,et al. SWITCHBOARD: telephone speech corpus for research and development , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[9] Jonathan G. Fiscus,et al. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (TIMIT), NIST , 1993 .
[10] Jonathan G. Fiscus,et al. DARPA TIMIT: Acoustic-Phonetic Continuous Speech Corpus CD-ROM, NIST Speech Disc 1-1.1 , 1993 .
[11] Mark W. Spong,et al. Swinging up the Acrobot: an example of intelligent control , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[12] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[13] John N. Tsitsiklis,et al. Neuro-dynamic programming: an overview , 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.
[14] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[15] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[16] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[17] Risto Miikkulainen,et al. 2-D Pole Balancing with Recurrent Evolutionary Networks , 1998 .
[18] H. Kimura,et al. Stochastic real-valued reinforcement learning to solve a nonlinear control problem , 1999, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).
[19] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[20] R. Rubinstein. The Cross-Entropy Method for Combinatorial and Continuous Optimization , 1999 .
[21] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[22] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.
[23] David Pearce,et al. The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.
[24] Bram Bakker,et al. Reinforcement Learning with Long Short-Term Memory , 2001, NIPS.
[25] Jitendra Malik,et al. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[26] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[27] Nikolaus Hansen,et al. Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.
[28] Rémi Coulom,et al. Reinforcement Learning Using Neural Networks, with Applications to Motor Control , 2002 .
[29] András Lörincz,et al. ε-MDPs: Learning in Varying Environments , 2003, J. Mach. Learn. Res..
[30] Jeff G. Schneider,et al. Covariant policy search , 2003, IJCAI 2003.
[31] Stefan Schaal,et al. Policy Gradient Methods for Robot Control , 2003 .
[32] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[33] Sean R Eddy,et al. What is dynamic programming? , 2004, Nature Biotechnology.
[34] Nate Kohl,et al. Reinforcement Learning Benchmarks and Bake-offs II: A Workshop at the 2005 NIPS Conference , 2005 .
[35] Peter Stone,et al. Keepaway Soccer: From Machine Learning Testbed to Benchmark , 2005, RoboCup.
[36] Yann LeCun,et al. The MNIST database of handwritten digits , 2005 .
[37] András Lörincz,et al. Learning Tetris Using the Noisy Cross-Entropy Method , 2006, Neural Computation.
[38] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[39] Wulfram Gerstner,et al. Dynamical principles for neuroscience and intelligent biomimetic devices , 2006 .
[40] Christos Dimitrakakis,et al. Beliefbox: A framework for statistical methods in sequential decision making , 2007 .
[41] Stefan Schaal,et al. Reinforcement learning by reward-weighted regression for operational space control , 2007, ICML '07.
[42] P. Wawrzynski,et al. Learning to Control a 6-Degree-of-Freedom Walking Robot , 2007, EUROCON 2007 - The International Conference on "Computer as a Tool".
[43] Jürgen Schmidhuber,et al. Solving Deep Memory POMDPs with Recurrent Policy Gradients , 2007, ICANN.
[44] Geoffrey Zweig,et al. Automated directory assistance system - from theory to practice , 2007, INTERSPEECH.
[45] Stefan Schaal,et al. Reinforcement learning of motor skills with policy gradients , 2008, Neural Networks.
[46] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[47] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[48] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[49] R. Murray,et al. A Case Study in Approximate Linearization: The Acrobot Example , 2010 .
[50] Jan Peters,et al. Policy Search for Motor Primitives in Robotics , 2011, Machine Learning.
[51] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[52] Tsukasa Ogasawara,et al. SkyAI: Highly modularized reinforcement learning library , 2010, 2010 10th IEEE-RAS International Conference on Humanoid Robots.
[53] Yuval Tassa,et al. Infinite-Horizon Model Predictive Control for Periodic Tasks with Contacts , 2011, Robotics: Science and Systems.
[54] Yuval Tassa,et al. Synthesis and stabilization of complex behaviors through online trajectory optimization , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[55] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[56] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[57] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[58] Steffen Udluft,et al. Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks , 2012, Neural Networks: Tricks of the Trade.
[59] Sergey Levine,et al. Guided Policy Search , 2013, ICML.
[60] Pawel Wawrzynski,et al. dotRL: A platform for rapid Reinforcement Learning methods development and validation , 2013, 2013 Federated Conference on Computer Science and Information Systems.
[61] Donald Michie,et al. BOXES: An Experiment in Adaptive Control , 2013 .
[62] Peter Stone,et al. The Open-Source TEXPLORE Code Release for Reinforcement Learning on Robots , 2013, RoboCup.
[63] Jan Peters,et al. A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.
[64] Jan Peters,et al. Policy evaluation with temporal differences: a survey and comparison , 2015, J. Mach. Learn. Res..
[65] Christos Dimitrakakis,et al. The reinforcement learning competition , 2014 .
[66] Christos Dimitrakakis,et al. The Reinforcement Learning Competition 2014 , 2014, AI Mag..
[67] Honglak Lee,et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning , 2014, NIPS.
[68] Jan Peters,et al. Learning of Non-Parametric Control Policies with High-Dimensional State Features , 2015, AISTATS.
[69] Ubbo Visser,et al. RLLib: C++ Library to Predict, Control, and Represent Learnable Knowledge Using On/Off Policy Reinforcement Learning , 2015, RoboCup.
[70] Yuval Tassa,et al. Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.
[71] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[72] Martin A. Riedmiller,et al. Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.
[73] David Silver,et al. Memory-based control with recurrent neural networks , 2015, ArXiv.
[74] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[75] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[76] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[77] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[78] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.