Experience Selection in Deep Reinforcement Learning for Control
Tim de Bruin | Jens Kober | Karl Tuyls | Robert Babuška
[1] Zachary Chase Lipton,et al. Efficient Exploration for Dialogue Policy Learning with BBQ Networks & Replay Buffer Spiking , 2016 .
[2] Gene F. Franklin,et al. Digital control of dynamic systems , 1980 .
[3] Geoffrey E. Hinton,et al. To recognize shapes, first learn to generate images. , 2007, Progress in brain research.
[4] Sham M. Kakade,et al. Towards Generalization and Simplicity in Continuous Control , 2017, NIPS.
[5] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.
[6] Tom Schaul,et al. Prioritized Experience Replay , 2015, ICLR.
[7] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SIGART Bull..
[8] Sergey Levine,et al. Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.
[9] Frank Hutter,et al. Speeding Up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves , 2015, IJCAI.
[10] Robert Babuška,et al. Improved deep reinforcement learning for robotics through distribution-based experience retention , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[11] Sergey Levine,et al. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic , 2016, ICLR.
[12] Jan Peters,et al. Reinforcement learning in robotics: A survey , 2013, Int. J. Robotics Res..
[13] Jeffrey Scott Vitter,et al. Random sampling with a reservoir , 1985, TOMS.
[14] Filip De Turck,et al. VIME: Variational Information Maximizing Exploration , 2016, NIPS.
[15] Karl Tuyls,et al. The importance of experience replay database composition in deep reinforcement learning , 2015 .
[16] Peter Stone,et al. Transfer learning for reinforcement learning on a physical robot , 2010, AAMAS.
[17] Jürgen Schmidhuber,et al. A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .
[18] Deanna Needell,et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm , 2013, NIPS.
[19] Robert Babuška,et al. Evaluation of physical damage associated with action selection strategies in reinforcement learning , 2017, IFAC-PapersOnLine.
[20] Leslie G. Valiant,et al. A theory of the learnable , 1984, CACM.
[21] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[22] Shie Mannor,et al. Sequential Decision Making With Coherent Risk , 2017, IEEE Transactions on Automatic Control.
[23] Marcus Hutter,et al. Universal Reinforcement Learning Algorithms: Survey and Experiments , 2017, IJCAI.
[24] Bart De Schutter,et al. Approximate dynamic programming with a fuzzy parameterization , 2010, Autom..
[25] Koray Kavukcuoglu,et al. PGQ: Combining policy gradient and Q-learning , 2016, ArXiv.
[26] David Andre,et al. Generalized Prioritized Sweeping , 1997, NIPS.
[27] G. Uhlenbeck,et al. On the Theory of the Brownian Motion , 1930 .
[28] Wouter Caarls,et al. Parallel Online Temporal Difference Learning for Motor Control , 2016, IEEE Transactions on Neural Networks and Learning Systems.
[29] Long-Ji Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 1992, Machine Learning.
[30] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[31] Stefan Schaal,et al. Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.
[32] Javier García,et al. A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..
[33] Razvan Pascanu,et al. On the Number of Linear Regions of Deep Neural Networks , 2014, NIPS.
[34] Grégoire Montavon,et al. Neural Networks: Tricks of the Trade , 2012, Lecture Notes in Computer Science.
[35] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[36] L. C. Baird,et al. Reinforcement learning in continuous time: advantage updating , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[37] Robert Babuška,et al. Policy derivation methods for critic-only reinforcement learning in continuous spaces , 2018, Eng. Appl. Artif. Intell..
[38] Benjamin Van Roy,et al. Deep Exploration via Bootstrapped DQN , 2016, NIPS.
[39] Razvan Pascanu,et al. Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.
[40] Damien Ernst,et al. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies , 2015, ArXiv.
[41] Bradley Efron. Bootstrap Methods: Another Look at the Jackknife , 1979, Ann. Statist..
[42] Jens Kober,et al. Off-policy experience retention for deep actor-critic learning , 2016, NIPS.
[43] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[44] Guy Lever,et al. Deterministic Policy Gradient Algorithms , 2014, ICML.
[45] Yoshua Bengio,et al. An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks , 2013, ICLR.
[46] Tom Schaul,et al. Unifying Count-Based Exploration and Intrinsic Motivation , 2016, NIPS.
[47] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[48] Nuttapong Chentanez,et al. Intrinsically Motivated Reinforcement Learning , 2004, NIPS.
[49] Young-Woo Seo,et al. Learning user's preferences by analyzing Web-browsing behaviors , 2000, AGENTS '00.
[50] Frank Hutter,et al. Online Batch Selection for Faster Training of Neural Networks , 2015, ArXiv.
[51] Robert Babuška,et al. Policy Derivation Methods for Critic-Only Reinforcement Learning in Continuous Action Spaces , 2016 .
[52] Koray Kavukcuoglu,et al. Combining policy gradient and Q-learning , 2016, ICLR.
[53] Clément Farabet,et al. Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS.
[54] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[55] Regina Barzilay,et al. Language Understanding for Text-based Games using Deep Reinforcement Learning , 2015, EMNLP.
[56] Richard S. Sutton,et al. Model-Based Reinforcement Learning with an Approximate, Learned Model , 1996 .
[57] Shimon Whiteson,et al. OFFER: Off-Environment Reinforcement Learning , 2017, AAAI.
[58] Nando de Freitas,et al. Sample Efficient Actor-Critic with Experience Replay , 2016, ICLR.
[59] Bikramjit Banerjee,et al. Performance Bounded Reinforcement Learning in Strategic Interactions , 2004, AAAI.
[60] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[61] Marcin Andrychowicz,et al. Parameter Space Noise for Exploration , 2017, ICLR.
[62] Marco Wiering,et al. Q-learning with experience replay in a dynamic environment , 2016, 2016 IEEE Symposium Series on Computational Intelligence (SSCI).
[63] Sergey Levine,et al. End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..
[64] Yoav Freund,et al. A Short Introduction to Boosting , 1999 .