A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning