Continual Auxiliary Task Learning
Adam White, Martha White, Raksha Kumaraswamy, Chun-Ping Lo, Matt McLeod, M. Schlegel, Andrew Jacobsen
[1] C. H. Honzik, et al. Degrees of hunger, reward and non-reward, and maze learning in rats, and Introduction and removal of reward, and maze performance in rats, 1930.
[2] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[3] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[4] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[5] Steven J. Bradtke, et al. Linear Least-Squares algorithms for temporal difference learning, 2004, Machine Learning.
[6] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Paulo Martins Engel, et al. Improving reinforcement learning with context detection, 2006, AAMAS '06.
[9] Pierre-Yves Oudeyer, et al. Intrinsic Motivation Systems for Autonomous Mental Development, 2007, IEEE Transactions on Evolutionary Computation.
[10] Aurélien Garivier, et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems, 2008, arXiv:0805.3415.
[11] Jürgen Schmidhuber, et al. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes, 2008, ABiALS.
[12] A. S. Xanthopoulos, et al. Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems, 2008, Appl. Math. Comput.
[13] J. Zico Kolter, et al. The Fixed Points of Off-Policy TD, 2011, NIPS.
[14] Patrick M. Pilarski, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[15] Patrick M. Pilarski, et al. Tuning-free step-size adaptation, 2012, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Tom Schaul, et al. Better Generalization with Forecasts, 2013, IJCAI.
[17] Omar Besbes, et al. Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards, 2014, NIPS.
[18] Richard S. Sutton, et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.
[19] Sergey Levine, et al. Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, 2015, ArXiv.
[20] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[21] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[22] Bruno Scherrer, et al. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration, 2013, Math. Oper. Res.
[23] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res.
[24] Sherief Abdallah, et al. Addressing Environment Non-Stationarity by Repeating Q-learning Updates, 2016, J. Mach. Learn. Res.
[25] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[26] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[27] Marcin Andrychowicz, et al. Hindsight Experience Replay, 2017, NIPS.
[28] Marlos C. Machado, et al. A Laplacian Framework for Option Discovery in Reinforcement Learning, 2017, ICML.
[29] Richard S. Sutton, et al. Multi-step Off-policy Learning Without Importance Sampling Ratios, 2017, ArXiv.
[30] Daan Wierstra, et al. Variational Intrinsic Control, 2016, ICLR.
[31] Shie Mannor, et al. Consistent On-Line Off-Policy Evaluation, 2017, ICML.
[32] Razvan Pascanu, et al. Learning to Navigate in Complex Environments, 2016, ICLR.
[33] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[34] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[35] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[36] Marek Petrik, et al. Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity, 2018, J. Artif. Intell. Res.
[37] Martin A. Riedmiller, et al. Learning by Playing - Solving Sparse Reward Tasks from Scratch, 2018, ICML.
[38] Martha White, et al. Online Off-policy Prediction, 2018, ArXiv.
[39] Qiang Liu, et al. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, 2018, NeurIPS.
[40] Martha White, et al. High-confidence error estimates for learned value functions, 2018, UAI.
[41] Tom Schaul, et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement, 2018, ICML.
[42] Martha White, et al. Meta-descent for Online, Continual Prediction, 2019, AAAI.
[43] Pierre-Yves Oudeyer, et al. CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning, 2018, ICML.
[44] Alessandro Lazaric, et al. Rotting bandits are no harder than stochastic ones, 2018, AISTATS.
[45] Amos J. Storkey, et al. Exploration by Random Network Distillation, 2018, ICLR.
[46] Francesco Orabona. A Modern Introduction to Online Learning, 2019, ArXiv.
[47] Doina Precup, et al. The Option Keyboard: Combining Skills in Reinforcement Learning, 2021, NeurIPS.
[48] Sergey Levine, et al. Diversity is All You Need: Learning Skills without a Reward Function, 2018, ICLR.
[49] Richard L. Lewis, et al. Discovery of Useful Questions as Auxiliary Tasks, 2019, NeurIPS.
[50] Tom Schaul, et al. Universal Successor Features Approximators, 2018, ICLR.
[51] Yao Liu, et al. Off-Policy Policy Gradient with Stationary Distribution Correction, 2019, UAI.
[52] Sergey Levine, et al. Contextual Imagined Goals for Self-Supervised Robotic Learning, 2019, CoRL.
[53] Matthew E. Taylor, et al. Terminal Prediction as an Auxiliary Task for Deep Reinforcement Learning, 2019, AIIDE.
[54] Alexei A. Efros, et al. Large-Scale Study of Curiosity-Driven Learning, 2018, ICLR.
[55] Sergey Levine, et al. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning, 2019, NeurIPS.
[56] Daniel Guo, et al. Never Give Up: Learning Directed Exploration Strategies, 2020, ICLR.
[57] Adam White, et al. Adapting Behavior via Intrinsic Reward: A Survey and Empirical Study, 2020, J. Artif. Intell. Res.
[58] Sridhar Mahadevan, et al. Optimizing for the Future in Non-Stationary MDPs, 2020, ICML.
[59] David Simchi-Levi, et al. Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism, 2020, ICML.
[60] Daniel Yamins, et al. Active World Model Learning with Progress Curiosity, 2020, ICML.
[61] Jimmy Ba, et al. Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning, 2020, ICML.
[62] Sergey Levine, et al. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning, 2019, ICML.
[63] Doina Precup, et al. Fast reinforcement learning with generalized policy updates, 2020, Proceedings of the National Academy of Sciences.
[64] Chelsea Finn, et al. Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors, 2020, NeurIPS.
[65] Scott M. Jordan, et al. Towards Safe Policy Improvement for Non-Stationary MDPs, 2020, NeurIPS.
[66] Shalabh Bhatnagar, et al. Reinforcement Learning in Non-Stationary Environments, 2019, ArXiv.
[67] Brendan O'Donoghue, et al. Discovering Diverse Nearly Optimal Policies with Successor Features, 2021, ArXiv.
[68] Adam White, et al. A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning, 2021, J. Mach. Learn. Res.
[69] Martha White, et al. General Value Function Networks, 2018, J. Artif. Intell. Res.
[70] Junhyuk Oh, et al. Discovery of Options via Meta-Learned Subgoals, 2021, NeurIPS.