Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
Vihang P. Patil | S. Hochreiter | Hamid Eghbal-zadeh | Marius-Constantin Dinu | Fabian Paischer | Angela Bitto-Nemling | C. Steinparz | Thomas Schmied