Model-Free Non-Stationarity Detection and Adaptation in Reinforcement Learning
Marcello Restelli | Manuel Roveri | Giuseppe Canonaco