The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning
[1] Kelly W. Zhang, et al. Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines, 2022, Algorithms.
[2] SeungYeon Kang, et al. Reinforcement learning-based expanded personalized diabetes treatment recommendation using South Korean electronic health records, 2022, Expert Syst. Appl.
[3] Christopher Grimm, et al. Proper Value Equivalence, 2021, NeurIPS.
[4] Sharad Goel, et al. Bandit algorithms to personalize educational chatbots, 2021, Machine Learning.
[5] Satinder Singh, et al. The Value Equivalence Principle for Model-Based Reinforcement Learning, 2020, NeurIPS.
[6] Mehrab Singh Gill, et al. VacSIM: Learning effective strategies for COVID-19 vaccine distribution using reinforcement learning, 2020, Intelligence-Based Medicine.
[7] Ron Meir, et al. Discount Factor as a Regularizer in Reinforcement Learning, 2020, ICML.
[8] Yao Liu, et al. Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions, 2020, ICML.
[9] Ian Osband, et al. Making Sense of Reinforcement Learning and Probabilistic Inference, 2020, ICLR.
[10] Kristjan H. Greenewald, et al. Personalized HeartSteps, 2019, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.
[11] Silviu Pitis, et al. Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach, 2019, AAAI.
[12] Christopher Grimm, et al. Mitigating Planner Overfitting in Model-Based Reinforcement Learning, 2018, ArXiv.
[13] Joelle Pineau, et al. Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis, 2018, MLHC.
[14] Emma Brunskill, et al. Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs, 2018, ICML.
[15] Martha White, et al. Unifying Task Specification in Reinforcement Learning, 2016, ICML.
[16] Shie Mannor, et al. Bayesian Reinforcement Learning: A Survey, 2015, Found. Trends Mach. Learn.
[17] Nan Jiang, et al. The Dependence of Effective Planning Horizon on Model Accuracy, 2015, AAMAS.
[18] Naoto Yoshida, et al. Reinforcement learning with state-dependent discount factor, 2013, 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL).
[19] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[20] Kevin P. Murphy, et al. Machine learning - a probabilistic perspective, 2012, Adaptive computation and machine learning series.
[21] Johan Pallud, et al. A Tumor Growth Inhibition Model for Low-Grade Glioma Treated with Chemotherapy or Radiotherapy, 2012, Clinical Cancer Research.
[22] Xianping Guo, et al. Markov decision processes with state-dependent discount factors and unbounded rewards/costs, 2011, Oper. Res. Lett.
[23] Joelle Pineau, et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes, 2011, J. Mach. Learn. Res.
[24] Richard L. Lewis, et al. Variance-Based Rewards for Approximate Bayesian Reinforcement Learning, 2010, UAI.
[25] Lihong Li, et al. A Bayesian Sampling Approach to Exploration in Reinforcement Learning, 2009, UAI.
[26] Andrew Y. Ng, et al. Near-Bayesian exploration in polynomial time, 2009, ICML '09.
[27] Joelle Pineau, et al. Bayes-Adaptive POMDPs, 2007, NIPS.
[28] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[29] Malcolm J. A. Strens, et al. A Bayesian Framework for Reinforcement Learning, 2000, ICML.
[30] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[31] F. Girosi, et al. Networks for approximation and learning, 1990, Proc. IEEE.
[32] André Barreto, et al. Approximate Value Equivalence, 2022, NeurIPS.
[33] S. Kakade, et al. Reinforcement Learning: Theory and Algorithms, 2019.
[34] Maosong Sun, et al. Bandit Learning with Implicit Feedback, 2018, NeurIPS.
[35] Shie Mannor, et al. Bayesian Reinforcement Learning, 2010, Encyclopedia of Machine Learning.
[36] Andrew G. Barto, et al. Optimal learning: computational procedures for Bayes-adaptive Markov decision processes, 2002.
[37] Peter Stone, et al. Scaling Reinforcement Learning toward RoboCup Soccer, 2001, ICML.