Temporal Regularization in Markov Decision Process
Joelle Pineau | Audrey Durand | Doina Precup | Pierre Thodoroff
[1] Everette S. Gardner, et al. Exponential smoothing: The state of the art, 1985.
[2] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[3] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[4] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[5] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[6] John N. Tsitsiklis, et al. On Average Versus Discounted Reward Temporal-Difference Learning, 2002, Machine Learning.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.
[9] E. S. Gardner. Exponential smoothing: The state of the art, Part II, 2006.
[10] Elizabeth L. Wilmer, et al. Markov Chains and Mixing Times, 2008.
[11] Lihong Li, et al. A worst-case comparison between temporal difference and residual gradient with linear function approximation, 2008, ICML '08.
[12] Shie Mannor, et al. Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems, 2009, 2009 American Control Conference.
[13] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[14] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML '09.
[15] Marek Petrik, et al. Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes, 2010, ICML.
[16] Joelle Pineau, et al. Informing sequential clinical decision-making through reinforcement learning: an empirical study, 2010, Machine Learning.
[17] Csaba Szepesvari, et al. Regularization in reinforcement learning, 2011.
[18] Jason Pazis, et al. Non-Parametric Approximate Linear Programming for MDPs, 2011, AAAI.
[19] P. Cochat, et al. Et al, 2008, Archives de pediatrie: organe officiel de la Societe francaise de pediatrie.
[20] Bo Liu, et al. Regularized Off-Policy TD-Learning, 2012, NIPS.
[21] Kai-Min Chung, et al. Chernoff-Hoeffding Bounds for Markov Chains: Generalized and Simplified, 2012, STACS.
[22] Ryan Shaun Joazeiro de Baker, et al. New Potentials for Data-Driven Intelligent Tutoring System Development and Optimization, 2013, AI Mag.
[23] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[24] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[25] Cosmo Harrigan. Deep Reinforcement Learning with Regularized Convolutional Neural Fitted Q Iteration, 2016.
[26] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[27] Vicenç Gómez, et al. A unified view of entropy-regularized Markov decision processes, 2017, ArXiv.
[28] Peter Stone, et al. Reinforcement learning, 2019, Scholarpedia.
[29] Tom Schaul, et al. Natural Value Approximators: Learning when to Trust Past Estimates, 2017, NIPS.
[30] Barbara E. Engelhardt, et al. A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units, 2017, UAI.
[31] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[32] Jianfeng Gao, et al. Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access, 2016, ACL.
[33] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[34] Romain Laroche, et al. In reinforcement learning, all objective functions are not equal, 2018, ICLR.