Efficient Average Reward Reinforcement Learning Using Constant Shifting Values
暂无分享,去创建一个
Yang Gao | Bo An | Hao Wang | Shangdong Yang | Xingguo Chen
[1] Ronald Ortner,et al. Adaptive aggregation for reinforcement learning in average reward Markov decision processes , 2013, Ann. Oper. Res..
[2] John N. Tsitsiklis,et al. Call admission control and routing in integrated services networks using neuro-dynamic programming , 2000, IEEE Journal on Selected Areas in Communications.
[3] R. Sutton,et al. A convergent O ( n ) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[4] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[5] S. Mahadevan,et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning , 1999 .
[6] Peter Auer,et al. Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning , 2006, NIPS.
[7] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[8] Yanjie Li,et al. Reinforcement learning algorithms for semi-Markov decision processes with average reward , 2012, Proceedings of 2012 9th IEEE International Conference on Networking, Sensing and Control.
[9] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[10] Prasad Tadepalli,et al. Average-Reward Reinforcement Learning , 2017, Encyclopedia of Machine Learning and Data Mining.
[11] Hoong Chuin Lau,et al. Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs , 2014, AAAI.
[12] Sridhar Mahadevan,et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning , 1994, ICML.
[13] SRIDHAR MAHADEVAN,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.
[14] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[15] Erik D. Demaine,et al. Tetris is hard, even to approximate , 2002, Int. J. Comput. Geom. Appl..
[16] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[17] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[18] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[19] Sridhar Mahadevan,et al. Hierarchical Average Reward Reinforcement Learning , 2007, J. Mach. Learn. Res..
[20] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[21] Abhijit Gosavi,et al. Reinforcement learning for long-run average cost , 2004, Eur. J. Oper. Res..
[22] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..