Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents
[1] Yngvi Björnsson, et al. Simulation-Based Approach to General Game Playing, 2008, AAAI.
[2] Csaba Szepesvári, et al. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods, 2007, UAI.
[3] P. Bartlett, et al. Stochastic optimization of controlled partially observable Markov decision processes, 2000, Proceedings of the 39th IEEE Conference on Decision and Control (Cat. No.00CH37187).
[4] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[5] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[6] Michael L. Littman, et al. Potential-based Shaping in Model-based Reinforcement Learning, 2008, AAAI.
[7] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.
[8] David Silver, et al. Achieving Master Level Play in 9 × 9 Computer Go, 2008, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence.
[9] Claude E. Shannon. Programming a computer for playing chess, 1950.
[10] Richard L. Lewis, et al. Internal Rewards Mitigate Agent Boundedness, 2010, ICML.