Trial-Based Heuristic Tree Search for Finite Horizon MDPs
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] Nils J. Nilsson, et al. Principles of Artificial Intelligence, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Mihalis Yannakakis, et al. Shortest Paths Without a Map, 1989, Theor. Comput. Sci.
[4] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[5] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[6] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[7] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[8] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[9] Shlomo Zilberstein, et al. LAO*: A heuristic search algorithm that finds solutions with loops, 2001, Artif. Intell.
[10] Blai Bonet, et al. Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback, 2003, IJCAI.
[11] Blai Bonet, et al. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming, 2003, ICAPS.
[12] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[13] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[14] Blai Bonet, et al. Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and Its Application to MDPs, 2006, ICAPS.
[15] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[16] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.
[17] Levente Kocsis, et al. Transpositions and move groups in Monte Carlo tree search, 2008, IEEE Symposium on Computational Intelligence and Games.
[18] Jean Méhat, et al. UCD: Upper Confidence Bound for Rooted Directed Acyclic Graphs, 2010, International Conference on Technologies and Applications of Artificial Intelligence.
[19] Malte Helmert, et al. High-Quality Policies for the Canadian Traveler's Problem, 2010, SOCS.
[20] P. Schrimpf, et al. Dynamic Programming, 2011.
[21] Mausam, et al. LRTDP Versus UCT for Online Probabilistic Planning, 2012, AAAI.
[22] Carmel Domshlak, et al. Online Planning in MDPs: Rationality and Optimization, 2012, ArXiv.
[23] Blai Bonet, et al. Action Selection for MDPs: Anytime AO* Versus UCT, 2012, AAAI.
[24] Thomas Keller, et al. PROST: Probabilistic Planning Based on UCT, 2012, ICAPS.
[25] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[26] Peng Dai, et al. Reverse Iterative Deepening for Finite-Horizon MDPs with Large Branching Factors, 2012, ICAPS.