Sample-Based Tree Search with Fixed and Adaptive State Abstractions
[1] Thomas J. Walsh, et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning, 2010, AAAI.
[2] Joseph C. Culberson, et al. Pattern Databases, 1998, Computational Intelligence.
[3] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[4] Alan Fern, et al. UCT for Tactical Assault Planning in Real-Time Strategy Games, 2009, IJCAI.
[5] Dana S. Nau, et al. SHOP2: An HTN Planning System, 2003, J. Artif. Intell. Res.
[6] Benjamin Van Roy. Performance Loss Bounds for Approximate Value Iteration with State Aggregation, 2006, Math. Oper. Res.
[7] Richard E. Korf, et al. Linear-Space Best-First Search, 1993, Artif. Intell.
[8] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[9] Leslie Pack Kaelbling, et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons, 1991, IJCAI.
[10] Craig Boutilier, et al. Bounded Finite State Controllers, 2003, NIPS.
[11] Jesse Hostetler. Monte Carlo Tree Search with Fixed and Adaptive Abstractions, 2017.
[12] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963, Journal of the American Statistical Association.
[13] Rémi Munos, et al. Open Loop Optimistic Planning, 2010, COLT.
[14] Borja Calvo, et al. scmamp: Statistical Comparison of Multiple Algorithms in Multiple Problems, 2016, R J.
[15] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[16] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[17] Thomas G. Dietterich, et al. Progressive Abstraction Refinement for Sparse Sampling, 2015, UAI.
[18] Alan Fern, et al. On Adversarial Policy Switching with Experiments in Real-Time Strategy Games, 2013, ICAPS.
[19] Kee-Eung Kim, et al. Solving POMDPs by Searching the Space of Finite Policies, 1999, UAI.
[20] Bruno Scherrer, et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, 2013, NIPS.
[21] Trevor I. Dix, et al. Proximity-Based Non-uniform Abstractions for Approximate Planning, 2014, J. Artif. Intell. Res.
[22] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[23] Marcus Hutter, et al. Extreme State Aggregation beyond MDPs, 2014, ALT.
[24] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[25] Guy Van den Broeck, et al. Automatic discretization of actions and states in Monte-Carlo tree search, 2011.
[26] Michael L. Littman, et al. Open-Loop Planning in Large-Scale Stochastic Domains, 2013, AAAI.
[27] Alan Fern, et al. Learning Partial Policies to Speedup MDP Tree Search, 2014, UAI.
[28] Stuart J. Russell, et al. Markovian State and Action Abstractions for MDPs via Hierarchical MCTS, 2016, IJCAI.
[29] Eric A. Hansen, et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.
[30] James A. Hendler, et al. HTN Planning: Complexity and Expressivity, 1994, AAAI.
[31] Robert Givan, et al. Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes, 2004, Discret. Event Dyn. Syst.
[32] Andrew W. Moore, et al. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces, 2004, Machine Learning.
[33] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[34] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.
[35] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[36] Balaraman Ravindran. Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes, 2004.
[37] Nan Jiang, et al. Improving UCT planning via approximate homomorphisms, 2014, AAMAS.
[38] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[39] Kris K. Hauser, et al. Randomized Belief-Space Replanning in Partially-Observable Continuous Spaces, 2010, WAFR.
[40] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[41] S. Edelkamp. Planning with Pattern Databases, 2014.
[42] Parag Singla, et al. OGA-UCT: On-the-Go Abstractions in UCT, 2016, ICAPS.
[43] Thomas G. Dietterich, et al. State Aggregation in Monte Carlo Tree Search, 2014, AAAI.
[44] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[45] Honglak Lee, et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, 2014, NIPS.
[46] Janez Demšar, et al. Statistical Comparisons of Classifiers over Multiple Data Sets, 2006, J. Mach. Learn. Res.
[47] Nicholas Mattei, et al. The Academic Advising Planning Domain, 2012.
[48] D. A. Castañón, et al. Rollout Algorithms for Stochastic Scheduling Problems, 1998, Proceedings of the 37th IEEE Conference on Decision and Control.
[49] Michael L. Littman, et al. Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes, 2012, ICAPS.
[50] David Silver, et al. Combining online and offline knowledge in UCT, 2007, ICML '07.
[51] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[52] Geoffrey J. Gordon, et al. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees, 2005, ICML.
[53] Binns, et al. Development of more realistic sailing simulator, 2002.
[54] Patrik Haslum, et al. Flexible Abstraction Heuristics for Optimal Sequential Planning, 2007, ICAPS.
[55] Thomas Keller, et al. PROST: Probabilistic Planning Based on UCT, 2012, ICAPS.
[56] Parag Singla, et al. ASAP-UCT: Abstraction of State-Action Pairs in UCT, 2015, IJCAI.
[57] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.