Optimistic planning for Markov decision processes
[1] Olivier Teytaud, et al. Modification of UCT with Patterns in Monte-Carlo Go, 2006.
[2] Aleksandra Eric, et al. A Heuristic Search Algorithm for Markov Decision Problems, 1999.
[3] Frédérick Garcia, et al. On-Line Search for Solving Markov Decision Processes via Heuristic Sampling, 2004, ECAI.
[4] Csaba Szepesvári, et al. Efficient approximate planning in continuous space Markovian Decision Problems, 2001, AI Commun.
[5] Rémi Munos, et al. Algorithms for Infinitely Many-Armed Bandits, 2008, NIPS.
[6] Louis Wehenkel, et al. Lazy Planning under Uncertainty by Optimizing Decisions on an Ensemble of Incomplete Disturbance Trees, 2008, EWRL.
[7] Rémi Munos, et al. Open Loop Optimistic Planning, 2010, COLT.
[8] Carla Bosia, et al. Supplementary Material S1, 2011.
[9] Thomas J. Walsh, et al. Integrating Sample-Based Planning and Model-Based Reinforcement Learning, 2010, AAAI.
[10] Bart De Schutter, et al. Approximate dynamic programming with a fuzzy parameterization, 2010, Autom.
[11] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[12] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[13] Michael L. Littman, et al. Sample-Based Planning for Continuous Action Markov Decision Processes, 2011, ICAPS.
[14] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.
[15] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[16] Jan M. Maciejowski, et al. Predictive Control: With Constraints, 2002.
[17] Bart De Schutter, et al. Optimistic planning for sparsely stochastic systems, 2011, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[18] Bart De Schutter, et al. Reinforcement Learning and Dynamic Programming Using Function Approximators, 2010.
[19] Rémi Munos, et al. Optimistic Planning of Deterministic Systems, 2008, EWRL.
[20] Rémi Munos, et al. Pure Exploration in Multi-armed Bandits Problems, 2009, ALT.
[21] Rémi Munos, et al. Bandit Algorithms for Tree Search, 2007, UAI.
[22] Csaba Szepesvári, et al. Online Optimization in X-Armed Bandits, 2008, NIPS.
[23] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[24] Louis Wehenkel, et al. Planning under uncertainty, ensembles of disturbance trees and kernelized discrete action spaces, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[25] Sebastian Thrun, et al. Planning for Markov Decision Processes with Sparse Stochasticity, 2004, NIPS.
[26] Benjamin Van Roy, et al. Feature-based methods for large scale dynamic programming, 1995, Proceedings of the 34th IEEE Conference on Decision and Control.
[27] Sylvain Gelly, et al. Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007, IEEE Symposium on Computational Intelligence and Games.
[28] Steven M. LaValle, et al. Planning Algorithms, 2006.
[29] John Rust. Numerical dynamic programming in economics, 1996.
[30] K. Taira. Proof of Theorem 1.3, 2004.
[31] Nils J. Nilsson, et al. Principles of Artificial Intelligence, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[32] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.