Improved QMDP Policy for Partially Observable Markov Decision Processes in Large Domains: Embedding Exploration Dynamics
[1] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[2] Stephen S. Lee, et al. Planning with Partially Observable Markov Decision Processes: Advances in Exact Solution Method, 1998, UAI.
[3] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[4] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[5] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[6] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[7] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[8] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[9] Leslie Pack Kaelbling, et al. Planning With Deadlines in Stochastic Domains, 1993, AAAI.
[10] N. Zhang, et al. Algorithms for partially observable Markov decision processes, 2001.
[11] A. Cassandra, et al. Exact and approximate algorithms for partially observable Markov decision processes, 1998.
[12] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[13] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[14] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[15] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[16] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.
[17] Michael L. Littman, et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.
[18] R. A. McCallum. First Results with Utile Distinction Memory for Reinforcement Learning, 1992.
[19] Anne Condon, et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems, 1999, AAAI/IAAI.
[20] Spyros G. Tzafestas, et al. Fuzzy reinforcement learning control for compliance tasks of robotic manipulators, 2002, IEEE Trans. Syst. Man Cybern. Part B.
[21] Wenju Liu, et al. A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains, 1997, J. Artif. Intell. Res.
[22] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes, 1991.