Lazy Approximation for Solving Continuous Finite-Horizon MDPs

Solving Markov decision processes (MDPs) with continuous state spaces is challenging due to, among other problems, the well-known curse of dimensionality. Nevertheless, many real-world applications, such as transportation planning and telescope observation scheduling, depend critically on continuous state. Current approaches to continuous-state MDPs typically discretize the transition model. In this paper, we propose and study an alternative, discretization-free approach that we call lazy approximation. An empirical study shows that lazy approximation performs much better than discretization, and we successfully applied the new technique to a more realistic planetary rover planning problem.
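As context for the comparison, here is a minimal sketch of the discretization baseline the abstract refers to: finite-horizon value iteration on a uniform grid over a continuous state space. The 1-D state space, Gaussian dynamics, reward function, and all parameter values below are illustrative assumptions, not the paper's rover domain or its lazy-approximation algorithm.

```python
# Hedged sketch: finite-horizon value iteration on a uniformly discretized
# 1-D continuous-state MDP. Dynamics, reward, and grid size are assumed
# for illustration only.
import numpy as np

N = 101                          # number of grid cells over the state space [0, 1]
grid = np.linspace(0.0, 1.0, N)  # cell centers standing in for continuous states
actions = [-0.1, 0.0, 0.1]       # hypothetical action set (shifts of the state)
H = 20                           # finite horizon
sigma = 0.05                     # assumed stddev of Gaussian transition noise

def reward(s):
    # Assumed reward: a smooth peak at s = 0.8.
    return np.exp(-((s - 0.8) ** 2) / 0.01)

def transition_matrix(a):
    # Discretized transition model: probability of landing in each cell
    # after taking action a from each cell, under Gaussian noise.
    P = np.zeros((N, N))
    for i, s in enumerate(grid):
        w = np.exp(-((grid - (s + a)) ** 2) / (2 * sigma ** 2))
        P[i] = w / w.sum()       # normalize into a proper distribution
    return P

Ps = [transition_matrix(a) for a in actions]
V = np.zeros(N)                  # terminal value V_H = 0
for t in range(H):               # backward induction over the horizon
    Q = np.stack([reward(grid) + P @ V for P in Ps])
    V = Q.max(axis=0)            # Bellman backup restricted to the grid

print("approx. value at s = 0.5:", V[np.argmin(np.abs(grid - 0.5))])
```

The curse of dimensionality the abstract mentions is visible here directly: the grid, and with it each N x N transition matrix, grows exponentially with the number of state dimensions, which is the cost a discretization-free method like lazy approximation aims to avoid.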
