Randomized Search Methods for Solving Markov Decision Processes and Global Optimization

Abstract: Markov decision process (MDP) models provide a unified framework for modeling and describing sequential decision-making problems that arise in engineering, economics, and computer science. However, the size of the resulting MDP model typically grows exponentially with the size of the underlying problem, making exact solution of the model intractable, especially for large problems. Moreover, for complex systems, some of the parameters of the MDP model often cannot be obtained directly, and only simulation samples are available. In the first part of this thesis, we develop two sampling/simulation-based numerical algorithms to address the computational difficulties arising from these settings. The two algorithms have different emphases: one focuses on MDPs with large state spaces but relatively small action spaces and emphasizes the efficient allocation of simulation samples to obtain good value function estimates, whereas the other targets problems with large action spaces but small state spaces and uses a population-based approach to avoid optimizing over the entire action space. We study the convergence properties of these algorithms and report computational results that illustrate their performance. The second part of this thesis develops a general framework, called Model Reference Adaptive Search (MRAS), for solving global optimization problems. The method iteratively updates a parameterized probability distribution over the solution space so that the sequence of candidate solutions generated from this distribution converges asymptotically to the global optimum. We provide a particular instantiation of the framework and establish its convergence properties in both continuous and discrete domains.
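To make the model-based search idea concrete, the following is a minimal sketch in Python of the general scheme the abstract describes: sample candidate solutions from a parameterized distribution, score them, and refit the distribution toward the better samples. It is an illustration only; the actual MRAS framework uses a sequence of reference distributions and a specific weighting/parameter-update rule, and the function names, parameters, and test objective below are assumptions made for the example.

```python
import numpy as np

def model_based_search(objective, dim, iters=200, pop=100, elite_frac=0.1, seed=0):
    """Sketch of a model-based search: iteratively sample from a Gaussian
    sampling distribution, score the samples, and refit the Gaussian to the
    elite (best-scoring) candidates. Not the exact MRAS update rule."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.full(dim, 5.0)   # initial sampling distribution
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))      # candidate solutions
        scores = np.apply_along_axis(objective, 1, samples)   # objective to maximize
        elite = samples[np.argsort(scores)[-n_elite:]]        # keep best candidates
        mean = elite.mean(axis=0)                             # update distribution
        std = elite.std(axis=0) + 1e-6                        # small floor avoids collapse
    return mean

if __name__ == "__main__":
    # Hypothetical multimodal test objective with its global maximum at the origin.
    f = lambda x: -np.sum(x**2) + np.sum(np.cos(2 * np.pi * x))
    print(model_based_search(f, dim=5))
```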
