Simulation-Based Algorithms for Markov Decision Processes / Hyeong Soo Chang ... [et al.]

Markov decision process (MDP) models are widely used for modeling sequential decision-making problems that arise in engineering, economics, computer science, and the social sciences. Many real-world problems modeled by MDPs have huge state and/or action spaces, leading to the well-known curse of dimensionality and making exact solution of the resulting models intractable in practice. In other cases, the system of interest is too complex to allow explicit specification of some of the MDP model parameters, but simulation samples are readily available (e.g., for random transitions and costs). For these settings, various sampling and population-based algorithms have been developed to overcome the difficulty of computing an optimal solution in terms of a policy and/or value function. Specific approaches include adaptive sampling, evolutionary policy iteration, evolutionary random policy search, and model reference adaptive search.

This substantially enlarged new edition reflects the latest developments in these algorithms and their underpinning theories, and presents an updated account of the topics that have emerged since the publication of the first edition. It includes: new material on MDPs, both in constrained settings and with uncertain transition properties; a game-theoretic method for solving MDPs; theory for developing rollout-based algorithms; and details of approximate stochastic annealing, a population-based, on-line simulation-based algorithm.

The self-contained approach of this book will appeal not only to researchers in MDPs, stochastic modeling and control, and simulation, but will also be a valuable source of tuition and reference for students of control and operations research.
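To make the setting concrete, the sketch below (in Python, not taken from the book) illustrates the basic computation underlying such simulation-based algorithms: estimating the value of a state-action pair by averaging simulated rollouts under a base policy, when transitions and costs can only be sampled rather than specified explicitly. The names sample_next_state, sample_cost, and base_policy are hypothetical placeholders for a user-supplied simulator and policy; adaptive sampling schemes of the kind mentioned above refine such estimates by allocating more rollouts to promising actions.

def rollout_q_estimate(state, action, sample_next_state, sample_cost,
                       base_policy, horizon=20, num_rollouts=200, gamma=0.95):
    """Monte Carlo estimate of the discounted cost of taking `action` in `state`
    and then following `base_policy`.  Successor states and costs are drawn
    from a simulator, not from an explicit transition matrix or cost table."""
    total = 0.0
    for _ in range(num_rollouts):
        s, a, discount, ret = state, action, 1.0, 0.0
        for _ in range(horizon):
            ret += discount * sample_cost(s, a)   # accumulate discounted sampled cost
            s = sample_next_state(s, a)           # sample the random transition
            a = base_policy(s)                    # follow the base policy thereafter
            discount *= gamma
        total += ret
    return total / num_rollouts

# Toy usage on a two-state chain with random transitions (purely illustrative).
import random

def toy_next_state(s, a):
    return 1 - s if random.random() < 0.3 + 0.4 * a else s

def toy_cost(s, a):
    return float(s + a)

print(rollout_q_estimate(state=0, action=1,
                         sample_next_state=toy_next_state,
                         sample_cost=toy_cost,
                         base_policy=lambda s: 0))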
