Look-ahead control of conveyor-serviced production station by using potential-based online policy iteration
[1] Abhijit Gosavi,et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .
[2] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[3] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[4] Tang Hao,et al. Performance Potential-based Neuro-dynamic Programming for SMDPs , 2005 .
[5] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[6] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[7] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[8] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[9] Abhijit Gosavi,et al. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning , 2003 .
[10] S. Mahadevan,et al. Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning , 1999 .
[11] Eginhard J. Muth,et al. Conveyor Theory: A Survey , 1979 .
[12] Yuan Ji. Performance Potential-based Neuro-dynamic Programming for SMDPs , 2005 .
[13] Robert M. Crisp,et al. A Discrete-Time Queuing Analysis of Conveyor-Serviced Production Stations , 1968, Oper. Res..
[14] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[15] Masayuki Matsui,et al. A Queueing Analysis of Conveyor-Serviced Production Station and the Optimal Range Strategy , 1978 .
[16] W. M. Nawijn. The analysis of a conveyor-serviced production station , 2003 .
[17] Hongsheng Xi,et al. The optimal robust control policy for uncertain semi-Markov control processes , 2005, Int. J. Syst. Sci..
[18] Masayuki Matsui,et al. CSPS model: Look-ahead controls and physics , 2005 .
[19] Willem M. Nawijn. The Optimal Look-Ahead Policy for Admission to a Single Server System , 1985, Oper. Res..
[20] Abhijit Gosavi,et al. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis , 2004, Machine Learning.
[21] H.-H. Wang,et al. Successive approximation approach of optimal control for nonlinear discrete-time systems , 2005, Int. J. Syst. Sci..
[22] Hongsheng Xi,et al. Error bounds of optimization algorithms for semi-Markov decision processes , 2007, Int. J. Syst. Sci..
[23] Xi-Ren Cao,et al. Potential-based online policy iteration algorithms for Markov decision processes , 2004, IEEE Transactions on Automatic Control.
[24] Abhijit Gosavi,et al. Reinforcement learning for long-run average cost , 2004, Eur. J. Oper. Res..
[25] Arnaud Doucet,et al. A policy gradient method for semi-Markov decision processes with application to call admission control , 2007, Eur. J. Oper. Res..