Paper information - Look-ahead control of conveyor-serviced production station by using potential-based online policy iteration

We consider look-ahead control of a conveyor-serviced production station (CSPS) modelled as a semi-Markov decision process (SMDP). Our goal is to find an optimal control policy under either the average-cost or the discounted-cost criterion. Policy iteration (PI), combined with the concept of performance potential, provides a unified optimisation framework for both criteria. However, the exact solution scheme has a major difficulty: it requires not only full knowledge of the model parameters, but also considerable work to obtain and process the necessary system and performance matrices. To overcome this difficulty, we propose a potential-based online PI algorithm in this article. During implementation, the potentials and state-action values are learned online, through an effective exploration scheme, by analysing and utilising historical information from the past operation of a practical CSPS system. Finally, we illustrate the successful application of this learning-based technique to CSPS systems with an example.
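
To make the idea of potential-based policy iteration concrete, the following is a minimal sketch of the exact, model-based scheme under the average-cost criterion. It is not the authors' online algorithm: it assumes a discrete-time Markov decision process rather than an SMDP, assumes the transition probabilities `P` and costs `f` are fully known, and uses a hypothetical two-state, two-action example whose numbers are invented for illustration. The potentials `g` and the average cost `eta` are obtained from the Poisson equation g(i) + eta = f(i, a) + sum_j P(i, a, j) g(j), normalised by g(0) = 0.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(policy, P, f):
    """Return (eta, g): average cost and potentials of a fixed policy.

    Unknowns are (eta, g[1], ..., g[n-1]); g[0] is fixed to 0.
    """
    n = len(policy)
    A, b = [], []
    for i, a in enumerate(policy):
        row = [1.0] + [(1.0 if i == j else 0.0) - P[i][a][j] for j in range(1, n)]
        A.append(row)
        b.append(f[i][a])
    x = solve(A, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(P, f, policy):
    """Alternate policy evaluation and greedy improvement until stable."""
    while True:
        eta, g = evaluate(policy, P, f)
        improved = []
        for i, current in enumerate(policy):
            def q(a):
                # One-step cost plus expected potential of the next state.
                return f[i][a] + sum(p * gj for p, gj in zip(P[i][a], g))
            best = min(range(len(f[i])), key=q)
            # Keep the current action on ties so the iteration terminates.
            improved.append(best if q(best) < q(current) - 1e-12 else current)
        if improved == policy:
            return policy, eta, g
        policy = improved


# Hypothetical toy model: P[state][action] is a row of transition
# probabilities, f[state][action] is the one-step cost.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
f = [[2.0, 0.5],
     [1.0, 3.0]]

policy, eta, g = policy_iteration(P, f, [0, 0])
print(policy, eta, g)
```

The evaluation step is where the exact scheme needs the full model: every entry of `P` and `f` enters the linear system. An online variant in the spirit of the abstract would replace `evaluate` with estimates of the potentials learned from observed trajectories.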

Tang Hao | Arai Tamio
