Optimal control of multiple CSPS system based on event-based Q learning

This paper addresses the optimal control problem of a multiple conveyor-serviced production station (CSPS) system. The objective is to maximize the part-processing rate of the entire system by choosing a coordinated look-ahead control strategy for each station. Following the idea of event-based optimization and using the concept of performance potentials, an event-based Q-learning algorithm is proposed to solve the coordinated look-ahead control problem under either the discounted or the average performance criterion. A simulation example illustrates the effectiveness of the proposed algorithm: the part-processing rate of the entire system is significantly higher than the rate obtained with the WoLF-PHC algorithm.
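The core idea of event-based Q-learning is that Q-values are indexed by observed events rather than full system states, and updates happen only at event epochs. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm. The class name `EventQLearner`, the event labels, the action set (e.g. hypothetical "wait" vs. "look ahead" decisions), and the step sizes `alpha`, `beta`, `gamma` are all illustrative. The discounted update is standard tabular Q-learning. The average-criterion update treats Q-values as relative, potential-like values by subtracting a running average-reward estimate. That is a simple assumed variant, and the paper's exact rule may differ.

```python
from collections import defaultdict


class EventQLearner:
    """Tabular Q-learning keyed by (event, action) pairs.

    Hypothetical sketch: event labels, the action set, rewards and step sizes
    are illustrative assumptions, not the paper's formulation.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, beta=0.01):
        self.actions = list(actions)  # e.g. candidate look-ahead decisions
        self.alpha = alpha            # step size for Q-value updates
        self.gamma = gamma            # discount factor (discounted criterion)
        self.beta = beta              # step size for the average-reward estimate
        self.rho = 0.0                # running estimate of the average reward
        self.q = defaultdict(float)   # Q[(event, action)], initialised to 0

    def _best_value(self, event):
        # Largest Q-value available when the given event occurs.
        return max(self.q[(event, a)] for a in self.actions)

    def greedy(self, event):
        # Action with the highest Q-value for this event (ties -> first action).
        return max(self.actions, key=lambda a: self.q[(event, a)])

    def update_discounted(self, event, action, reward, next_event):
        # Standard discounted Q-learning target, applied at event epochs.
        target = reward + self.gamma * self._best_value(next_event)
        self.q[(event, action)] += self.alpha * (target - self.q[(event, action)])

    def update_average(self, event, action, reward, next_event):
        # Average criterion: Q-values act as relative (potential-like) values,
        # so the current average-reward estimate rho is subtracted.
        td = reward - self.rho + self._best_value(next_event) - self.q[(event, action)]
        self.q[(event, action)] += self.alpha * td
        # Simple running average of observed rewards (an assumption; R-learning
        # instead updates rho only after greedy actions).
        self.rho += self.beta * (reward - self.rho)
```

For example, with `actions=[0, 1]`, `alpha=0.5` and `gamma=0.9`, a single discounted update from a hypothetical event `"part_arrives"` with reward `1.0` raises `Q[("part_arrives", 1)]` from 0 to 0.5. `greedy("part_arrives")` then returns `1`. In a coordinated multi-station setting, one such learner per station, or a joint action set, would be needed. This sketch does not model that coordination.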