Minimizing mean weighted tardiness in unrelated parallel machine scheduling with reinforcement learning

We address an unrelated parallel machine scheduling problem with R-learning, an average-reward reinforcement learning (RL) method. Jobs of different types arrive dynamically according to independent Poisson processes, so the arrival time and due date of each job are stochastic. We convert the scheduling problem into an RL problem by constructing elaborate state features, actions, and a reward function; the state features and actions are defined by fully exploiting prior domain knowledge. Minimizing the average reward per decision time step is equivalent to minimizing the scheduling objective, i.e. mean weighted tardiness. We apply an on-line R-learning algorithm with function approximation to solve the RL problem. Computational experiments demonstrate that R-learning learns an optimal or near-optimal policy in a dynamic environment from experience and outperforms four effective heuristic priority rules (WSPT, WMDD, ATC and WCOVERT) on all test problems.
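To make the average-reward idea concrete, below is a minimal tabular R-learning sketch in the style of Schwartz's original algorithm. It is an illustration only: the paper uses function approximation over engineered state features, while this toy uses a hypothetical two-state environment (taking action 1 yields reward 1) invented purely to show the update rule, in which the agent learns action values relative to a running estimate `rho` of the average reward per step.

```python
import random
from collections import defaultdict


class RLearningAgent:
    """Tabular R-learning: Q-values are learned relative to the
    average reward rho, which is itself estimated online."""

    def __init__(self, actions, alpha=0.1, beta=0.01, epsilon=0.1):
        self.actions = list(actions)
        self.alpha = alpha            # step size for Q updates
        self.beta = beta              # step size for the rho estimate
        self.epsilon = epsilon        # exploration rate
        self.q = defaultdict(float)   # Q[(state, action)] -> relative value
        self.rho = 0.0                # running average-reward estimate

    def select(self, state):
        # epsilon-greedy action selection
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        max_next = max(self.q[(s_next, b)] for b in self.actions)
        max_cur = max(self.q[(s, b)] for b in self.actions)
        was_greedy = self.q[(s, a)] == max_cur
        # relative-value TD update: reward is measured against rho
        self.q[(s, a)] += self.alpha * (r - self.rho + max_next - self.q[(s, a)])
        if was_greedy:  # update rho only on non-exploratory steps
            self.rho += self.beta * (r + max_next - max_cur - self.rho)


# Toy two-state chain: action 1 always pays reward 1, action 0 pays 0,
# and the chosen action determines the next state.
random.seed(0)
agent = RLearningAgent(actions=[0, 1])
state = 0
for _ in range(5000):
    action = agent.select(state)
    reward = 1.0 if action == 1 else 0.0
    next_state = action
    agent.update(state, action, reward, next_state)
    state = next_state
```

In this toy, `rho` converges toward the optimal average reward of 1 and the greedy policy selects action 1 in both states. The paper's approach follows the same update structure, but with a reward signal tied to weighted tardiness (so minimizing the average reward minimizes the scheduling objective) and a function approximator in place of the table.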
