Deep reinforcement learning based preventive maintenance policy for serial production lines

Abstract

In the manufacturing industry, preventive maintenance (PM) is a common practice for reducing random machine failures by replacing or repairing aged machines or parts. Deciding when and where PM should be carried out is nontrivial because a serial production line with intermediate buffers is complex and stochastic. To improve the cost efficiency of serial production lines, a deep reinforcement learning based approach is proposed for obtaining a PM policy. A novel modeling method for the serial production line is adopted during the learning process, and a reward function is proposed based on an evaluation of system production loss. An algorithm based on the Double Deep Q-Network (DDQN) is then applied to learn the PM policy. A simulation study shows that the learning algorithm is effective in delivering a PM policy that increases throughput and reduces cost. Interestingly, the learned policy frequently conducts “group maintenance” and “opportunistic maintenance”, even though these concepts and their rules are not provided during the learning process. This finding further supports the effectiveness of the problem formulation, the proposed algorithm, and the reward function setting in this paper.
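For readers unfamiliar with the Double Deep Q-Network, the sketch below shows the generic DDQN target computation for a minibatch of transitions. The idea is that the online network *selects* the greedy next action while the target network *evaluates* it, which reduces the overestimation bias of plain DQN. This is not the paper's implementation: the abstract does not give the state, action, or reward definitions. The function name `double_dqn_targets`, the action encoding (0 = no PM, k = PM on machine k), and the reward values in the example are hypothetical assumptions for illustration only.

```python
from typing import List, Sequence


def double_dqn_targets(
    rewards: Sequence[float],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Compute generic Double DQN regression targets for a minibatch.

    next_q_online[i][a] / next_q_target[i][a] are the online / target
    network Q-values of action a in the next state of transition i.
    Hypothetical action encoding: 0 = no PM, k = PM on machine k.
    """
    targets = []
    for r, q_on, q_tg, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            # Terminal transition: no bootstrapped future value.
            targets.append(r)
            continue
        # Action selection uses the online network ...
        a_star = max(range(len(q_on)), key=lambda a: q_on[a])
        # ... but action evaluation uses the target network.
        targets.append(r + gamma * q_tg[a_star])
    return targets


# Hypothetical batch: rewards stand in for (negative) production-loss-based rewards.
targets = double_dqn_targets(
    rewards=[-2.0, -5.0],
    next_q_online=[[1.0, 3.0, 2.0], [0.5, 0.2, 0.1]],
    next_q_target=[[1.5, 2.5, 4.0], [0.3, 0.9, 0.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)  # [0.25, -5.0]
```

In the first transition the online network picks action 1, so the target is -2.0 + 0.9 × 2.5 = 0.25. A plain DQN target would instead take the maximum over the target network (4.0) and give -2.0 + 0.9 × 4.0 = 1.6, illustrating how the decoupling tempers optimistic estimates.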
