Optimal policy for structure maintenance: A deep reinforcement learning framework

Abstract: The cost-effective management of aging infrastructure is an issue of worldwide concern. Markov decision process (MDP) models have been used to develop structural maintenance policies. Recent advances in the artificial intelligence (AI) community have shown that deep reinforcement learning (DRL) has the potential to solve large MDP optimization tasks. This paper proposes a novel automated DRL framework to obtain an optimized structural maintenance policy. The DRL framework contains a decision maker (the AI agent) and the structure that needs to be maintained (the AI task environment). The agent outputs maintenance policies and chooses maintenance actions; the task environment determines the state transition of the structure and returns rewards to the agent for the chosen maintenance actions. The advantages of the DRL framework include: (1) a deep neural network (DNN) is employed to learn the state-action Q value (defined as the expected discounted return for a given state-action pair), based on either simulations or historical data, and the policy is then obtained from the Q value; (2) the learning process is optimized on samples, so it can learn directly from real historical data collected from a large number of bridges (i.e., big data); and (3) the same general framework can be used for different structure maintenance tasks with minimal changes to the neural network architecture. Case studies of a simple bridge deck with seven components and a long-span cable-stayed bridge with 263 components are performed to demonstrate the proposed procedure. The results show that DRL is efficient at finding the optimal policy for maintenance tasks for both simple and complex structures.
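The agent-environment loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: a small Q table stands in for the DNN, and the four condition states, the two actions, the costs, the deterioration probability, and the random start state are all hypothetical values chosen only to make the example run. The names `step`, `train`, and `greedy_policy` are likewise illustrative.

```python
import random

# All values below are illustrative assumptions, not taken from the paper.
N_STATES = 4      # condition ratings: 0 = good ... 3 = failed
ACTIONS = (0, 1)  # 0 = do nothing, 1 = repair
GAMMA = 0.9       # discount factor


def step(state, action, rng):
    """Task environment: return (next_state, reward) for one period."""
    if action == 1:
        # Repair restores the best condition at a fixed cost.
        return 0, -5.0
    # Do nothing: deteriorate by one level with probability 0.5.
    if rng.random() < 0.5:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = state
    # Being in the failed state incurs a penalty.
    reward = -20.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=2000, horizon=20, alpha=0.1, epsilon=0.2, seed=0):
    """Agent: sample-based Q-learning with a table in place of a DNN."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        # Random start state (an assumption) so every state is visited.
        state = rng.randrange(N_STATES)
        for _ in range(horizon):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward = step(state, action, rng)
            # Move Q(s, a) toward reward + discounted best next value.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Obtain the policy from the Q values: best action in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

The update uses only sampled transitions `(state, action, reward, next_state)`, so in principle the same loop could consume recorded historical transitions instead of calls to a simulator. A table does not scale to a structure with hundreds of components, which is where a function approximator such as a DNN would take its place.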
