Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints

Determining inspection and maintenance policies that minimize long-term risks and costs in deteriorating engineering environments is a complex optimization problem. Major computational challenges include (i) the curse of dimensionality, due to the exponential scaling of state/action set cardinalities with the number of components; (ii) the curse of history, related to decision trees that grow exponentially with the number of decision steps; (iii) state uncertainties, induced by inherent environmental stochasticity and the variability of inspection/monitoring measurements; and (iv) constraints, pertaining to stochastic long-term limitations due to resource scarcity and other infeasible or undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDPs) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii) and (iii) by combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i) through deep function parametrizations and decentralized control assumptions. Challenge (iv) is handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and to adeptly prescribe inspection and intervention actions in cases where decisions must be made in the most resource- and risk-aware manner.
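
The abstract names two mechanisms for handling uncertainty and constraints: Bayesian belief updating within the POMDP, and state augmentation with Lagrangian relaxation of the constrained objective. The Python sketch below illustrates these ideas on a hypothetical single-component model. The three-state transition and observation matrices, the function names (`belief_update`, `augment_state`, `lagrangian_reward`, `dual_step`), the budget-fraction augmentation, and the learning rate are all illustrative assumptions. They do not reproduce the paper's implementation, which embeds these ideas in multi-agent DRL training.

```python
from typing import List, Sequence

# Hypothetical single-component model (illustrative numbers, not from the paper):
# states 0 = intact, 1 = damaged, 2 = severely damaged.
# T_DO_NOTHING[s][s2] = P(s2 | s) when no intervention is taken.
T_DO_NOTHING = [
    [0.8, 0.2, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
# O_INSPECT[s2][o] = P(o | s2) for an imperfect inspection with three outcomes.
O_INSPECT = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]


def belief_update(b: Sequence[float], T: Sequence[Sequence[float]],
                  O: Sequence[Sequence[float]], o: int) -> List[float]:
    """Bayes filter: b'(s2) is proportional to O[s2][o] * sum_s T[s][s2] * b(s)."""
    n = len(b)
    # Prediction: propagate the current belief through the transition model.
    predicted = [sum(T[s][s2] * b[s] for s in range(n)) for s2 in range(n)]
    # Correction: weight each predicted state by the likelihood of outcome o.
    unnormalized = [O[s2][o] * predicted[s2] for s2 in range(n)]
    z = sum(unnormalized)
    if z == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / z for p in unnormalized]


def augment_state(belief: Sequence[float], spent: float, budget: float) -> List[float]:
    """Assumed augmentation: append the fraction of the budget already used."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return list(belief) + [spent / budget]


def lagrangian_reward(reward: float, constraint_cost: float, lam: float) -> float:
    """Relaxed per-step reward r - lam * c that the policy update would maximize."""
    return reward - lam * constraint_cost


def dual_step(lam: float, estimated_constraint: float, limit: float, lr: float) -> float:
    """Projected gradient ascent on the multiplier, keeping lam >= 0.

    lam grows while the estimated constraint value exceeds its limit and
    shrinks (down to zero) once the constraint is satisfied.
    """
    return max(0.0, lam + lr * (estimated_constraint - limit))


if __name__ == "__main__":
    b1 = belief_update([1.0, 0.0, 0.0], T_DO_NOTHING, O_INSPECT, o=1)
    print(b1)  # [0.5, 0.5, 0.0]
    print(augment_state(b1, spent=2.0, budget=10.0))  # [0.5, 0.5, 0.0, 0.2]
    print(lagrangian_reward(-1.0, constraint_cost=0.1, lam=2.0))
    print(dual_step(0.0, estimated_constraint=0.15, limit=0.1, lr=1.0))
```

In a full primal-dual scheme of this kind, the policy would be trained on the relaxed reward while the multiplier is updated periodically from Monte Carlo estimates of the constraint. The augmented input lets the policy condition its actions on how much of the budget remains.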
