Toward risk assessment 2.0: Safety supervisory control and model-based hazard monitoring for risk-informed safety interventions

Abstract
Probabilistic Risk Assessment (PRA) is a staple in the engineering risk community, and it has become, to some extent, synonymous with the entire quantitative risk assessment undertaking. Limitations of PRA continue to occupy researchers, and workarounds are often proposed. After a brief review of this literature, we propose to address some of PRA's limitations by developing a novel framework and analytical tools for model-based system safety, or safety supervisory control, to guide safety interventions and support a dynamic approach to risk assessment and accident prevention. Our work shifts the emphasis from the pervading probabilistic mindset in risk assessment toward the notions of danger indices and hazard temporal contingency. The framework and tools developed here are grounded in Control Theory and use the state-space formalism to model dynamical systems. We show that the use of state variables enables the definition of metrics for accident escalation, termed hazard levels or danger indices, which measure the “proximity” of the system state to adverse events, and we illustrate the development of such indices. Monitoring the hazard levels provides diagnostic information to support both on-line and off-line safety interventions. For example, we show how applying the proposed tools to a rejected takeoff scenario provides new insight to support pilots’ go/no-go decisions. Furthermore, we augment the traditional state-space equations with a hazard equation and use it to estimate the times at which critical thresholds for the hazard level are (b)reached. This estimation provides important prognostic information and yields a proxy for a time-to-accident metric, that is, advance notice of an impending adverse event. The ability to estimate these two hazard coordinates, danger index and time-to-accident, offers many possibilities for informing system control strategies and improving accident prevention and risk mitigation.
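The notions of a danger index and a time-to-threshold estimate can be made concrete with a toy version of the rejected takeoff scenario. The sketch below is a minimal illustration under stated assumptions, not the authors' model. It assumes the state is x = [distance rolled (m), speed (m/s)], that the aircraft accelerates at a constant 2 m/s^2, and that the hypothetical danger index is the fraction of an assumed 2500 m runway that would be consumed if the crew began braking now at a constant 3 m/s^2. The critical threshold is h = 1, meaning a rejected takeoff from that state would overrun. Propagating the state-space recursion until the threshold is crossed yields the advance-notice estimate. The functions `step`, `danger_index`, and `time_to_threshold` and all numerical values are hypothetical.

```python
def step(x, accel=2.0, dt=0.1):
    """One step of x_{k+1} = A x_k + B u for a takeoff roll under constant
    acceleration u = accel, with A = [[1, dt], [0, 1]] and B = [dt^2/2, dt]^T.
    (Assumed toy dynamics, for illustration only.)"""
    s, v = x
    return (s + v * dt + 0.5 * accel * dt**2, v + accel * dt)


def danger_index(x, runway_length=2500.0, brake_decel=3.0):
    """Hypothetical hazard level: fraction of the runway consumed if braking
    started now at constant deceleration. A value >= 1.0 means a rejected
    takeoff from state x would overrun the runway."""
    s, v = x
    return (s + v**2 / (2.0 * brake_decel)) / runway_length


def time_to_threshold(x0, hazard, threshold=1.0, dt=0.1, max_steps=10_000):
    """Propagate the state from x0 and return the first sampled time (s) at
    which hazard(x) >= threshold, or None if it is not reached within max_steps."""
    x = x0
    for k in range(max_steps + 1):
        if hazard(x) >= threshold:
            return k * dt
        x = step(x, dt=dt)
    return None


x_now = (400.0, 40.0)  # assumed state 20 s into the roll: 400 m rolled, 40 m/s
print(f"danger index h = {danger_index(x_now):.3f}")
print(f"time until h reaches 1: {time_to_threshold(x_now, danger_index):.1f} s")
```

For these assumed numbers the script reports h = 0.267 and a window of about 18.8 s, after which rejecting the takeoff would no longer stop the aircraft on the runway under the stated braking assumption. A real application would replace the toy dynamics and the hazard function with a validated model of the system.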
Finally, we develop a visualization tool, termed a hazard temporal contingency map, which dynamically displays the “coordinates” of a portfolio of hazards. This tool is meant to support operators’ situational awareness by providing prognostic information about the time windows available to intervene before hazardous situations become unrecoverable, and it helps decision-makers prioritize attention and defensive resources for accident prevention. In this view, emerging risks and hazards are prioritized dynamically according to how close in time their associated accident(s) are to being released, not according to probabilities or combinations of probabilities and consequences, as is traditionally done (off-line) in PRA. This approach offers novel capabilities, complementary to PRA, for improving risk assessment and accident prevention. It is hoped that this work helps to expand the basis of risk assessment beyond its reliance on probabilistic tools, and that it serves to enrich the intellectual toolkit of risk researchers and safety professionals.
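At its simplest, the ranking principle behind a hazard temporal contingency map can be approximated by a text table of hazard coordinates sorted by estimated time-to-threshold. The sketch below assumes each hazard has already been reduced to two numbers: a danger index normalized so that 1.0 is its critical threshold, and an estimated time until that threshold is reached. It does not reproduce the authors' visualization. The class `HazardCoordinates`, the functions `prioritize` and `contingency_table`, the hazard names, and the portfolio values are hypothetical. The tie-break on danger index is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class HazardCoordinates:
    name: str
    danger_index: float                  # current hazard level; 1.0 = critical threshold
    time_to_threshold: Optional[float]   # estimated seconds to threshold; None = not within horizon


def prioritize(hazards):
    """Order hazards by temporal vicinity: smallest time-to-threshold first.
    Hazards not projected to reach their threshold go last; ties are broken
    by the higher danger index (an assumed rule)."""
    def key(h):
        t = math.inf if h.time_to_threshold is None else h.time_to_threshold
        return (t, -h.danger_index)
    return sorted(hazards, key=key)


def contingency_table(hazards):
    """Render the prioritized hazard coordinates as a plain-text table."""
    rows = [f"{'hazard':<20}{'h':>6}{'t* (s)':>10}"]
    for h in prioritize(hazards):
        t = "--" if h.time_to_threshold is None else f"{h.time_to_threshold:.1f}"
        rows.append(f"{h.name:<20}{h.danger_index:>6.2f}{t:>10}")
    return "\n".join(rows)


# Illustrative values only, not data from any real system.
portfolio = [
    HazardCoordinates("tire overheating", 0.60, 45.0),
    HazardCoordinates("engine fire", 0.05, None),
    HazardCoordinates("runway overrun", 0.27, 18.8),
]
print(contingency_table(portfolio))
```

In this made-up portfolio, the tire-overheating entry has the higher danger index but ranks second, because its threshold is projected to be reached later than the runway-overrun threshold. A hazard not projected to reach its threshold within the prediction horizon goes last.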