Assessing Safety-Critical Systems from Operational Testing: A Study on Autonomous Vehicles

Abstract

Context: Demonstrating high reliability and safety for safety-critical systems (SCSs) remains a hard problem. Diverse evidence needs to be combined in a rigorous way: in particular, results of operational testing need to be combined with other evidence from design and verification. The growing use of machine learning in SCSs, by precluding most established methods for gaining assurance, makes evidence from operational testing even more important for supporting safety and reliability claims.

Objective: We revisit the problem of using operational testing to demonstrate high reliability, using Autonomous Vehicles (AVs) as a current example. AVs are making their debut on public roads: methods for assessing whether an AV is safe enough are urgently needed. We demonstrate how to answer five questions that would arise in assessing an AV type, starting with those proposed by a highly-cited study.

Method: We apply new theorems extending our Conservative Bayesian Inference (CBI) approach, which exploit the rigour of Bayesian methods while reducing the risk of involuntary misuse associated (we argue) with now-common applications of Bayesian inference; we define additional conditions needed for applying these methods to AVs.

Results: Prior knowledge can bring substantial advantages if the AV design allows strong expectations of safety before road testing. We also show how naive attempts at conservative assessment may lead to over-optimism instead; why extrapolating the trend of disengagements (take-overs by human drivers) is not suitable for safety claims; and how to use knowledge that an AV has moved to a “less stressful” environment.

Conclusion: While some reliability targets will remain too high to be practically verifiable, our CBI approach removes a major source of doubt: it allows the use of prior knowledge without inducing dangerously optimistic biases. For certain ranges of required reliability and prior beliefs, CBI thus supports feasible, sound arguments. Useful conservative claims can be derived from limited prior knowledge.
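The Method and Results parts above rest on Conservative Bayesian Inference: deriving worst-case posterior claims from partial prior knowledge combined with failure-free operational evidence. The Python sketch below is purely illustrative and is not one of the paper's theorems; it assumes the simplest form of prior knowledge, a single constraint P(pfd <= p_l) = theta on the probability of failure per demand, with demands treated as independent Bernoulli trials. The function name and the example numbers are hypothetical.

# Minimal illustrative sketch of a Conservative Bayesian Inference (CBI) bound.
# Assumption (not taken verbatim from the paper): prior knowledge is the single
# constraint P(pfd <= p_l) = theta; demands fail independently with probability
# pfd; n demands are observed failure-free. The most pessimistic prior
# consistent with the constraint is a two-point distribution (mass theta at
# p_l, mass 1 - theta just above the claimed bound p), which yields the
# conservative posterior confidence computed here.

def conservative_posterior_confidence(theta, p_l, p, n):
    """Lower bound on P(pfd <= p | n failure-free demands), given P(pfd <= p_l) = theta."""
    assert 0.0 < theta < 1.0 and 0.0 <= p_l <= p < 1.0 and n >= 0
    w_good = theta * (1.0 - p_l) ** n       # mass at p_l, reweighted by its likelihood
    w_bad = (1.0 - theta) * (1.0 - p) ** n  # mass just above p, reweighted by its likelihood
    return w_good / (w_good + w_bad)

if __name__ == "__main__":
    # Hypothetical numbers: 90% prior confidence that pfd <= 1e-5; claim
    # pfd <= 1e-4 after 20,000 failure-free demands. Prints roughly 0.98.
    print(conservative_posterior_confidence(theta=0.9, p_l=1e-5, p=1e-4, n=20_000))

Note that setting p equal to p_l collapses the bound to theta: under the worst-case prior, failure-free testing alone does not raise confidence in the original bound, which illustrates why the form and strength of prior knowledge matter so much in this style of argument.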
