Independent Assessment of Safety-Critical Systems: We Bring Data!

Safety-critical systems are systems whose failures can lead to catastrophic results: loss of life, significant property damage, or damage to the environment. These systems include aerospace on-board control systems, ground flight control systems, medical devices, nuclear power plant controls, automotive systems, and military systems, to name just a few. Other information systems are also becoming "safety-critical" due to the financial impact of their failures and the fact that human lives depend on them. Constant technological evolution is making these systems more complex and more common, and we depend on them ever more. We therefore need to guarantee maximum dependability and safety with better processes and tools. System complexity is usually boosted by the flexibility of software, so software is becoming both a solution and a problem. Keeping software dependability at the highest level requires improvements that cover processes, tools, and all development/qualification life-cycle phases: specification, architecture, coding, and verification and validation.

Independent software verification and validation (ISVV) activities have been used, and have been evolving, since the seventies to ensure high safety and dependability, taking advantage of organizational and technical independence to avoid biased assessments. Critical Software has been developing and applying ISVV methods and techniques since the early 2000s, and through this experience has collected a significant amount of data covering different domains: space on-board and ground systems, aeronautics, transportation, and finance and banking, among others. This industrial paper will not cover the technical details of the processes, methods, and tools applied; instead, it will present important metrics and the findings derived from the collected data. The results presented cover both the technical issues found in each phase of the development life-cycle and the effort required to perform the independent assessments.

The outcome is interesting because it allows comparisons of data across similar and different industries (process/organizational maturities), the same and different domains/criticalities, and software developed by more or less experienced industrial partners, as well as analysis of the life-cycle phases where most problems are detected or where they are detected most easily, the evolution of software/system dependability after the first assessments, and the efficiency of the applied techniques and the return on investment as a function of consultant experience level, initial system maturity, criticality, project life-cycle phase, and so on. All these factors can be analyzed from the collected metrics, and we can also draw conclusions about the number of "non-common" issues found, which include abnormal system behavior (for example, under non-nominal conditions), significant organizational factors (yes, they really are important), and human factors (operator-related risks, security threats, etc.). A study by Johnson and Holloway of some of the major aviation and maritime accidents in North America during 1996-2006 concluded that the proportion of causal and contributory factors related to organizational issues exceeded that of factors due to human error.
The study showed the following distribution of causal and contributory factors for US aviation accidents: 48% related to organizational factors, 37% to human factors, 12% to equipment factors, and 3% to other causes. The same exercise for maritime accidents classified 53% as organizational factors, 24-29% as human error, 10-19% as equipment failures, and 2-4% as other causes. The data presented and analyzed in this industrial paper come from dozens of projects and comprise over 3,000 issues. This article will present the facts related to safety-critical software development quality metrics obtained from independent assessments of fairly mature systems, and will also infer return-on-investment (ROI) metrics such as the effort required to perform specific types of assessments (requirements analysis, code inspections, etc.) and the results obtained in terms of the number of issues found per hour by issue severity.
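To make the issues-per-hour metric concrete, the following minimal sketch shows one way such a rate can be computed; it is written in Python for illustration only, and every assessment type, severity level, count, and effort figure in it is hypothetical rather than taken from the dataset discussed in this paper.

    from collections import Counter

    # Hypothetical effort, in hours, spent on each assessment type.
    effort_hours = {"requirements analysis": 120.0, "code inspection": 80.0}

    # Hypothetical issue log: one (assessment type, severity) entry per issue found.
    issue_log = [
        ("requirements analysis", "major"),
        ("requirements analysis", "minor"),
        ("requirements analysis", "minor"),
        ("code inspection", "major"),
        ("code inspection", "minor"),
    ]

    # Count issues per (assessment, severity) pair and divide by the effort
    # spent on that assessment to obtain issues found per hour.
    counts = Counter(issue_log)
    for (assessment, severity), n in sorted(counts.items()):
        rate = n / effort_hours[assessment]
        print(f"{assessment} / {severity}: {rate:.3f} issues per hour")

Comparing such rates across assessment types and severity levels is what allows the effort-versus-findings trade-off, and hence the ROI of each technique, to be quantified.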