Development and assessment of safety critical software is governed by many standards. Given the growing dependence on software in a range of industries, one might expect these standards to reflect a growing maturity in processes for developing and assessing safety critical software, and an international consensus on best practice. In reality, whilst there are commonalities among the standards, there are also major variations between sectors and countries, and even greater variations in industrial practice. This leads us to consider why the variation exists and whether any steps can be taken to obtain greater consensus. In this paper we start by clarifying the role of software in system safety, and briefly review the cost and effectiveness of current software development and assurance processes. We then investigate why there is such divergence in standards and practices, and consider the implications of this lack of commonality. We present some comparisons with other technologies to look for relevant insights, and then suggest some principles on which it might be possible to develop a cross-sector and international consensus. Our aim is to stimulate debate, not to present a “definitive” approach to achieving safety of systems including software.

The meaning of software safety

Software is becoming an increasingly important element of many safety-critical and safety-related systems. In many cases, software is the major determinant of system capability, e.g., in aircraft flight control systems, body electronics for cars, air traffic control, pacemakers and nuclear power plant control. For such systems, the software is often a major factor in the costs and risks of achieving and assuring safety. Costs can be of the order of $10M for each system on a modern aircraft.

Some people argue that the term “software safety” is a misnomer, as software in and of itself is not hazardous: it is not toxic, does not have high kinetic energy, and so on. Here we use the term “software safety” simply as a shorthand for “the contribution of software to safety in its system context”. Software can contribute to hazards through inappropriate control of a system, particularly where it has full authority over some hazardous action. By “full authority” we mean that no other system or person can over-ride the software. Software can also contribute to hazards when its behaviour misleads system operators, and the operators thereby take inappropriate actions. Misleading operators is most critical if they have no means of cross-checking the presented information, or have learnt to trust the software even when there are discrepancies with other sources of data. Described this way, software is much like any other technology, except that it can only contribute to unsafe conditions through systematic causes, i.e., there is no “random” failure or “wear-out” mechanism for software. Systematic failures arise from flaws or limitations in the software requirements, in the design, or in the implementation. Thus, software safety involves considering how we can eliminate such flaws and how we can know whether or not we have eliminated them. We refer to these two concerns as achieving and assuring safety.

Why is there a concern?

There is considerable debate on software safety in industry, academia, and government circles. This debate may seem slightly surprising, as software has a remarkably good track record.
There have been several high-profile accidents, e.g., Ariane 5 ([1]) and Therac 25 ([2]), and in aerospace the Cali accident has been attributed to software (more strictly, to data ([3])), but a study of over 1,000 apparently “computer related” deaths ([4]) found that only 34 could be attributed to software issues. The critical failure rate of software in aerospace appears to be around 10⁻⁷ per hour ([5]), which is sufficient for it to have full authority over a hazardous/severe major event and still meet certification targets. In fact, most aircraft accidents stem from mechanical or electrical causes, so why is there a concern? We believe the concern arises out of four related factors.

First, there is some scepticism about the accident data. It is widely believed that many accidents put down to human error were actually the result of operators (e.g., pilots) being misled by the software. Also, software failures typically leave no trace, and so may not be recognised as contributory causes of accidents (e.g., the controversy over the Chinook crash on the Mull of Kintyre ([6])). Further, much commercial software is unreliable, leading to a general distrust of software, and many critical systems have a long history of “nuisance” failures, which suggests that more problems would arise if software had greater authority.

Second, systems and software are growing in complexity and authority at an unprecedented rate, and there is little confidence that current techniques for analysing and testing software will “keep up”. There have already been instances of projects where “cutting edge” design proposals have had to be rejected or scaled down because of the lack of suitable techniques for assuring the safety of the product.

Third, we do not know how to measure software safety; thus it is hard to manage projects, to know which techniques are the most effective to apply, or to judge when “enough has been done”.

Fourth, safety critical software is perceived to cost too much, both in relation to commercial software and in relation to the cost of the rest of the system. In modern applications, e.g., car or aircraft systems, it may represent the majority of the development costs.

These issues are inter-related; for example, costs will rise as software complexity increases.

The cost and effectiveness of current practices

It is difficult to obtain accurate data on the cost and effectiveness of software development and assessment processes, as such information is commercially sensitive. The data in the following sections are based on figures from a range of critical application projects, primarily aerospace projects based in Europe and the USA; however, we are not at liberty to quote specific sources.

Costs: Costs from software requirements to the end of unit testing are the most readily comparable between projects. Typically 1-5 lines of code (LoC) are produced per man day, with more recent projects being nearer the higher figure. Salary costs vary, but we calculate a mid-point of around $150 to $250 per LoC, or $25M for a system containing 100 kLoC of code. Typically, testing is the primary means of gaining assurance. Although the costs of tests vary enormously, e.g., with hardware design, testing frequently consumes more than half the development and assessment budget. Also, in many projects, change traffic is high; we know of projects where, in effect, the whole of the software is built to certification standards three times. In general, the rework is due to late discovery of requirements or design flaws.
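As a rough illustration of how such figures combine, the following sketch derives a cost per LoC and a whole-system cost from the productivity range quoted above. The fully burdened daily labour rate is our own assumption for illustration, not a figure from the projects discussed.

```python
# Illustrative cost model for safety critical software development.
# DAY_RATE is an assumed fully burdened labour cost; it is not taken from the paper's data.

def cost_per_loc(loc_per_day: float, day_rate_usd: float) -> float:
    """Cost of one delivered line of code, given productivity and a daily labour rate."""
    return day_rate_usd / loc_per_day

DAY_RATE = 750.0            # assumed cost per person-day (USD)
SYSTEM_SIZE_LOC = 100_000   # 100 kLoC system, as in the text

for productivity in (1.0, 3.0, 5.0):   # LoC per person-day, the range quoted above
    per_loc = cost_per_loc(productivity, DAY_RATE)
    total = per_loc * SYSTEM_SIZE_LOC
    print(f"{productivity:.0f} LoC/day -> ${per_loc:,.0f}/LoC, "
          f"${total / 1e6:,.1f}M for 100 kLoC")
```

With these assumptions, productivities towards the upper end of the 1-5 LoC per day range reproduce the $150 to $250 per LoC quoted above; at the lower end of the range the cost per LoC, and hence the system cost, is several times higher.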
Flaws and failure rates: From a safety perspective, we are concerned about the rate of occurrence of hazardous events or accidents during the life of the system. As indicated above, aerospace systems seem to achieve around 10⁻⁷ failures per hour. We are also aware of systems which have over 10⁷ hours of hazard-free operation, although there have been losses of availability. However, there is no practical way of measuring such figures prior to putting the software into service ([7]) and, in practice, the best that can be measured pre-operationally is about 10⁻³ or 10⁻⁴ failures per hour (see the sketch at the end of this section).

As an alternative to evaluating failure rates, we can try to measure the flaws in programs. We define a flaw as a deviation from intent. We are not concerned with all flaw types, so it is usual to categorise them, e.g., safety critical (sufficient on its own to cause a hazard), safety related (can only cause a hazard in combination with another failure), and so on. On this basis, so far as we can obtain data, anything less than 1 flaw per kLoC is world class; the best we have encountered is 0.1 per kLoC, for Shuttle code.

Some observations are in order. First, these figures are for known flaws; by definition, we do not know how many unknown flaws there are. We might expect all known flaws to be removed; however, removing flaws is itself an error-prone process, and there comes a point where the risks of further change outweigh the benefits. Second, it is unclear how to relate flaw density to failure rate. There is evidence of a fairly strong correlation for some systems ([8]), but in general the correlation will depend on where the flaws are in the program: a system with a fairly high flaw density may have a low failure rate, and vice versa, depending on the distribution of flaws and of demands (inputs). Third, commercial software has much higher flaw densities, perhaps 30-100 per kLoC; as much as two orders of magnitude higher.

We can also analyse where flaws are introduced and where they are removed, to try to assess the effectiveness of processes. The available data suggest that more than 70% of the flaws found after unit testing are requirements errors; we have heard figures as high as 85% ([9]). Late discovery of requirements errors is a major source of change, and of cost.

Cost/effectiveness conclusions: The available data are insufficient to answer the most important questions, namely which approaches are most effective in achieving safety (e.g., fewest safety-related flaws in the software) or most cost-effective. There are too many other factors (size of system, complexity, change density, etc.) to make meaningful comparisons among the few data points available. It is not even possible to give an objective answer to the question “does safety critical software cost too much?”, as software is typically used to implement functionality that would be infeasible in other technologies. Thus there are few cases in which we can directly compare the costs of achieving a given level of safety in software with the costs of doing so by other means.
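The sketch below illustrates the point made under “Flaws and failure rates” about the limits of pre-operational measurement. It uses the standard zero-failure statistical testing argument (assuming an exponential failure model, the failure-free test time needed to support a rate claim at confidence C is roughly T ≥ -ln(1 - C)/λ); this is our own illustration, not an analysis from the cited projects.

```python
import math

def required_test_hours(target_rate_per_hour: float, confidence: float = 0.95) -> float:
    """Failure-free test hours needed to support a claim that the failure rate is no
    worse than target_rate_per_hour at the given confidence, assuming an exponential
    failure model and zero failures observed during test."""
    return -math.log(1.0 - confidence) / target_rate_per_hour

for rate in (1e-3, 1e-4, 1e-7):
    hours = required_test_hours(rate)
    years = hours / (24 * 365)
    print(f"claiming {rate:.0e}/hour needs ~{hours:,.0f} failure-free test hours "
          f"(~{years:,.1f} years of continuous testing)")
```

On this basis, a claim of around 10⁻³ or 10⁻⁴ failures per hour is within reach of a realistic test campaign (months to a few years of test time, less if several rigs run in parallel), whereas demonstrating 10⁻⁷ per hour would require of the order of 3 × 10⁷ failure-free test hours, i.e., thousands of years of continuous testing. Such rates can therefore only be observed in service, across a fleet.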
References

[1] Science and public policy, 1982, Journal - Association of Official Analytical Chemists.
[2] David Harel et al. Statecharts: A Visual Formalism for Complex Systems, 1987, Sci. Comput. Program.
[3] M. Barnes. Software Safety and Reliability - Achievement and Assessment, 1989.
[4] Ricky W. Butler et al. The Infeasibility of Experimental Quantification of Life-Critical Software Reliability, 1991.
[5] Donald MacKenzie et al. Computer-Related Accidental Death: An Empirical Exploration, 1994.
[6] John A. McDermid et al. A Development of Hazard Analysis to Aid Software Design, 1994, Proceedings of COMPASS '94 - 1994 IEEE 9th Annual Conference on Computer Assurance.
[7] John A. McDermid et al. Experience with the Application of HAZOP to Computer-Based Systems, 1995, COMPASS '95 - Proceedings of the Tenth Annual Conference on Computer Assurance: Systems Integrity, Software Safety and Process Security.
[8] Nancy G. Leveson et al. Safeware: System Safety and Computers, 1995.
[9] Martin L. Shooman. Avionics Software Problem Occurrence Rates, 1996, Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering.
[10] Peter G. Bishop et al. A Conservative Theory for Long Term Reliability Growth Prediction, 1996, Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering.
[11] Peter A. Lindsay et al. A Systematic Approach to Software Safety Integrity Levels, 1997, SAFECOMP.
[12] John Barnes et al. High Integrity Ada: The SPARK Approach, 1997.
[13] Daniel Hoffman et al. Commonality and Variability in Software Engineering, 1998, IEEE Softw.
[14] Derek Fowler. Application of IEC 61508 to Air Traffic Management and Similar Complex Critical Systems - Methods and Mythology, 1999.
[15] Felix Redmill et al. System Safety: HAZOP and Software HAZOP, 1999.
[16] John A. McDermid et al. Extending Commonality Analysis for Embedded Control System Families, 2000, IW-SAPF.
[17] Felix Redmill. Safety Integrity Levels - Theory and Problems, 2000.
[18] H. C. Wilson et al. Hazop and Hazan: Identifying and Assessing Process Industry Hazards, 4th edition, 2001.
[19] John A. McDermid et al. Safety Analysis of Hardware/Software Interactions in Complex Systems, 2002.