This position paper deals with the tension between the desire for sound and auditable assurance cases and the current ubiquitous reliance on expert judgement. I believe that the use of expert judgement, though inevitable, needs to be much more cautious and disciplined than it usually is. The idea of assurance “cases” owes its appeal to an awareness that all too often critical decisions are made in ways that are difficult to justify or even to explain, leaving the doubt (for the decision makers as well as other interested parties) that the decision may be unsound. By building a well-structured “case” we would wish to allow proper scrutiny of the evidence and assumptions used, and of the arguments that link them to support a decision. An obstacle to achieving this goal is the important role that expert judgement plays in much decision making. The purpose of an assurance case is to a large extent to redirect dependence on judgement to issues on which we can trust this judgement; I doubt that this is done effectively in current practice. Making arguments explicit and, if possible, mathematically formal is one of the defences, yet formalism does not solve all problems and even creates some of its own. I believe that further progress must depend on better use of the knowledge produced by other disciplines about the cognitive, cultural and organizational factors that affect the production and use of assurance cases, and on studying the successes and failures of assurance cases.

1. “Cases” and judgement

Building a sound, well-structured “case” that collects facts about a system and arguments linking them to support decisions is an attractive idea. We would want decision makers to be able to see clearly the structure of arguments, whether built by themselves or by others, and to check that the evidence is used properly, the assumptions are acceptable and the reasoning is consistent, so that the claims made can then be seen to deserve sufficient confidence to drive decisions.

In everyday practice, though, the various parts of a case tend not to be equally sound and well-structured. The part that is most often laid out clearly and rigorously is the raw evidence, e.g. results of tests and verification procedures, and the implications that can be derived from them by straightforward algorithms. One could say that this is, in principle, the easy part, requiring only commitment and an adequate amount of effort.

The arguments present greater difficulties. Some possible arguments are simple (e.g. straight statistical inference from realistic testing can be a textbook exercise), but they are often insufficient: they do not confer enough confidence in the claim on which a decision should be based. A (somewhat extreme) example is given by applications with “ultra-high” dependability requirements: each kind of evidence available is usually insufficient to prove that the system is acceptable, even if no evidence points to its not having the required level of dependability attributes [6]. Statistical inference from test operation does not give sufficient confidence (within feasible testing durations), proofs do not cover certain aspects of systems, etc. A decision maker, e.g. a safety regulator, needs to consider these kinds of evidence and many others (often an unmanageable amount of evidence, in fact) and produce a simple decision: whether the system is safe enough to operate.
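The “textbook exercise” of statistical inference from testing also illustrates why such evidence runs out of steam for demanding claims. The sketch below is my own illustration, under a deliberately simple model (independent demands with a constant, unknown probability of failure per demand); none of its numbers come from the paper. The amount of failure-free testing needed grows roughly as the inverse of the claimed bound, which is why purely statistical arguments become infeasible for “ultra-high” targets.

```python
# A minimal sketch (assumed model, not from the paper): independent demands with
# a constant, unknown probability of failure p. Observing n failure-free demands
# supports the claim "p <= p_bound" at confidence C when (1 - p_bound)**n <= 1 - C,
# i.e. n >= ln(1 - C) / ln(1 - p_bound)  (roughly 4.6 / p_bound for C = 0.99).
import math

def failure_free_demands_needed(p_bound: float, confidence: float) -> int:
    """Failure-free demands needed to support 'p <= p_bound' at the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_bound))

for p_bound in (1e-3, 1e-6, 1e-9):
    n = failure_free_demands_needed(p_bound, confidence=0.99)
    print(f"claiming p <= {p_bound:g} at 99% confidence needs ~{n:.2e} failure-free demands")
```

For a bound of 10^-9 per demand this is of the order of 5 x 10^9 failure-free demands, hence the need to combine testing with other, “softer” forms of evidence, as discussed next.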
A sound case to help this regulator would detail what can be said about, for instance: how much confidence in a system being “safe enough” (as defined for the system and application in question) can we expect based on the production process used? How much does a certain period of successful test operation add to that confidence? How do the uncertainties about the future environment of the system (profile of use and threats) weaken it? And what about scenarios that have been assumed to have negligible probabilities? All these are difficult questions. Quite often there is no sound scientific knowledge about the relationship between these disparate components of the evidence and the property that we are interested in assessing.

These difficulties are commonly resolved by invoking “expert judgement”: roughly, some subclaim (or the main claim) is accepted without a detailed argument to support it, on the basis of trust in the person (or consensus of persons) stating it. The role of judgement ranges from estimating individual variables, like the probability of a certain failure mode, to drawing conclusions from the whole complex collection of disparate evidence and partial claims. In the former case, the expert’s opinion is substituted for empirical evidence that is not available. In the latter, it fills in for missing scientific knowledge and/or complex arguments. The experts’ opinions are trusted even though their bases cannot be examined and audited: the rules the experts have applied, and their basis in reasoning and past experience, may not be clear even to the experts themselves, and in any case they are not spelled out. Indeed, if they were spelled out in full, the case would rest on these explicit rules and arguments (which other experts could examine and discuss), without any need to appeal to “judgement”.

To avoid confusion, I must say that “judgement” is never absent from the acceptance of an argument, however scientifically rigorous. Scientific (or engineering) rigour largely consists in building an argument so that the “hard” parts (those that appear highly vulnerable to errors and misunderstandings) are argued explicitly and clearly. Direct reliance on “judgement” without explicit arguments is not eliminated, but it is relegated to less hard issues. For instance, we would not require an explicit argument in order to trust our manual verification of very simple logical or mathematical statements used in an argument. Accepting our direct judgement on these points prevents an infinite recursion in the building of a case. There will also be grey areas: statements for which some will think judgement is sufficient, while others will require more explicit arguments. However, when “expert judgement” or consensus is explicitly invoked in discussing system dependability, it is often, in my experience, to support statements about “hard” issues.

Even on “hard” issues, reliance on expert judgement is often necessary. Yet the respect with which it is treated is not always supported by evidence of its accuracy. Actually, there is abundant anecdotal and scientific evidence of inaccurate expert judgement. Apart from the dangers of emotion and conflicts of interest, experimental psychologists have documented many types of failures of the heuristics applied by human minds to problems involving uncertainty and probability, as problems of assurance usually do.
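A standard example of such a heuristic failure is base-rate neglect; the small calculation below is my own illustration with invented numbers, not an example from the paper. Asked how likely a component flagged by an accurate analysis tool is to be genuinely faulty, intuition tends to answer “about as likely as the tool is accurate” and to ignore how rare faults are; the explicit calculation gives a very different answer.

```python
# Illustrative sketch (all numbers invented): base-rate neglect.
# Suppose 1 in 1000 components of a given kind is faulty, and an analysis tool
# flags faulty components 99% of the time but also flags 5% of good ones.
p_faulty = 0.001             # assumed base rate of faulty components
p_flag_given_faulty = 0.99   # assumed tool sensitivity
p_flag_given_good = 0.05     # assumed false-alarm rate

# Bayes' theorem: P(faulty | flagged) = P(flagged | faulty) P(faulty) / P(flagged)
p_flag = (p_flag_given_faulty * p_faulty
          + p_flag_given_good * (1.0 - p_faulty))
p_faulty_given_flag = p_flag_given_faulty * p_faulty / p_flag
print(f"P(faulty | flagged) = {p_faulty_given_flag:.3f}")  # about 0.019, not 0.99
```

Restated as counts (out of 1000 components, about 1 is faulty and flagged, while about 50 good ones are flagged too), most people answer correctly, which anticipates the point below about counting events versus reasoning in the abstract probabilistic calculus.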
Even people trained in probabilistic reasoning have been shown to make serious mistakes when trying to use “judgement” without applying explicit, formal probabilistic reasoning.¹ It is not the case that experts are necessarily bad at dealing with uncertainty: some experts, and some categories of experts, have been shown to be routinely quite good. The problem is the lack of evidence that we can trust expert judgement in general and a priori. Thus we cannot trust a priori a specific judgement by a specific expert, unless we have some explicit and convincing argument for doing so. Such an argument could be, for instance, that the expert has both the means for forming an accurate judgement (for instance, that his/her experience is pertinent to the problem at hand, and sufficiently extensive to support the kind of claim the expert is called to make) and a record of accurate judgement in similar circumstances. Unfortunately, such arguments are not commonly included in assurance cases, and I believe that in many important applications they would not stand. The experts’ judgements may well be very accurate, but we would lack evidence for believing them to be so; just as very dependable systems are built for which (due to how their development is managed and documented) we could not build a convincing argument that they will be so dependable, before they turn out to be so in operation.

¹ There is an abundant and growing body of research about these issues, with various accessible books on judgement under uncertainty and on expert judgement in particular. I tried to summarise its implications from a dependability viewpoint [8] during the EU SHIP project (http://www.adelard.co.uk/research/ship.htm). The body of results has grown since, and I believe it has, if anything, reinforced the conclusion that expert judgement is currently treated with insufficient caution.

Luckily, researchers have also studied how circumstances, and especially the way tasks are set, affect the accuracy of judgements. For instance, it appears that people are much better at judging probabilities in settings that require them to count events than in settings that require reasoning directly in the more abstract language of the probabilistic calculus. Some findings concern cognitive problems in individual judgements, some the formation of consensus, and so on. I believe that more use could be made of this kind of knowledge, and that the community building “assurance cases” needs better contacts with the psychologists studying these phenomena.

2. Formalizing judgement

A possible defence, in the spirit of building assurance cases, is simply to reduce reliance on judgement by substituting instances of intuitive judgement with formal, explicit, auditable arguments. For drawing conclusions from disparate and largely “soft” evidence, as in the above example of the safety regulator, a way forward seems to lie in giving the experts a way of describing the sound argument that their mental processes emulate or should emulate. If the description is formal and mathematical, it will require clear statements of its assumptions, of the evidence and of the general laws invoked to support a claim. Therefo
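As a concrete (and deliberately toy) illustration of what such a formal description might look like, the sketch below is my own invention; its structure and numbers are assumptions for illustration only, not taken from the paper or from the belief-network models in the references. A prior belief that the system is fit for purpose, derived from evidence about the production process, is updated explicitly by the outcome of an acceptance test; every quantity the expert supplies is written down and can be audited and challenged.

```python
# A toy, hand-computed probabilistic argument (all numbers invented for illustration).
# Nodes: process quality Q in {good, poor} -> system fit for purpose F in {yes, no}
#        -> acceptance test outcome T in {pass, fail}.
p_Q = {"good": 0.7, "poor": 0.3}            # belief derived from process evidence
p_fit_given_Q = {"good": 0.9, "poor": 0.4}  # P(F = yes | Q)
p_pass_given_fit = {True: 0.95, False: 0.3} # P(T = pass | F)

def p_fit_given_pass() -> float:
    """P(F = yes | T = pass), computed by enumerating the joint distribution."""
    joint_fit_and_pass = 0.0
    joint_pass = 0.0
    for q, pq in p_Q.items():
        for fit in (True, False):
            p_fit = p_fit_given_Q[q] if fit else 1.0 - p_fit_given_Q[q]
            joint = pq * p_fit * p_pass_given_fit[fit]
            joint_pass += joint
            if fit:
                joint_fit_and_pass += joint
    return joint_fit_and_pass / joint_pass

prior_fit = sum(p_Q[q] * p_fit_given_Q[q] for q in p_Q)  # P(F = yes) before testing
print(f"P(fit) from process evidence alone: {prior_fit:.3f}")              # 0.750
print(f"P(fit) after a passed acceptance test: {p_fit_given_pass():.3f}")  # 0.905
```

A realistic formalisation would use richer structures, such as the Bayesian belief network models listed in the references below, but the principle is the same: the expert’s contribution becomes a set of explicit, individually debatable assumptions rather than an unexaminable overall verdict.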
References

[1] Bev Littlewood et al. Bayesian Belief Network Model for the Safety Assessment of Nuclear Computer-based Systems, 1997.
[2] Lorenzo Strigini et al. Engineering judgement in reliability and safety and its limits: what can we learn from research in psychology, 1996.
[3] Lorenzo Strigini et al. Formalising Engineering Judgement on Software Dependability via Belief Networks, 1998.
[4] Bev Littlewood et al. Multi-legged arguments: the impact of diversity upon confidence in dependability arguments, Proc. 2003 International Conference on Dependable Systems and Networks, 2003.
[5] Bev Littlewood et al. Bayesian belief networks for safety assessment of computer-based systems, 2000.
[6] David Wright. Elicitation and Validation of Graphical Dependability Models, Proc. SAFECOMP, 2003.
[7] Bev Littlewood et al. Validation of ultrahigh dependability for software-based systems, Communications of the ACM, 1993.
[8] Bev Littlewood et al. Examination of Bayesian belief network for safety assessment of nuclear computer-based systems, 1998.