Evaluation of expert system testing methods

Expert systems are being developed commercially to solve nontraditional problems in such areas as auditing, fault diagnosis, and computer configuration. As expert systems move out from research laboratories to commercial production environments, establishing reliability and robustness has taken on increasing importance [1, 5, 9, 20, 21]. In this article, we assess the comparative effectiveness of testing methods, including black-box, white-box, consistency, and completeness testing methods [9, E-19, 22, 23], in detecting faults.¹ We take the approach that an expert system life cycle consists of the problem-specification phase² [2, 3, 5, 12, 13], the solution-specification phase, the high-level design phase, the implementation phase, and the testing phase. This approach is consistent with the modern expert system life cycle as suggested in [9, 21].

Expert system testing (generally known as verification and validation [15, 17]) establishes a binary relationship between two by-products of the software-development process. For this article, we consider testing to be the comparison between the by-product of each life-cycle phase and the implementation. We use a technique called "life-cycle mutation testing" (LCMT) for a comparative evaluation of testing methods on an expert system. Table 1 briefly explains each of the testing methods considered in this article [9, 15-17, 19, 23].

Here we comment on the characteristics of faults in each life-cycle phase. We make the following general observations about the nature of faults:

• Faults not detected in early phases become progressively more expensive to rectify in later phases.
• A fault-prone region (an input region that results in failures) may not be uniformly distributed across the complete input space.
• Faults in early life-cycle phases may induce a multitude of faults in the final program.
• Inconsistency or incompleteness in any phase may result in inconsistent or missing rules in the final program.

Problem-specification faults.
Problem specification is a description of the problem being solved. Black-box testing methods have been shown to be effective in identifying faults for programs with large fault-prone regions [23]. Since random testing is a black-box method, it should be effective in detecting problem-specification faults. However, random testing is more effective when a fault-prone region is uniformly distributed across the complete input space than when it is nonuniformly distributed [14] (see Figure 1). Partition-testing methods are effective when a program has a nonuniform fault-prone region. The performance of input and output partition-testing methods depends on the partition criteria. Partition-testing methods will perform well if the partition criteria used …