Adequacy of Limited Testing for Knowledge Based Systems

Knowledge-based engineering and computational intelligence are expected to become core technologies in the design and manufacturing of the next generation of space exploration missions. The literature is contradictory on how such systems should be assessed, and studies disagree significantly about the amount of testing needed. Standard black-box test suites are impractically large because the black-box approach neglects the internal structure of knowledge-based systems. In contrast, practical results repeatedly indicate that only a few tests are needed to sample the range of behaviors of a knowledge-based program. In this paper, we model testing as a search process over the internal state space of the knowledge-based system. When comparing test suites, the suite that examines a larger portion of the state space is considered more complete. Our goal is to investigate the trade-off between this completeness criterion and the size of test suites. The results of testing experiments on tens of thousands of mutants of real-world knowledge-based systems indicate that only a very limited gain in completeness can be achieved through prolonged testing. Simple (or random) search strategies for testing appear to be as powerful as more thorough search algorithms.
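The completeness criterion described above can be illustrated with a minimal sketch. It is not the experimental setup of the paper: the rule base, the input names, and the helpers `run_test`, `completeness`, and `random_suite` are all hypothetical, and "state space" is simplified here to the set of internal facts that a toy propositional rule base can derive. A test suite is scored by the fraction of derivable facts that at least one of its tests reaches.

```python
import random

# Hypothetical toy rule base: each rule is (set of premises, conclusion).
RULES = [
    ({"a", "b"}, "x"),
    ({"b", "c"}, "y"),
    ({"x"}, "z"),
    ({"y", "d"}, "w"),
    ({"z", "w"}, "goal"),
]
INPUTS = ["a", "b", "c", "d"]
# Simplified "internal state space": every fact some rule can conclude.
DERIVABLE = {conclusion for _, conclusion in RULES}


def run_test(true_inputs):
    """Forward-chain from the given inputs; return the derived internal facts."""
    facts = set(true_inputs)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts - set(INPUTS)


def completeness(suite):
    """Fraction of derivable facts reached by at least one test in the suite."""
    reached = set()
    for test in suite:
        reached |= run_test(test)
    return len(reached) / len(DERIVABLE)


def random_suite(size, rng):
    """Draw `size` random tests; each input is true with probability 0.5."""
    return [frozenset(i for i in INPUTS if rng.random() < 0.5)
            for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    suite = random_suite(16, rng)
    # Score growing prefixes of one random suite to see how coverage accumulates.
    for size in (1, 2, 4, 8, 16):
        print(size, completeness(suite[:size]))
```

Because the score is computed over nested prefixes of the same suite, it can only stay level or rise as tests are added, which makes it easy to see where additional tests stop contributing. On a rule base this small, that behavior says nothing about real systems; the sketch only makes the size-versus-completeness trade-off concrete.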
