How evaluation guides AI research

Evaluation should be a mechanism of progress both within and across AI research projects. For the individual researcher, evaluation can tell us how and why our methods and programs work and thus how our research should proceed. For the community, evaluation expedites the understanding of available methods and thus their integration into further research. In this article, we present a five-stage model of AI research and describe evaluation guidelines appropriate to each stage. These guidelines, in the form of evaluation criteria and techniques, suggest how to perform evaluation. We conclude with recommendations for encouraging the evaluation of AI research.
