Toward AI research methodology: three case studies in evaluation

The roles of evaluation in empirical artificial intelligence (AI) research are described, both in an idealized cyclic model and in the context of three case studies. The case studies illustrate the pitfalls of evaluation and its contributions at every stage of the research cycle. AI evaluation methods are contrasted with those of the behavioral sciences, and it is concluded that AI must define and refine its own methods. To this end, several experiment schemas and many specific evaluation criteria are described. Recommendations are offered in the hope of encouraging the development and practice of evaluation methods in AI. The first case study illustrates the problems of evaluating knowledge-based systems, specifically a portfolio management expert system called FOLIO. The second focuses on the relationship between evaluation and the evolution of the GRANT system, specifically how the evaluations changed as GRANT's knowledge base was scaled up. The third examines the cyclic nature of the idealized research model itself.