Informed Selection of Training Examples for Knowledge Refinement

Knowledge refinement tools rely on a representative set of training examples to identify and repair faults in a knowledge-based system (KBS). In real environments it is often difficult to obtain a large set of examples, since each problem-solving task must be labelled with the expert's solution. It is often easier to generate unlabelled tasks that cover the expertise of a KBS. This paper investigates ways to select a suitable sample from a set of unlabelled problem-solving tasks, so that only the selected subset needs to be labelled. The unlabelled examples are clustered according to the way they are solved by the KBS, and selection is targeted on these clusters. Experiments in two domains showed that selective sampling reduced the number of training examples used for refinement, and hence the number that had to be labelled. Moreover, this reduction was achieved without affecting the accuracy of the final refined KBS. A single example selected randomly from each cluster was effective in one domain, but the other domain required a more informed selection that takes account of potentially conflicting repairs.
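The simplest selection strategy described above, one randomly chosen example per cluster, can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: it assumes that "the way a task is solved" can be summarised as the set of rules the KBS fires, and that tasks with identical rule sets fall into the same cluster. The names `cluster_by_trace`, `select_one_per_cluster`, and `toy_trace`, as well as the toy rules and tasks, are hypothetical.

```python
import random
from collections import defaultdict


def cluster_by_trace(tasks, solve_trace):
    """Group unlabelled tasks by how the KBS solves them.

    Assumption: solve_trace(task) returns the identifiers of the rules
    that fire; tasks with the same set of fired rules share a cluster.
    """
    clusters = defaultdict(list)
    for task in tasks:
        clusters[frozenset(solve_trace(task))].append(task)
    return list(clusters.values())


def select_one_per_cluster(clusters, rng):
    """Pick a single example at random from each cluster."""
    return [rng.choice(cluster) for cluster in clusters]


def toy_trace(task):
    """Hypothetical two-rule KBS standing in for a real rule base."""
    fired = []
    if task["temp"] > 38:
        fired.append("R1_fever")
    if task["cough"]:
        fired.append("R2_cough")
    return fired


# Hypothetical unlabelled problem-solving tasks.
tasks = [
    {"temp": 39.0, "cough": True},
    {"temp": 39.5, "cough": True},
    {"temp": 37.0, "cough": True},
    {"temp": 36.5, "cough": False},
    {"temp": 40.0, "cough": False},
]

clusters = cluster_by_trace(tasks, toy_trace)
to_label = select_one_per_cluster(clusters, random.Random(0))
print(f"{len(to_label)} of {len(tasks)} tasks would be sent to the expert")
```

Only the tasks in `to_label` would need the expert's solution. The more informed selection mentioned in the abstract, which takes account of potentially conflicting repairs, would replace the random choice inside `select_one_per_cluster`; it is not sketched here because the abstract does not specify it.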
