Understanding what machine learning produces - Part I: Representations and their comprehensibility

The aim of many machine learning users is to comprehend the structures inferred from a dataset; such users may be far more interested in understanding the structure of their data than in predicting the outcomes of new test data. Part I of this paper surveys representations based on decision trees, production rules and decision graphs that have been developed and used for machine learning. These representations differ in expressive power, and particular attention is paid to their comprehensibility for non-specialist users. The graphic form in which a structure is portrayed also has a strong effect on comprehensibility, and Part II of this paper develops knowledge visualization techniques that are particularly appropriate for answering the questions that machine learning users typically ask about the structures produced.

The result of machine learning can be evaluated from two quite different points of view: how the acquired knowledge performs in new situations, and how well users comprehend the explicit descriptions of knowledge that are generated. In the literature, machine learning schemes are usually assessed on their performance alone. Techniques such as evaluation on test sets and cross-validation are specifically designed to measure performance; a sketch of the latter appears below. Methods based on minimum description length, which are often used to control pruning, do have some connection with comprehensibility, in that succinct descriptions are, other things being equal, easier to understand than lengthy ones. However, they are generally employed to compare alternative descriptions couched in the same terms, rather than across representations.

People who apply machine learning in practice frequently discover that their clients adopt the second perspective. What interests them most is not classification performance on new examples but the perspicuity of the knowledge derived from the data and the light it sheds on the original problem. Representation is of crucial concern to these users: they want to understand the knowledge structures and relate them to the data from which they came. Perspicuity of knowledge and of knowledge representations is inherently subjective and impossible to treat quantitatively. Moreover, one cannot adopt the standard method of comparing machine learning algorithms by testing them on benchmark datasets, because only the users of the data can judge perspicuity, and they are not available. No doubt for these reasons, researchers shy away from the second kind of evaluation. Nevertheless, user understanding is often of overriding concern in practice, and comparisons should not be based on the wrong criterion just because it is easy to measure.

The purpose of this paper is to begin to redress the balance. We define "learning" as the acquisition of structural descriptions from examples, descriptions that can either be used as a basis for performance or studied in their own right, and it is the latter that interests us. In this paper we restrict attention to classification learners, as they are generally the systems of choice when the user's aim is to understand an empirical dataset through explicit knowledge representation. The first part of this paper surveys knowledge representations that are used in classification systems. Although one often thinks in terms of a dichotomy between trees and rules, this is an over-simplistic view, as illustrated by the example below.
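As an illustration of the first, performance-oriented kind of evaluation, here is a minimal sketch of k-fold cross-validation. It is not taken from the paper; the `train` and `predict` callables, and the majority-class baseline learner in the demonstration, are hypothetical placeholders standing in for any classification learner.

```python
import random

def cross_validate(instances, labels, train, predict, k=10, seed=0):
    """Estimate classification accuracy by k-fold cross-validation.

    Shuffles the instance indices, splits them into k disjoint folds,
    and for each fold trains on the remaining data and scores the
    predictions on the held-out fold.
    """
    indices = list(range(len(instances)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint index lists
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_X = [instances[i] for i in indices if i not in held_out]
        train_y = [labels[i] for i in indices if i not in held_out]
        model = train(train_X, train_y)
        correct += sum(predict(model, instances[i]) == labels[i] for i in fold)
    return correct / len(instances)

if __name__ == "__main__":
    # Toy demonstration with a (hypothetical) majority-class baseline learner.
    X = [[0], [1], [0], [1], [0], [1], [0], [1], [0], [1]]
    y = ["no", "yes", "no", "yes", "no", "no", "no", "yes", "no", "yes"]

    def train_majority(X, y):
        return max(set(y), key=y.count)  # the "model" is the commonest label

    def predict_majority(model, x):
        return model

    print("estimated accuracy:",
          cross_validate(X, y, train_majority, predict_majority, k=5))
```

Note that the procedure yields a single accuracy estimate and says nothing about how understandable the induced model is, which is precisely the gap the paper addresses.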

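To make the trees-versus-rules contrast concrete, the sketch below (an illustration constructed for this summary, not an example from the paper) expresses the same concept, "(a = 1 and b = 1) or (c = 1 and d = 1)", first as a two-rule list and then as an equivalent decision tree. Because a tree must test one attribute at a time from a single root, the (c, d) subtree is replicated; this well-known replicated-subtree effect is one reason rule sets can be more succinct, and often more comprehensible, than trees for such concepts.

```python
from itertools import product

# A concept as an ordered rule list: two conjunctive rules plus a default.
RULES = [
    ({"a": 1, "b": 1}, "yes"),
    ({"c": 1, "d": 1}, "yes"),
]

def classify_rules(instance, rules, default="no"):
    """Return the conclusion of the first rule whose conditions all hold."""
    for conditions, conclusion in rules:
        if all(instance[attr] == value for attr, value in conditions.items()):
            return conclusion
    return default

# The same concept as a decision tree: each internal node is a pair
# (attribute, branches); leaves are class labels. The (c, d) subtree
# must appear twice, once under a=1/b=0 and once under a=0.
TREE = ("a", {
    1: ("b", {
        1: "yes",
        0: ("c", {1: ("d", {1: "yes", 0: "no"}), 0: "no"}),
    }),
    0: ("c", {1: ("d", {1: "yes", 0: "no"}), 0: "no"}),
})

def classify_tree(instance, node):
    """Follow branches from the root until a leaf label is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[instance[attribute]]
    return node

if __name__ == "__main__":
    # The two representations classify every instance identically.
    for values in product([0, 1], repeat=4):
        instance = dict(zip("abcd", values))
        assert classify_rules(instance, RULES) == classify_tree(instance, TREE)
    print("rule list and replicated-subtree tree agree on all 16 instances")
```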