Comparative Performance of Rule Quality Measures in an Induction System

This paper addresses an important problem related to the use of induction systems in analyzing real-world data: the quality and reliability of the rules those systems generate. We discuss the significance of having a reliable and efficient rule quality measure. Such a measure can provide useful support in interpreting, ranking, and applying the rules generated by an induction system. A number of rule quality and statistical measures are selected from the literature, and their performance is evaluated on four sets of semiconductor data. The primary goal of this testing and evaluation has been to investigate the performance of these quality measures based on (i) accuracy, (ii) coverage, (iii) positive error ratio, and (iv) negative error ratio of the rule selected by each measure. Moreover, the sensitivity of these quality measures to different data distributions is examined. In conclusion, we recommend Cohen's statistic as the best quality measure examined for the domain. Finally, we outline future work to be done in this area.
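As an illustration of the kind of measure being compared, the following minimal sketch computes Cohen's kappa for a single rule from a 2x2 contingency table, alongside simple accuracy and coverage figures. The abstract does not give the paper's exact definitions, so the table layout, the function names, and the accuracy and coverage formulas used here are assumptions chosen for illustration, not the authors' formulation.

```python
def rule_table_stats(n11, n12, n21, n22):
    """Summarize one rule against one class from a 2x2 contingency table.

    Assumed (hypothetical) cell layout:
      n11: examples covered by the rule and in the class
      n12: examples covered by the rule but not in the class
      n21: examples not covered by the rule but in the class
      n22: examples neither covered nor in the class
    """
    total = n11 + n12 + n21 + n22
    if total == 0:
        raise ValueError("contingency table is empty")

    covered = n11 + n12        # examples the rule fires on
    in_class = n11 + n21       # examples belonging to the class

    # Assumed definitions: accuracy as the fraction of covered examples
    # that are in the class, coverage as the fraction of class examples
    # that the rule covers.
    accuracy = n11 / covered if covered else 0.0
    coverage = n11 / in_class if in_class else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (n11 + n22) / total
    expected = (covered * in_class
                + (total - covered) * (total - in_class)) / total ** 2
    kappa = 0.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)

    return {"accuracy": accuracy, "coverage": coverage, "kappa": kappa}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(rule_table_stats(40, 10, 20, 30))
```

Kappa is zero when the rule agrees with the class labels no more often than chance would predict and approaches one as agreement becomes perfect, which is why it can rank rules differently from raw accuracy when class distributions are skewed.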
