Using rule sets to maximize ROC performance

Rules are commonly used for classification because they are modular intelligible and easy to learn. Existing work in classification rule learning assumes the goal is to produce categorical classifications to maximize classification accuracy. Recent work in machine learning has pointed out the limitations of classification accuracy: when class distributions are skewed or error costs are unequal, an accuracy-maximizing rule set can perform poorly. A more flexible use of a rule set is to produce instance scores indicating the likelihood that an instance belongs to a given class. With such an ability, we can apply rule sets effectively when distributions are skewed or error costs are unequal. This paper empirically investigates different strategies for evaluating rule sets when the goal is to maximize the scoring (ROC) performance.

[1]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[2]  James P. Egan,et al.  Signal detection theory and ROC analysis , 1975 .

[3]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[4]  J A Swets,et al.  Measuring the accuracy of diagnostic systems. , 1988, Science.

[5]  Foster J. Provost,et al.  RL4: a tool for knowledge-based induction , 1990, [1990] Proceedings of the 2nd International IEEE Conference on Tools for Artificial Intelligence.

[6]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[7]  David C. Wilkins,et al.  The Refinement of Probabilistic Rule Sets: Sociopathic Interactions , 1994, Artif. Intell..

[8]  Geoffrey I. Webb OPUS: An Efficient Admissible Algorithm for Unordered Search , 1995, J. Artif. Intell. Res..

[9]  Sholom M. Weiss,et al.  Rule-based Machine Learning Methods for Functional Prediction , 1995, J. Artif. Intell. Res..

[10]  Alberto Del Bimbo,et al.  Recurrent neural networks can be trained to be maximum a posteriori probability classifiers , 1995, Neural Networks.

[11]  Tom Fawcett,et al.  Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions , 1997, KDD.

[12]  Andrew P. Bradley,et al.  The use of the area under the ROC curve in the evaluation of machine learning algorithms , 1997, Pattern Recognit..

[13]  Ron Kohavi,et al.  The Case against Accuracy Estimation for Comparing Induction Algorithms , 1998, ICML.

[14]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[15]  Tom Fawcett,et al.  Activity monitoring: noticing interesting changes in behavior , 1999, KDD '99.

[16]  øöö Blockinøø Well-Trained PETs : Improving Probability Estimation , 2000 .

[17]  J A Swets,et al.  Better decisions through science. , 2000, Scientific American.

[18]  Alex Alves Freitas,et al.  Understanding the crucial differences between classification and discovery of association rules: a position paper , 2000, SKDD.

[19]  Bianca Zadrozny,et al.  Learning and making decisions when costs and probabilities are both unknown , 2001, KDD '01.

[20]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.