Optimization of the AUC Criterion for Rule Subset Selection

The area under the ROC curve (AUC) is widely regarded as a suitable criterion when dealing with imbalanced data, unequal misclassification costs, and noisy data. Motivated by this, we present an algorithm for rule subset selection that optimizes the AUC. The algorithm selects rules from a large set of candidate rules by building a Pareto front over the sensitivity and specificity criteria. An empirical study is carried out to assess the influence of the A priori parameter on the Pareto Front Elite algorithm. We compare the algorithm with other rule induction algorithms, and the results show that it obtains rule sets with higher AUC values.
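The Pareto-front step described above can be illustrated with a minimal sketch. It assumes each candidate rule has already been scored with a sensitivity and a specificity value; the rule names, the scores, and the helper functions `pareto_front` and `area_under_front` are hypothetical and are not taken from the paper. The area computation simply connects the front's points in ROC space with straight lines, which is only a rough illustration and not necessarily how the Pareto Front Elite algorithm evaluates the AUC of a rule set.

```python
from typing import List, Tuple

# Hypothetical representation: (rule name, sensitivity, specificity).
Rule = Tuple[str, float, float]


def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Keep the rules that no other rule dominates.

    A rule is dominated when another rule is at least as good on both
    criteria and strictly better on at least one of them.
    """
    front = []
    for name, se, sp in rules:
        dominated = any(
            se2 >= se and sp2 >= sp and (se2 > se or sp2 > sp)
            for _, se2, sp2 in rules
        )
        if not dominated:
            front.append((name, se, sp))
    return front


def area_under_front(front: List[Rule]) -> float:
    """Trapezoidal area under the piecewise-linear curve through the front.

    Each rule is mapped to the ROC point (1 - specificity, sensitivity),
    and the trivial points (0, 0) and (1, 1) are added as end points.
    """
    points = {(0.0, 0.0), (1.0, 1.0)}
    points.update((1.0 - sp, se) for _, se, sp in front)
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative scores only; these are not results from the paper.
candidate_rules = [
    ("r1", 0.90, 0.40),
    ("r2", 0.60, 0.80),
    ("r3", 0.50, 0.70),  # dominated by r2
    ("r4", 0.30, 0.95),
    ("r5", 0.90, 0.30),  # dominated by r1
]

front = pareto_front(candidate_rules)
print([name for name, _, _ in front])
print(area_under_front(front))
```

The dominance check is quadratic in the number of rules, which is acceptable for a sketch; a large candidate set would likely call for sorting by one criterion first and sweeping over the other.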
