An evaluation of heuristics for rule ranking

OBJECTIVE To evaluate and compare the performance of different rule-ranking algorithms for rule-based classifiers on biomedical datasets.

METHODOLOGY Empirical evaluation of five rule-ranking algorithms (one multi-rule and four single-rule) on two biomedical datasets, with performance assessed by ROC analysis and 5 × 2 cross-validation.

RESULTS On a lung cancer dataset, the full rule set (14267.1 rules on average) had an area under the ROC curve (AUC) of 0.862. Multi-rule ranking selected 13.3 rules on average, with an AUC of 0.852. The four single-rule ranking algorithms, using the same number of rules, achieved average AUCs of 0.830, 0.823, 0.823, and 0.822, respectively. On a prostate cancer dataset, the full rule set (339265.3 rules on average) had an AUC of 0.934, while the 9.4 rules selected on average by the multi-rule ranking and the four single-rule rankings had average AUCs of 0.932, 0.926, 0.925, 0.902, and 0.902, respectively.

CONCLUSION Multi-rule ranking performs better than the single-rule ranking algorithms. Both single-rule and multi-rule methods substantially reduce the number of rules while keeping classification performance comparable to that of the full rule set.
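The evaluation protocol named in the methodology (AUC averaged over 5 × 2 cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `auc`, `five_by_two_splits`, and `evaluate`, the `fit` callback, and the unstratified random halving are all assumptions made for illustration, and the abstract does not say whether the original splits were stratified.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def five_by_two_splits(n, seed=0):
    """Yield 10 (train, test) index pairs: 5 random halvings, each used both ways.

    Assumption: plain random halving, no stratification by class.
    """
    rng = random.Random(seed)
    for _ in range(5):
        idx = list(range(n))
        rng.shuffle(idx)
        half = n // 2
        first, second = idx[:half], idx[half:]
        yield first, second
        yield second, first


def evaluate(fit, X, y, seed=0):
    """Average test AUC over the 10 folds of 5 x 2 cross-validation.

    `fit(X_train, y_train)` is a hypothetical callback that returns a
    scoring function mapping one example to a real-valued score.
    """
    fold_aucs = []
    for train, test in five_by_two_splits(len(y), seed):
        scorer = fit([X[i] for i in train], [y[i] for i in train])
        scores = [scorer(X[i]) for i in test]
        fold_aucs.append(auc(scores, [y[i] for i in test]))
    return mean(fold_aucs)
```

Under this sketch, comparing a full rule set with a ranked subset would amount to calling `evaluate` once per classifier with the same `seed`, so that both see identical folds.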
