An evaluation of heuristics for rule ranking

OBJECTIVE: To evaluate and compare the performance of different rule-ranking algorithms for rule-based classifiers on biomedical datasets.

METHODOLOGY: Empirical evaluation of five rule-ranking algorithms (one multi-rule and four single-rule) on two biomedical datasets, with performance assessed by ROC analysis and 5 × 2 cross-validation.

RESULTS: On a lung cancer dataset, the full rule set, averaging 14,267.1 rules, achieved an area under the ROC curve (AUC) of 0.862. Multi-rule ranking selected on average 13.3 rules with an AUC of 0.852, while the four single-rule ranking algorithms, using the same number of rules, achieved average AUCs of 0.830, 0.823, 0.823, and 0.822, respectively. On a prostate cancer dataset, the full rule set, averaging 339,265.3 rules, had an AUC of 0.934. The 9.4 rules (on average) selected by the multi-rule ranking and the four single-rule rankings had average AUCs of 0.932, 0.926, 0.925, 0.902, and 0.902, respectively.

CONCLUSION: Multi-rule (multivariate) ranking performs better than the single-rule ranking algorithms. Both single-rule and multi-rule methods substantially reduce the number of rules while keeping classification performance comparable to that of the full rule set.
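
The abstract does not describe how the ranking algorithms work internally. The following Python sketch illustrates only the general shape of the evaluation: rules are ranked on the training half of each split, and the selected subset is scored by test AUC under 5 × 2 cross-validation. Everything specific in it is an assumption and not the paper's method. This includes the vote encoding (`Votes`), the summed-vote score, all function names, the use of each rule's individual AUC as the single-rule heuristic, the greedy forward selection standing in for multi-rule ranking, and the stratified splitting. All of these are hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical encoding: votes[r][i] is +1 / -1 if rule r fires on sample i and
# predicts the positive / negative class, and 0 if rule r does not fire.
Votes = List[List[int]]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both classes; report chance level
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def ensemble_scores(votes: Votes, rule_ids: Sequence[int], rows: Sequence[int]) -> List[int]:
    """Score each sample by summing the votes of the selected rules (assumed combination rule)."""
    return [sum(votes[r][i] for r in rule_ids) for i in rows]


def single_rule_ranking(votes: Votes, labels: Sequence[int], rows: Sequence[int], k: int) -> List[int]:
    """Rank rules independently by their own AUC on `rows` and keep the top k."""
    ys = [labels[i] for i in rows]
    ranked = sorted(range(len(votes)),
                    key=lambda r: auc([votes[r][i] for i in rows], ys),
                    reverse=True)
    return ranked[:k]


def greedy_multi_rule_ranking(votes: Votes, labels: Sequence[int], rows: Sequence[int], k: int) -> List[int]:
    """Hypothetical multi-rule ranking: repeatedly add the rule that maximizes the AUC of the combined vote."""
    ys = [labels[i] for i in rows]
    chosen: List[int] = []
    remaining = list(range(len(votes)))
    while remaining and len(chosen) < k:
        best = max(remaining,
                   key=lambda r: auc(ensemble_scores(votes, chosen + [r], rows), ys))
        chosen.append(best)
        remaining.remove(best)
    return chosen


Selector = Callable[[Votes, Sequence[int], Sequence[int], int], List[int]]


def five_by_two_cv(votes: Votes, labels: Sequence[int], select: Selector, k: int, seed: int = 0) -> float:
    """Mean test AUC over 5 repetitions of stratified 2-fold cross-validation."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    aucs = []
    for _ in range(5):
        rng.shuffle(pos)
        rng.shuffle(neg)
        a = pos[: len(pos) // 2] + neg[: len(neg) // 2]
        b = pos[len(pos) // 2:] + neg[len(neg) // 2:]
        for train, test in ((a, b), (b, a)):  # each half serves once as training, once as test
            rules = select(votes, labels, train, k)  # rank rules on the training half only
            scores = ensemble_scores(votes, rules, test)
            aucs.append(auc(scores, [labels[i] for i in test]))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    labels = [1, 0] * 10
    votes = [
        [1 if y == 1 else -1 for y in labels],  # rule 0: always correct
        [0 for _ in labels],                    # rule 1: never fires
        [-1 if y == 1 else 1 for y in labels],  # rule 2: always wrong
    ]
    print(five_by_two_cv(votes, labels, single_rule_ranking, k=1))        # → 1.0
    print(five_by_two_cv(votes, labels, greedy_multi_rule_ranking, k=1))  # → 1.0
```

With the toy rules in the `__main__` block, both selectors keep the always-correct rule 0, so every test fold reaches an AUC of 1.0. The greedy selector needs one AUC evaluation per remaining rule at each step. That cost grows quickly for rule sets as large as those reported above. The pairwise `auc` is quadratic and is written that way only for readability.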