Comparing association rules and decision trees for disease prediction

Association rules are a promising technique for finding hidden patterns in a medical data set. The main issue in mining association rules from a medical data set is the large number of rules discovered, most of which are irrelevant. So many rules make search slow and interpretation by the domain expert difficult. In this work, search constraints are introduced to find only medically significant association rules and to make search more efficient. In medical terms, association rules relate heart perfusion measurements and patient risk factors to the degree of stenosis in four specific arteries. The medical significance of association rules is evaluated with the usual support and confidence metrics, and also with lift. Association rules are compared to predictive rules mined with decision trees, a well-known machine learning technique. Decision trees are shown to be less adequate than association rules for artery disease prediction. Experiments show that decision trees tend to find few simple rules, that most of their rules have somewhat low reliability, that most attribute splits differ from medically common splits, and that most rules refer to very small sets of patients. In contrast, association rules generally include simpler predictive rules, work well with user-binned attributes, have higher reliability, and generally refer to larger sets of patients.
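
The three metrics named above (support, confidence, and lift) can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the item labels (such as "LAD>=70"), and the five patient records are hypothetical and invented purely for illustration. Each record is treated as a set of binned attribute values, and a rule has the form antecedent => consequent.

```python
from typing import FrozenSet, List

Record = FrozenSet[str]


def support(records: List[Record], itemset: Record) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    s_antecedent = support(records, antecedent)
    if s_antecedent == 0.0:
        return 0.0
    return support(records, antecedent | consequent) / s_antecedent


def lift(records: List[Record], antecedent: Record, consequent: Record) -> float:
    """Confidence divided by the consequent's support; values above 1 suggest
    the antecedent and consequent co-occur more often than under independence."""
    s_consequent = support(records, consequent)
    if s_consequent == 0.0:
        return 0.0
    return confidence(records, antecedent, consequent) / s_consequent


# Hypothetical, made-up records (not data from the paper).
records: List[Record] = [
    frozenset({"age>=60", "smoker", "LAD>=70"}),
    frozenset({"age>=60", "LAD>=70"}),
    frozenset({"age>=60", "smoker"}),
    frozenset({"smoker"}),
    frozenset({"age>=60", "smoker", "LAD>=70"}),
]

# Hypothetical rule: {age>=60, smoker} => {LAD>=70}
antecedent = frozenset({"age>=60", "smoker"})
consequent = frozenset({"LAD>=70"})

rule_support = support(records, antecedent | consequent)
rule_confidence = confidence(records, antecedent, consequent)
rule_lift = lift(records, antecedent, consequent)
print(rule_support, rule_confidence, rule_lift)
```

In this toy data the rule covers two of the five records, and its lift is slightly above 1, so the antecedent raises the estimated probability of the consequent only modestly. Under these assumptions, a constrained search would keep a rule only when all three values clear user-chosen thresholds.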
