Association Rule Discovery Has the Ability to Model Complex Genetic Effects

Dramatic advances in genotyping technology have established a need for fast, flexible analysis methods for genetic association studies. Common complex diseases, such as Parkinson's disease or multiple sclerosis, are thought to involve an interplay of multiple genes working either independently or together to influence disease risk. In addition, multiple underlying traits, each with its own genetic basis, may be defined together as a single disease. These effects (trait heterogeneity, locus heterogeneity, and gene-gene interactions, or epistasis) contribute to the complex architecture of common genetic diseases. Association rule discovery (ARD) searches for frequent itemsets to identify rule-based patterns in large-scale data. In this study, we apply Apriori (an ARD algorithm) to simulated genetic data with varying degrees of complexity. Apriori, using information difference to prior as a rule measure, shows good power to detect functional effects in simulated cases of simple trait heterogeneity and of trait heterogeneity combined with epistasis, and moderate power in cases of trait heterogeneity combined with locus heterogeneity. We also show that bootstrapping the rule induction process does not considerably improve the power to detect these effects. These results show that ARD is a framework with sufficient flexibility to characterize complex genetic effects.
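To make the frequent-itemset search concrete, the following is a minimal sketch of the generic Apriori procedure applied to a toy case-control table. It is not the implementation used in the study. The genotype labels, the `status=case` item, the thresholds, and the function names `frequent_itemsets` and `rules_for_target` are all hypothetical, and rules are ranked by plain confidence rather than by the information difference to prior measure named above, which is not defined in this text.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


def rules_for_target(itemsets, target, min_confidence):
    """Build rules 'antecedent -> target', ranked by confidence (an assumed measure)."""
    rules = []
    for itemset, support in itemsets.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            # The antecedent is guaranteed frequent because it is a subset of a frequent itemset.
            confidence = support / itemsets[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, target, support, confidence))
    return sorted(rules, key=lambda r: -r[3])


# Hypothetical toy data: one set of items per individual.
data = [
    {"SNP1=AA", "SNP2=Bb", "status=case"},
    {"SNP1=AA", "SNP2=Bb", "status=case"},
    {"SNP1=AA", "SNP2=BB", "status=control"},
    {"SNP1=Aa", "SNP2=Bb", "status=control"},
    {"SNP1=Aa", "SNP2=BB", "status=control"},
    {"SNP1=AA", "SNP2=Bb", "status=case"},
]

itemsets = frequent_itemsets(data, min_support=0.3)
rules = rules_for_target(itemsets, "status=case", min_confidence=0.7)
for antecedent, target, support, confidence in rules:
    print(sorted(antecedent), "->", target, round(support, 2), round(confidence, 2))
```

In this toy table the two-genotype antecedent predicts case status more reliably than either genotype alone, which is the kind of joint pattern a rule-based search can surface without specifying an interaction model in advance.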
