Integrated analysis of gene expression and genome-wide DNA methylation for tumor prediction: An association rule mining-based approach

Statistical analysis and association rule mining are two of the most efficient techniques for this task: the first identifies genes that are differentially expressed or methylated across sample types or experimental conditions, and the second determines the expression/methylation relationships among them. In this article, we perform an integrated analysis that combines statistical methods with association rule mining on mRNA expression and DNA methylation datasets for the prediction of uterine leiomyoma. We also propose a novel rule-based classifier. Using 16 different rule-interestingness measures, we apply a genetic-algorithm-based rank aggregation technique to the association rules generated from the training data by the Apriori association rule mining algorithm. Once the rules are ranked, we assign each test point a class label (tumor or normal) by majority voting with a weighted-sum method. We run this classifier on the combined dataset using k-fold cross-validation and compare its performance with that of other popular rule-based classifiers. Finally, through a frequency analysis of the association rules for the tumor and normal class labels separately, we predict the status of several important genes that play a major role in tumor formation in uterine leiomyoma.
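The voting step described above can be illustrated with a minimal sketch. It is not the classifier from the article: the rules, the item names (such as "GENE_A+"), the rank-to-weight formula, and the default class are all hypothetical, chosen only to show how ranked class association rules might cast weighted votes for a test point.

```python
from collections import defaultdict

# Hypothetical ranked rules: (antecedent items, class label, aggregated rank).
# Rank 1 is the best rule. In the article the ranks come from GA-based rank
# aggregation over 16 interestingness measures; here they are made up.
RULES = [
    (frozenset({"GENE_A+", "GENE_B-"}), "tumor", 1),
    (frozenset({"GENE_C+"}), "normal", 2),
    (frozenset({"GENE_A+"}), "tumor", 3),
    (frozenset({"GENE_B+", "GENE_C+"}), "normal", 4),
]


def rule_weight(rank, n_rules):
    """Assumed weighting: linearly decreasing with rank, in (0, 1]."""
    return (n_rules - rank + 1) / n_rules


def classify(sample_items, rules, default="normal"):
    """Sum the weights of all rules whose antecedent the sample satisfies,
    per class label, and return the label with the largest weighted sum."""
    scores = defaultdict(float)
    for antecedent, label, rank in rules:
        if antecedent <= sample_items:  # every antecedent item is present
            scores[label] += rule_weight(rank, len(rules))
    if not scores:
        return default  # assumed fallback when no rule fires
    return max(scores, key=scores.get)


if __name__ == "__main__":
    test_point = {"GENE_A+", "GENE_B-", "GENE_C+"}
    print(classify(test_point, RULES))
```

For the test point shown, the two tumor rules fire with weights 1.0 and 0.5 and one normal rule fires with weight 0.75, so the weighted sum favors the tumor label. Other weighting schemes (for example, reciprocal rank) would fit the same structure.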
