GenMiner: Mining Informative Association Rules from Genomic Data

GENMINER adapts closed-itemset-based association rule extraction to genomic data. It combines the novel NORDI discretization method with the CLOSE [27] algorithm to efficiently generate minimal non-redundant association rules. GENMINER facilitates the integration of numerous sources of biological information, such as gene expressions and annotations, and can tacitly integrate qualitative information on biological conditions (age, sex, etc.). We validated this approach by analyzing the microarray datasets used by Eisen et al. [10] together with several sources of biological annotations. The extracted associations revealed significant co-annotated and co-expressed gene patterns, showing important biological relationships between genes and their features. Several of these relationships are supported by recent biological literature.
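
To make the idea of closed itemsets and non-redundant rules concrete, the following is a minimal brute-force sketch in Python. It is not the CLOSE algorithm and not GENMINER's implementation. The toy gene table, the item names (`heat_up`, `cold_down`, `GO:ribosome`, `GO:stress`), and all function names are hypothetical and chosen only for illustration. Each gene is described by already-discretized expression items and annotation items. An itemset is closed when it equals the intersection of all genes that contain it, and an exact rule (confidence 1) goes from a minimal generator to the rest of its closure.

```python
from itertools import combinations

# Hypothetical toy data: each gene maps to discretized expression items
# and annotation items. All names here are illustrative assumptions.
genes = {
    "g1": {"heat_up", "GO:ribosome", "cold_down"},
    "g2": {"heat_up", "GO:ribosome"},
    "g3": {"heat_up", "GO:ribosome", "cold_down"},
    "g4": {"cold_down", "GO:stress"},
}


def support_set(itemset):
    """Return the genes whose item sets contain every item of `itemset`."""
    return {g for g, items in genes.items() if itemset <= items}


def closure(itemset):
    """Return the items shared by all genes that contain `itemset`."""
    covering = support_set(itemset)
    if not covering:
        return frozenset(itemset)
    return frozenset(set.intersection(*(genes[g] for g in covering)))


def all_items():
    return sorted(set().union(*genes.values()))


def closed_frequent_itemsets(min_support):
    """Enumerate every itemset (brute force) and keep the closed, frequent ones."""
    items = all_items()
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = len(support_set(candidate))
            if count >= min_support and closure(candidate) == candidate:
                closed[candidate] = count
    return closed


def exact_rules(min_support):
    """Rules 'generator -> closure minus generator', all with confidence 1."""
    items = all_items()
    rules = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            count = len(support_set(antecedent))
            if count < min_support:
                continue
            # A minimal generator loses support as soon as any item is removed.
            is_generator = all(
                len(support_set(antecedent - {item})) > count
                for item in antecedent
            )
            consequent = closure(antecedent) - antecedent
            if is_generator and consequent:
                rules.append((antecedent, consequent, count))
    return rules


if __name__ == "__main__":
    for itemset, count in closed_frequent_itemsets(min_support=2).items():
        print("closed:", sorted(itemset), "support:", count)
    for antecedent, consequent, count in exact_rules(min_support=2):
        print(sorted(antecedent), "->", sorted(consequent), "support:", count)
```

Exhaustive enumeration is exponential in the number of items and is used here only because the toy table is tiny; avoiding that cost on real data is what a dedicated closed-itemset algorithm is for.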

[1] M. Eisen, et al. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering, 2002, Genome Biology.

[2] José María Carazo, et al. Integrated analysis of gene expression by association rules discovery, 2006, BMC Bioinformatics.

[3] T. Speed, et al. GOstat: find statistically overrepresented Gene Ontologies within a group of genes, 2004, Bioinformatics.

[4] Ricardo Martínez, et al. Extracted Knowledge Interpretation in mining biological data: a survey, 2007, RCIS.

[5] Nicola J. Rinaldi, et al. Transcriptional Regulatory Networks in Saccharomyces cerevisiae, 2002, Science.

[6] Joaquín Dopazo, et al. FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes, 2004, Bioinform.

[7] Ricardo Martínez, et al. Co-expressed gene groups analysis (CGGA): An automatic tool for the interpretation of microarray experiments, 2006.

[8] Roderick J. A. Little, et al. Statistical Analysis with Missing Data, 2002.

[9] Rakesh Agrawal, et al. Fast Algorithms for Mining Association Rules, 1994, VLDB.

[10] B. Dyson, et al. Running head: , 2019.

[11] Gediminas Adomavicius, et al. Handling very large numbers of association rules in the analysis of microarray data, 2002, KDD.

[12] Seon-Young Kim, et al. PAGE: Parametric Analysis of Gene Set Enrichment, 2005, BMC Bioinform.

[13] David Shore, et al. Fine-Structure Analysis of Ribosomal Protein Gene Transcription, 2006, Molecular and Cellular Biology.

[14] Jiong Yang, et al. Gene ontology friendly biclustering of expression profiles, 2004, Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004).

[15] Nicolas Pasquier, et al. Efficient Mining of Association Rules Using Closed Itemset Lattices, 1999, Inf. Syst.

[16] Claude Pasquier, et al. THEA: ontology-driven analysis of microarray data, 2004, Bioinform.

[17] David Martin, et al. GOToolBox: functional analysis of gene datasets based on Gene Ontology, 2004, Genome Biology.

[18] Dan A. Simovici, et al. Generating an informative cover for association rules, 2002, Proceedings of the 2002 IEEE International Conference on Data Mining.

[19] D. Botstein, et al. Cluster analysis and display of genome-wide expression patterns, 1998, Proceedings of the National Academy of Sciences of the United States of America.

[20] Jerry Li, et al. Within the fold: assessing differential expression measures and reproducibility in microarray assays, 2002, Genome Biology.

[21] Chad Creighton, et al. Mining gene expression databases for association rules, 2003, Bioinform.

[22] Petri Törönen, et al. Theme discovery from gene lists for identification and viewing of multiple functional groups, 2005, BMC Bioinformatics.

[23] Rajeev Motwani, et al. Dynamic itemset counting and implication rules for market basket data, 1997, SIGMOD '97.

[24] Anil K. Bera, et al. Efficient tests for normality, homoscedasticity and serial independence of regression residuals: Monte Carlo Evidence, 1981.

[25] Kian-Lee Tan, et al. Mining gene expression data for positive and negative co-regulated gene clusters, 2004, Bioinform.

[26] Huiming Ding, et al. The synthetic genetic interaction spectrum of essential genes, 2005, Nature Genetics.

[27] F. E. Grubbs. Procedures for Detecting Outlying Observations in Samples, 1969.

[28] P. Brown, et al. Exploring the metabolic and genetic control of gene expression on a genomic scale, 1997, Science.

[29] Stanley N. Cohen, et al. Effects of threshold choice on biological conclusions reached during analysis of gene expression by DNA microarrays, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[30] R. Altman, et al. Whole-genome expression analysis: challenges beyond clustering, 2001, Current Opinion in Structural Biology.

[31] Stefan Kramer, et al. Analyzing microarray data using quantitative association rules, 2005, ECCB/JBI.

[32] Gerd Stumme, et al. Generating a Condensed Representation for Association Rules, 2005, Journal of Intelligent Information Systems.

[33] Daniel Hanisch, et al. Co-clustering of biological networks and gene expression data, 2002, ISMB.

[34] H. Lilliefors. On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown, 1967.

[35] Daniel L. Hartl, et al. GeneMerge - Post-genomic Analysis, Data Mining, and Hypothesis Testing, 2003, Bioinform.

[36] Philip S. Yu, et al. A new method to measure the semantic similarity of GO terms, 2007, Bioinform.

[37] Hagit Shatkay, et al. Genes, Themes, and Microarrays: Using Information Retrieval for Large-Scale Gene Analysis, 2000, ISMB.

[38] R. Morse, et al. RAP, RAP, open up! New wrinkles for RAP1 in yeast, 2000, Trends in Genetics.

[39] Nicole A. Lazar, et al. Statistical Analysis With Missing Data, 2003, Technometrics.