MFMS: maximal frequent module set mining from multiple human gene expression data sets

Advances in genomic technologies have made it possible to collect vast amounts of gene expression data. Protein functional annotation and biological module discovery based on a single gene expression data set suffer from spurious coexpression. Recent work has focused on integrating multiple independent gene expression data sets. In this paper, we propose a two-step approach for mining maximal frequent collections of highly connected modules from coexpression graphs. We first mine maximal frequent edge-sets and then extract highly connected subgraphs from the edge-induced subgraphs. Experimental results on the collections of modules mined from 52 human gene expression data sets show that coexpression links that occur together in a significant number of experiments have a modular topological structure. Moreover, GO enrichment analysis shows that the proposed approach discovers biologically significant frequent collections of modules.
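
The abstract describes the two steps only at a high level. Below is a minimal Python sketch under stated assumptions. It is not the authors' implementation. Each data set is assumed to be reduced to a coexpression graph given as a set of undirected edges. An edge-set is treated as "frequent" if it is contained in at least `min_support` graphs, and maximal frequent edge-sets are found by naive depth-first enumeration, which would be too slow for real data. "Highly connected" is assumed to follow the HCS rule of Hartuv and Shamir: a subgraph on n vertices whose edge connectivity exceeds n/2. Minimum cuts are computed with the Stoer-Wagner algorithm. All function names (`edge`, `maximal_frequent_edge_sets`, `stoer_wagner`, `hcs`, `frequent_module_sets`) are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """Undirected coexpression link between genes u and v."""
    return frozenset((u, v))


def maximal_frequent_edge_sets(graphs, min_support):
    """Step 1 (naive sketch): edge-sets contained in >= min_support graphs
    that have no frequent proper superset."""
    tids = {}  # edge -> indices of the graphs (data sets) that contain it
    for idx, g in enumerate(graphs):
        for e in g:
            tids.setdefault(e, set()).add(idx)
    items = sorted((e for e, t in tids.items() if len(t) >= min_support), key=sorted)
    frequent = []

    def extend(prefix, prefix_tids, start):
        # Depth-first extension; support is the size of the tidset intersection.
        for i in range(start, len(items)):
            t = prefix_tids & tids[items[i]]
            if len(t) >= min_support:
                new = prefix | {items[i]}
                frequent.append(new)
                extend(new, t, i + 1)

    extend(frozenset(), set(range(len(graphs))), 0)
    # Keep only edge-sets with no frequent proper superset (maximality).
    return [s for s in frequent if not any(s < other for other in frequent)]


def stoer_wagner(nodes, edges):
    """Global minimum edge cut of an undirected graph: (cut size, one side)."""
    w = {u: {} for u in nodes}  # w[u][v] = number of edges between merged groups
    for e in edges:
        u, v = tuple(e)
        w[u][v] = w[u].get(v, 0) + 1
        w[v][u] = w[v].get(u, 0) + 1
    groups = {u: {u} for u in nodes}
    best_cut, best_side = float("inf"), set()
    while len(w) > 1:
        # One phase: repeatedly add the vertex most tightly connected to the set so far.
        start = next(iter(w))
        conn = {v: w[start].get(v, 0) for v in w if v != start}
        prev, last = start, start
        while conn:
            nxt = max(conn, key=conn.get)
            cut_of_phase = conn.pop(nxt)
            prev, last = last, nxt
            for v, wt in w[nxt].items():
                if v in conn:
                    conn[v] += wt
        if cut_of_phase < best_cut:
            best_cut, best_side = cut_of_phase, set(groups[last])
        # Merge the last-added vertex into the one added just before it.
        groups[prev] |= groups.pop(last)
        for v, wt in w.pop(last).items():
            if v != prev:
                w[prev][v] = w[prev].get(v, 0) + wt
                w[v][prev] = w[v].get(prev, 0) + wt
            del w[v][last]
    return best_cut, best_side


def hcs(nodes, edges):
    """Step 2 (assumed HCS rule): split on a minimum cut until every part
    with n >= 2 vertices has edge connectivity > n / 2."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return []  # singletons are not reported as modules
    cut, side = stoer_wagner(nodes, edges)
    if cut > len(nodes) / 2:
        return [nodes]
    rest = nodes - side
    return (hcs(side, [e for e in edges if e <= side])
            + hcs(rest, [e for e in edges if e <= rest]))


def frequent_module_sets(graphs, min_support):
    """Both steps: maximal frequent edge-sets, then modules of each edge-induced subgraph."""
    result = []
    for edge_set in maximal_frequent_edge_sets(graphs, min_support):
        nodes = set().union(*edge_set)  # vertices of the edge-induced subgraph
        modules = hcs(nodes, list(edge_set))
        if modules:
            result.append(modules)
    return result


if __name__ == "__main__":
    # Hypothetical toy "data sets": a 4-clique and a triangle joined by one bridge edge.
    k4 = {edge(u, v) for u, v in combinations("abcd", 2)}
    triangle = {edge(u, v) for u, v in combinations("xyz", 2)}
    bridged = k4 | triangle | {edge("d", "x")}
    graphs = [bridged, bridged, k4]
    print(frequent_module_sets(graphs, min_support=2))
```

On this toy input, the whole bridged graph is the single maximal frequent edge-set at support 2. Min-cut splitting then removes the bridge and yields the modules {a, b, c, d} and {x, y, z}. Raising `min_support` to 3 leaves only the 4-clique edges, which form one module.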
