Mining representative approximate frequent coexpression subnetworks

Advances in high-throughput microarray and RNA-sequencing technologies have led to a rapid accumulation of gene expression data for various biological conditions across multiple species. Mining frequent gene modules from a set of multiple gene coexpression networks has applications in functional gene annotation and biomarker discovery. Biclustering algorithms have been proposed to allow for missing coexpression links. Existing approaches report a large number of edge sets, which are computationally intensive to analyze, and the reported subnetworks overlap heavily. In this work, we propose an algorithm to mine frequent dense modules from multiple coexpression networks using an online data summarization method. Our algorithm mines a succinct set of representative subgraphs that have little overlap, which reduces the effort of downstream analysis of the reported modules. Experiments on human gene expression data show that the reported modules are biologically significant, as evidenced by the high enrichment of GO molecular functions and KEGG pathways in the reported modules.
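To make the ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes each coexpression network is given as a list of undirected gene pairs, treats an edge as "frequent" when it appears in at least `min_support` networks, and uses a simple one-pass greedy filter with a Jaccard threshold as a stand-in for the online summarization step. All function names and parameters (`frequent_edges`, `density`, `summarize`, `min_support`, `max_overlap`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_edges(networks, min_support):
    """Return the edges that occur in at least `min_support` networks."""
    counts = Counter()
    for edges in networks:
        # Normalize each undirected edge so (a, b) and (b, a) are the same key,
        # and use a set so an edge counts at most once per network.
        counts.update({tuple(sorted(e)) for e in edges})
    return {e for e, c in counts.items() if c >= min_support}


def density(nodes, edges):
    """Fraction of all possible node pairs in `nodes` that are present in `edges`."""
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(1 for pair in combinations(sorted(nodes), 2) if pair in edges)
    return present / (n * (n - 1) / 2)


def jaccard(a, b):
    """Jaccard similarity of two node sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(candidates, max_overlap):
    """One-pass greedy filter (an assumed stand-in for online summarization):
    keep a module only if it overlaps little with every module kept so far."""
    representatives = []
    for module in candidates:
        module = frozenset(module)
        if all(jaccard(module, r) <= max_overlap for r in representatives):
            representatives.append(module)
    return representatives


# Toy data: three small networks over genes A-D (made up for illustration).
networks = [
    [("A", "B"), ("B", "C"), ("A", "C")],
    [("B", "A"), ("B", "C"), ("C", "D")],
    [("A", "B"), ("A", "C"), ("C", "D")],
]
freq = frequent_edges(networks, min_support=2)
reps = summarize([{"A", "B", "C"}, {"A", "B", "C", "D"}, {"D", "E"}], max_overlap=0.5)
```

In this toy run, the module {A, B, C} is fully connected by frequent edges, and the heavily overlapping candidate {A, B, C, D} is filtered out during summarization. A real implementation would also need candidate generation and a tolerance for missing links, which this sketch leaves out.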
