Hybrid coexpression link similarity graph clustering for mining biological modules from multiple gene expression datasets

Background
Advances in genomic technologies have enabled the accumulation of vast amounts of genomic data, including gene expression data for multiple species under various biological and environmental conditions. Integrating these gene expression datasets is a promising strategy for addressing the challenges of protein functional annotation and biological module discovery from a single gene expression dataset, which suffers from spurious coexpression.

Results
We propose a joint mining algorithm that constructs a weighted hybrid similarity graph whose nodes are coexpression links. The weight of an edge between two coexpression links in this hybrid graph is a linear combination of their topological similarity and their co-appearance similarity. Clustering the weighted hybrid similarity graph yields clusters of recurrent coexpression links (modules). Experimental results on human gene expression datasets show that the reported modules are functionally homogeneous, as evidenced by their enrichment in biological process GO terms and KEGG pathways.
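The abstract does not define the two similarity measures or the clustering algorithm, so the following is only a minimal sketch under stated assumptions. Each dataset is taken to be a set of coexpression links (gene pairs that already passed some coexpression threshold). Two links are compared only when they share one gene. Their topological similarity is assumed to be the Jaccard overlap of the inclusive neighbourhoods of their two non-shared genes in the union network, in the style of link-community methods. Their co-appearance similarity is assumed to be the Jaccard overlap of the sets of datasets containing each link. The edge weight is assumed to be the convex combination w = alpha * S_topo + (1 - alpha) * S_co. Thresholding at tau and taking connected components is a simple stand-in for whatever clustering method the authors actually use. All function names, as well as `alpha` and `tau`, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def link_occurrences(datasets):
    """Map each coexpression link (unordered gene pair) to the indices of the datasets containing it."""
    occ = defaultdict(set)
    for idx, edges in enumerate(datasets):
        for u, v in edges:
            if u != v:  # ignore self-loops
                occ[frozenset((u, v))].add(idx)
    return occ


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_weights(occ, alpha=0.5):
    """Hypothetical hybrid edge weights between links that share exactly one gene:
    w = alpha * topological similarity + (1 - alpha) * co-appearance similarity."""
    adj = defaultdict(set)        # gene adjacency in the union of all datasets
    incident = defaultdict(list)  # gene -> links touching that gene
    for link in occ:
        u, v = tuple(link)
        adj[u].add(v)
        adj[v].add(u)
        incident[u].append(link)
        incident[v].append(link)

    weights = {}
    for k, links in incident.items():
        # Two distinct links share at most one gene, so each pair is visited once.
        for l1, l2 in combinations(links, 2):
            (i,) = l1 - {k}
            (j,) = l2 - {k}
            topo = jaccard(adj[i] | {i}, adj[j] | {j})  # assumed topological similarity
            coapp = jaccard(occ[l1], occ[l2])           # assumed co-appearance similarity
            w = alpha * topo + (1 - alpha) * coapp
            if w > 0:
                weights[frozenset((l1, l2))] = w
    return weights


def cluster_links(occ, weights, tau=0.5):
    """Stand-in clustering: connected components of the link graph after dropping edges with w < tau."""
    parent = {link: link for link in occ}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair, w in weights.items():
        if w >= tau:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for link in occ:
        groups[find(link)].add(link)
    return [g for g in groups.values() if len(g) > 1]  # keep multi-link modules only


datasets = [
    [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")],
    [("A", "B"), ("B", "C"), ("A", "C")],
    [("D", "E"), ("E", "F")],
]
occ = link_occurrences(datasets)
weights = hybrid_weights(occ, alpha=0.5)
modules = cluster_links(occ, weights, tau=0.5)
print(modules)
```

In this toy input, the A-B-C triangle recurs in two datasets and is tightly connected, so its three links form one module. The D-E and E-F links share a gene but have weaker topological and co-appearance overlap (weight 5/12), so they stay apart at tau = 0.5. Raising `alpha` favours network structure over recurrence across datasets, and lowering `tau` merges more links into larger modules.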