Discovering weighted motifs in gene co-expression networks

An important dimension of complex networks is embedded in the weights of their edges. Incorporating this source of information into the analysis of a network can greatly enhance our understanding of it. This is the case for gene co-expression networks, which encapsulate information about the strength of correlation between gene expression profiles. Classical unweighted gene co-expression networks define connectivity by thresholding, which discards some of the information carried by the different connection strengths. In this paper, we propose a mining method capable of extracting information from weighted gene co-expression networks. We study groups of differently connected nodes and their importance as network motifs. We define a subgraph as a motif if the weights of the edges inside the subgraph follow a distribution that differs significantly from what would be expected at random. We use the Kolmogorov-Smirnov test to calculate the significance score of the subgraph, avoiding the time-consuming generation of random networks to determine statistical significance. We apply our approach to gene co-expression networks related to three different types of cancer and also to two healthy datasets. The structure of the networks is compared using weighted motif profiles, and our results show that we are able to clearly distinguish the networks and separate them by type. We also compare the biological relevance of our weighted approach to that of a more classical binary motif profile, where edges are unweighted. For this comparison we use shared Gene Ontology annotations on biological processes, cellular components and molecular functions. The results of gene enrichment analysis show that weighted motifs are biologically more significant than binary motifs.
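
To make the scoring idea concrete, here is a minimal sketch of how the edge weights of a candidate subgraph could be compared against a reference weight distribution with a two-sample Kolmogorov-Smirnov test. It is an illustration, not the paper's implementation. Several things are assumptions: the reference sample is taken to be all edge weights in the network, the p-value uses the standard asymptotic Kolmogorov series (rough for samples this small), and the names `ks_statistic`, `ks_pvalue`, `subgraph_weights`, `motif_score` and the toy network are hypothetical.

```python
import math
from itertools import combinations


def ks_statistic(sample, reference):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (this handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov series (an approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is essentially 1.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))


def subgraph_weights(weights, nodes):
    """Weights of the edges whose two endpoints both lie in `nodes`."""
    pairs = (frozenset(p) for p in combinations(nodes, 2))
    return [weights[p] for p in pairs if p in weights]


def motif_score(weights, nodes):
    """KS statistic and p-value of a subgraph against all network weights.

    Assumption: the reference distribution is the full set of edge weights.
    """
    inside = subgraph_weights(weights, nodes)
    everything = list(weights.values())
    d = ks_statistic(inside, everything)
    return d, ks_pvalue(d, len(inside), len(everything))


# Hypothetical toy co-expression network: one strongly correlated triangle
# (a, b, c) embedded among weakly correlated gene pairs.
toy_edges = {
    ("a", "b"): 0.90, ("a", "c"): 0.95, ("b", "c"): 0.92,
    ("a", "d"): 0.10, ("a", "e"): 0.20, ("b", "d"): 0.15,
    ("b", "e"): 0.30, ("c", "d"): 0.25, ("c", "e"): 0.05,
    ("d", "e"): 0.12,
}
toy_weights = {frozenset(edge): w for edge, w in toy_edges.items()}

d_stat, p_value = motif_score(toy_weights, ["a", "b", "c"])
print(f"KS statistic = {d_stat:.3f}, asymptotic p-value = {p_value:.3f}")
```

Because the score comes from a single comparison of two weight samples, no ensemble of randomized networks has to be generated, which is the cost the abstract says the approach avoids. In practice a tested library routine such as `scipy.stats.ks_2samp` would be a more robust choice than the hand-written functions above.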
