Modular Biological Function Is Most Effectively Captured by Combining Molecular Interaction Data Types

Large-scale molecular interaction data sets have the potential to provide a comprehensive, system-wide understanding of biological function. Although individual molecules can be promiscuous in their contribution to function, molecular functions emerge from specific interactions between molecules, giving rise to a modular organisation. Because functions often derive from a range of mechanisms, we demonstrate that they are best studied using networks derived from different sources. Applying a graph partitioning algorithm, we identify subnetworks in yeast protein-protein interaction (PPI), genetic interaction and gene co-regulation networks. Among these subnetworks, we identify cohesive subgraphs that we expect to represent functional modules in each data type. We demonstrate significant overlap between the subgraphs generated from the different data types and show that these overlaps can represent related functions, as defined by the Gene Ontology (GO). We then investigate the correspondence between our subgraphs and GO. This reveals varying degrees of coverage of the biological process, molecular function and cellular component ontologies, depending on the data type. For example, subgraphs from the PPI network show enrichment for 84%, 58% and 93% of the annotated terms in these three ontologies, respectively. Integrating the interaction data into a combined network increases GO coverage. Furthermore, none of the three GO ontologies is predominantly associated with any single interaction data type. Collectively, our results demonstrate that the successful capture of functional relationships by network data depends on both the specific biological function being characterised and the type of network data being used. We identify functions that require integrated information to be accurately represented, demonstrating the limitations of individual data types.
Combining interaction subnetworks across data types is therefore essential for fully understanding the complex and emergent nature of biological function.
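
The abstract reports GO coverage figures (such as 84%, 58% and 93% for PPI subgraphs) and significant overlaps between subgraphs, but does not state the statistics behind them. The following is a minimal sketch of one plausible way to compute both, assuming a one-sided hypergeometric test and Benjamini-Hochberg false discovery rate correction. These choices, the 0.05 threshold, and every function name (`hypergeom_sf`, `overlap_pvalue`, `benjamini_hochberg`, `go_coverage`) are illustrative assumptions, not the authors' published method. Here "coverage" is taken to mean the fraction of GO terms annotating at least one network gene that are enriched in at least one subgraph.

```python
# Hypothetical sketch: hypergeometric enrichment + Benjamini-Hochberg FDR are
# ASSUMED statistics; the source abstract does not specify its exact tests.
from math import comb
from typing import Dict, List, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this single division


def overlap_pvalue(a: Set[str], b: Set[str], universe: Set[str]) -> float:
    """Significance of the gene overlap between two subgraphs (e.g. PPI vs genetic)."""
    a, b = a & universe, b & universe
    return hypergeom_sf(len(a & b), len(universe), len(a), len(b))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps q-values at 1 and keeps them monotone in rank
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_coverage(
    subgraphs: List[Set[str]],
    annotations: Dict[str, Set[str]],
    universe: Set[str],
    alpha: float = 0.05,
) -> float:
    """Fraction of annotated GO terms enriched (q < alpha) in at least one subgraph."""
    N = len(universe)
    # Only terms annotating at least one gene in the network count as "annotated".
    terms = {t: g & universe for t, g in annotations.items() if g & universe}
    keys: List[str] = []
    pvals: List[float] = []
    for sub in subgraphs:
        sub = sub & universe
        for term, annotated in terms.items():
            keys.append(term)
            pvals.append(hypergeom_sf(len(sub & annotated), N, len(annotated), len(sub)))
    qvals = benjamini_hochberg(pvals)  # correct across all subgraph/term tests
    enriched = {t for t, q in zip(keys, qvals) if q < alpha}
    return len(enriched) / len(terms) if terms else 0.0


if __name__ == "__main__":
    # Toy data (invented for illustration): 10 genes, one subgraph, three terms.
    universe = {f"g{i}" for i in range(10)}
    subgraphs = [{"g0", "g1", "g2"}]
    go = {"T1": {"g0", "g1", "g2"}, "T2": {"g7", "g8", "g9"}, "T3": {"absent"}}
    print(go_coverage(subgraphs, go, universe))  # 0.5
```

Running `go_coverage` separately on the biological process, molecular function and cellular component annotation sets would yield one percentage per ontology, the same form as the figures quoted above. Passing the subgraphs of a combined network instead of a single data type would then show whether integration raises coverage.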
