An integrative approach to modeling biological networks

Networks are used to model real-world phenomena in many domains, including systems biology. Because proteins carry out biological processes by interacting with other proteins, cellular functions are expected to be reflected in the structure of protein-protein interaction (PPI) networks. Similarly, the topology of residue interaction graphs (RIGs), which model the three-dimensional structure of proteins, might provide insights into protein folding, stability, and function. An important step towards understanding these networks is finding an adequate network model, since models can be exploited algorithmically and used to predict missing data. Evaluating the fit of a model network to the data is a formidable challenge: exact network comparison is computationally infeasible, so comparisons must rely on heuristics, or "network properties." We show that it is difficult to assess the reliability of a model's fit using any single network property. We therefore present an integrative approach that feeds a variety of network properties into five machine learning classifiers to predict the best-fitting network model for PPI networks and RIGs. We confirm that geometric random graphs (GEO) are the best-fitting model for RIGs. Because GEO networks model spatial relationships between objects, they are expected to replicate the underlying structure of spatially packed residues in a protein; the good fit of GEO to RIGs thus validates our approach. We also apply our approach to PPI networks and confirm that the structure of merged data sets of high coverage and confidence, containing both binary and co-complex data, is also consistent with the structure of GEO, whereas the structure of less complete, lower-confidence data is not.

Since PPI data are noisy, we test the robustness of the five classifiers to noise and show that their robustness levels differ. We demonstrate that none of the classifiers predicts noisy scale-free (SF) networks as GEO, whereas noisy GEO networks can be classified as SF. Thus, our approach is unlikely to predict a real-world network as GEO if it had a noisy SF structure, but it could classify the data as SF if they had a noisy GEO structure. Therefore, the structure of the PPI networks is most consistent with that of a noisy GEO.
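The idea of feeding several network properties into a classifier can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not name the five classifiers or the properties used, so this example assumes a nearest-centroid classifier as a stand-in, two illustrative properties (average clustering coefficient and the coefficient of variation of the degrees), and Barabási–Albert preferential attachment as the SF model. All function names, graph sizes, and parameters are hypothetical.

```python
import math
import random


def geometric_graph(n, radius, rng):
    """GEO: n uniform points in the unit square; link pairs closer than radius."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) < radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def scale_free_graph(n, m, rng):
    """SF (assumed here: preferential attachment); new nodes link to m old ones."""
    adj = {i: set() for i in range(n)}
    for i in range(m + 1):  # seed: clique on m + 1 nodes
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `ends` once per incident edge (degree-proportional).
    ends = [v for v in range(m + 1) for _ in range(m)]
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in sorted(targets):
            adj[v].add(t)
            adj[t].add(v)
            ends.extend((v, t))
    return adj


def properties(adj):
    """Return (average clustering coefficient, degree coefficient of variation)."""
    n = len(adj)
    degs = [len(nb) for nb in adj.values()]
    mean = sum(degs) / n
    var = sum((d - mean) ** 2 for d in degs) / n
    cv = math.sqrt(var) / mean if mean else 0.0
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        total += 2 * links / (k * (k - 1))
    return (total / n, cv)


def rewire(adj, fraction, rng):
    """Noisy copy: delete a fraction of the edges, then add as many random ones."""
    noisy = {v: set(nb) for v, nb in adj.items()}
    edges = sorted((a, b) for a in adj for b in adj[a] if a < b)
    k = int(fraction * len(edges))
    for a, b in rng.sample(edges, k):
        noisy[a].discard(b)
        noisy[b].discard(a)
    nodes = list(adj)
    added = 0
    while added < k:
        a, b = rng.sample(nodes, 2)
        if b not in noisy[a]:
            noisy[a].add(b)
            noisy[b].add(a)
            added += 1
    return noisy


def train_centroids(models, n_graphs, rng):
    """Average the property vector over n_graphs samples of each model."""
    centroids = {}
    for name, make in models.items():
        feats = [properties(make(rng)) for _ in range(n_graphs)]
        centroids[name] = tuple(sum(col) / n_graphs for col in zip(*feats))
    return centroids


def classify(adj, centroids):
    """Predict the model whose centroid is nearest in property space."""
    feat = properties(adj)
    return min(centroids, key=lambda name: math.dist(feat, centroids[name]))


rng = random.Random(0)
models = {  # hypothetical sizes and parameters, chosen only for illustration
    "GEO": lambda r: geometric_graph(150, 0.12, r),
    "SF": lambda r: scale_free_graph(150, 3, r),
}
centroids = train_centroids(models, 10, rng)

test_graph = geometric_graph(150, 0.12, rng)
print("clean:", classify(test_graph, centroids))
print("30% rewired:", classify(rewire(test_graph, 0.3, rng), centroids))
```

Raising the `fraction` passed to `rewire` gives a rough feel for how a classifier's prediction can shift as noise increases. A single nearest-centroid rule on two properties is far cruder than the approach described above, so its behaviour under noise should not be read as reproducing the reported results.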
