Exploring Function Prediction in Protein Interaction Networks via Clustering Methods

Complex networks have recently become a focus of research in many fields. Their structure reveals crucial information about the nodes: how they connect and how they share information. In this work we treat protein interaction networks as complex networks, analyze their functional modular structure, and then use that structure for the functional annotation of proteins within the network. We propose several graph representations of the protein interaction network, which differ in their level of complexity and in how much annotation information they include in the graph. We aim to explore the benefits and drawbacks of these proposed graphs when they are used for function prediction via clustering methods. For this cluster-based prediction we adopt well-established approaches to cluster detection in complex networks, using recent representative algorithms that have been shown to be efficient for the task. The experiments are performed on a purified and reliable Saccharomyces cerevisiae protein interaction network, from which the different graph representations are generated. Each graph representation is then analysed in combination with each clustering algorithm, with the algorithms modified where necessary and implemented to fit the specific graph. We evaluate the results with regard to biological validity and function prediction performance. Our results indicate that the novel ways of representing the complex graph improve the prediction process, although computational complexity should be taken into account when deciding on a particular approach.
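The abstract does not specify how functions are transferred from a cluster to its members, so the following is only a minimal sketch of one common scheme, assumed here for illustration: each unannotated protein receives the functions carried by at least a given fraction of the annotated proteins in its cluster. The function name `predict_functions`, the `min_support` threshold, and the protein and function labels are all hypothetical and do not come from the paper.

```python
from collections import Counter


def predict_functions(clusters, annotations, min_support=0.5):
    """Predict functions for unannotated proteins from their cluster-mates.

    clusters:     iterable of sets of protein identifiers (output of any
                  clustering algorithm; the algorithm itself is not shown).
    annotations:  dict mapping protein -> set of known function labels.
    min_support:  assumed threshold; a function is transferred if at least
                  this fraction of annotated cluster members carries it.
    """
    predictions = {}
    for cluster in clusters:
        # Proteins in this cluster that already have at least one annotation.
        annotated = [p for p in cluster if annotations.get(p)]
        if not annotated:
            continue  # nothing to transfer from this cluster

        # Count how many annotated members carry each function.
        counts = Counter(f for p in annotated for f in annotations[p])
        frequent = {
            f for f, c in counts.items() if c / len(annotated) >= min_support
        }

        # Assign the frequent functions to every unannotated member.
        for p in cluster:
            if not annotations.get(p):
                predictions[p] = set(frequent)
    return predictions


# Toy data with placeholder labels (not real proteins or GO terms).
clusters = [{"P1", "P2", "P3", "P4"}, {"P5", "P6"}]
annotations = {
    "P1": {"F1"},
    "P2": {"F1", "F2"},
    "P3": {"F1"},
    "P5": {"F3"},
}
predictions = predict_functions(clusters, annotations)
print(predictions)
```

In this toy input, P4 is predicted to have F1 because all three annotated members of its cluster carry it, while F2 is dropped because only one of three does. A threshold-based vote is just one option; the choice of transfer rule and threshold would need to be checked against the evaluation described in the paper.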
