Detecting overlapping protein complexes in protein-protein interaction networks

We introduce clustering with overlapping neighborhood expansion (ClusterONE), a method for detecting potentially overlapping protein complexes from protein-protein interaction data. On several yeast data sets, ClusterONE-derived complexes corresponded better to reference complexes in the Munich Information Center for Protein Sequences (MIPS) catalog, and to complexes derived from the Saccharomyces Genome Database (SGD), than the results of seven popular methods. The predicted complexes also showed a high degree of functional homogeneity.
