An automated method for finding molecular complexes in large protein interaction networks

Background
Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have made it possible to create detailed maps of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information so that it can effectively aid knowledge discovery.

Results
This paper describes a novel graph-theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on weighting vertices by local neighborhood density and on outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm's advantage over other graph clustering methods is a directed mode, which allows clusters of interest to be fine-tuned without considering the rest of the network and allows cluster interconnectivity to be examined; both are relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation.

Conclusion
Dense regions of protein interaction networks can be found based solely on connectivity data, and many of them correspond to known protein complexes. The algorithm is not affected by the known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.
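The two steps named in the Results paragraph, vertex weighting by local neighborhood density and outward traversal from a dense seed, can be illustrated with a short Python sketch. This is not the released MCODE program. It rests on two assumptions. First, a vertex's weight is taken to be k times the edge density of the highest k-core found in its immediate neighborhood (the vertex plus its neighbors). Second, traversal admits a neighbor only if that neighbor's weight exceeds (1 - vwp) times the seed's weight. All function names and the parameters `vwp` and `min_size` are hypothetical, and no post-processing of clusters is attempted.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets; self-interactions are ignored in this sketch."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def highest_k_core(nodes, adj):
    """Return (k, core): the highest non-empty k-core of the subgraph induced by nodes."""
    core, k = set(nodes), 0
    while True:
        candidate = set(core)
        changed = True
        while changed:  # peel vertices whose induced degree is below k + 1
            changed = False
            for v in list(candidate):
                if len(adj[v] & candidate) < k + 1:
                    candidate.discard(v)
                    changed = True
        if not candidate:
            return k, core
        k, core = k + 1, candidate


def density(nodes, adj):
    """Edges inside nodes divided by the maximum possible for a simple graph."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(adj[v] & nodes) for v in nodes) // 2
    return edges / (n * (n - 1) / 2)


def vertex_weight(v, adj):
    """Assumed weighting: k times the density of the highest k-core of v's closed neighborhood."""
    k, core = highest_k_core({v} | adj[v], adj)
    return k * density(core, adj)


def grow_complex(seed, adj, weights, seen, vwp=0.2):
    """Walk outward from seed, admitting neighbors whose weight exceeds (1 - vwp) * seed weight."""
    threshold = (1 - vwp) * weights[seed]
    cluster, stack = set(), [seed]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)  # each vertex joins at most one complex
        cluster.add(v)
        stack.extend(u for u in adj[v] if u not in seen and weights[u] > threshold)
    return cluster


def find_complexes(edges, vwp=0.2, min_size=3):
    """Seed from the highest-weighted unvisited vertices and keep clusters of min_size or more."""
    adj = build_graph(edges)
    weights = {v: vertex_weight(v, adj) for v in adj}
    seen, complexes = set(), []
    # Ties in weight are broken by the vertex label so the result is deterministic.
    for seed in sorted(adj, key=lambda v: (-weights[v], str(v))):
        if seed in seen or weights[seed] == 0:
            continue
        cluster = grow_complex(seed, adj, weights, seen, vwp)
        if len(cluster) >= min_size:
            complexes.append(cluster)
    return complexes
```

As a usage example, consider two four-protein cliques joined only through a single linker protein. Calling `find_complexes` on that edge list returns the two cliques as separate complexes. The linker's neighborhood is sparse, so its weight falls below the threshold set by either seed, and the traversal does not cross it.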
