Clustering Methods in a Protein–Protein Interaction Network

With the completion of a draft sequence of the human genome, the field of genetics stands on the threshold of significant theoretical and practical advances. Crucial to furthering these investigations is a comprehensive understanding of the expression, function, and regulation of the proteins encoded by an organism. It has been observed that proteins seldom act as single isolated species in the performance of their functions; rather, proteins involved in the same cellular processes often interact with each other. Therefore, the functions of uncharacterized proteins can be predicted through comparison with the interactions of similar known proteins. A detailed examination of the protein-protein interaction (PPI) network can thus yield significant new understanding of protein function. Clustering is the process of grouping data objects into sets (clusters) such that objects in the same cluster are more similar to one another than to objects in different clusters. In the PPI network context, clustering groups together proteins that share a larger number of interactions. The results of this process can illuminate the structure of the PPI network and suggest possible functions for previously uncharacterized members of a cluster. This chapter begins with a brief introduction to the properties of protein-protein interaction networks, including a review of the data that have been generated by both experimental and computational approaches. A variety of methods that have been employed to cluster these networks are then presented. These approaches are broadly characterized as either distance-based or
