CHAPTER 1 CLUSTERING METHODS IN PROTEIN-PROTEIN INTERACTION NETWORK

With the completion of a draft sequence of the human genome, the field of genetics stands on the threshold of significant theoretical and practical advances. Crucial to furthering these investigations is a comprehensive understanding of the expression, function, and regulation of the proteins encoded by an organism. It has been observed that proteins seldom act as single isolated species in the performance of their functions; rather, proteins involved in the same cellular processes often interact with each other. Therefore, the functions of uncharacterized proteins can be predicted through comparison with the interactions of similar known proteins. A detailed examination of the protein-protein interaction (PPI) network can thus yield significant new understanding of protein function. Clustering is the process of grouping data objects into sets (clusters) which demonstrate greater similarity among objects in the same cluster than in different clusters. Clustering in the PPI network context groups together proteins which share a larger number of interactions. The results of this process can illuminate the structure of the PPI network and suggest possible functions for members of the cluster which were previously uncharacterized. This chapter will begin with a brief introduction to the properties of protein-protein interaction networks, including a review of the data which have been generated by both experimental and computational approaches. A variety of methods which have been employed to cluster these networks will then be presented. These approaches are broadly characterized as either distance-based or
