Protein-to-protein interactions: Technologies, databases, and algorithms

Studying proteins and their structures plays an important role in understanding protein function. Recently, following important results in proteomics, considerable interest has turned to interactomics, that is, the study of protein-to-protein interactions (PPIs) or, more generally, of interactions among macromolecules, particularly within cells. Interactomics covers studying, modeling, storing, and retrieving protein-to-protein interactions, as well as algorithms for manipulating, simulating, and predicting them. PPI data are obtained from biological experiments that probe interactions. PPIs can be modeled and stored using graph theory and graph data management, so that the resulting graph databases can be queried to support further experiments. PPI graphs also serve as input to data-mining algorithms: the raw data are binary interactions that form interaction graphs, and analysis algorithms extract the biological meaning of those interactions. For instance, interactions between two or more proteins can be predicted by mining interaction networks stored in databases. In this article we survey the modeling, storage, analysis, and manipulation of PPI data. After describing the main PPI models, most of which are graph based, the article reviews PPI data representation and storage, as well as PPI databases. Algorithms and software tools for analyzing and managing PPI networks are discussed in depth. The article concludes by discussing the main challenges and research directions in PPI networks.
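
As a rough illustration of the graph view described above, the following minimal Python sketch builds an undirected interaction graph from binary interactions and ranks non-interacting protein pairs by the number of interaction partners they share. The shared-partner heuristic, the function names, and the identifiers P1 to P5 are assumptions made for this sketch; they are not a method or data set taken from the article.

```python
from collections import defaultdict
from itertools import combinations


def build_ppi_graph(interactions):
    """Build an undirected adjacency map from binary (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def rank_candidate_interactions(graph):
    """Score each non-adjacent pair by its number of shared partners.

    This is a simple shared-neighbor heuristic, used here only as a
    stand-in for the mining algorithms the article surveys.
    """
    scores = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already a known interaction
        shared = graph[u] & graph[v]
        if shared:
            scores.append((u, v, len(shared)))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-t[2], t[0], t[1]))


if __name__ == "__main__":
    # Hypothetical binary interactions (placeholder identifiers).
    toy_interactions = [
        ("P1", "P2"),
        ("P1", "P3"),
        ("P2", "P4"),
        ("P3", "P4"),
        ("P4", "P5"),
    ]
    ppi = build_ppi_graph(toy_interactions)
    for u, v, score in rank_candidate_interactions(ppi):
        print(f"{u} - {v}: {score} shared partner(s)")
```

An adjacency map of sets keeps both edge lookup and partner intersection cheap, which is why it is used here instead of an edge list; a graph database would expose the same neighborhood queries over stored interactions.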
