Pairwise Alignment of Protein Interaction Networks

With an ever-increasing amount of available data on protein-protein interaction (PPI) networks, and with research revealing that these networks evolve at a modular level, the discovery of conserved patterns in these networks has become an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through the use of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph-theoretic measure of similarity between graph structures that accurately captures the underlying biological phenomena. In this respect, the modeling of conservation and divergence of interactions, as well as the interpretation of the resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to those of match, mismatch, and duplication in network alignment, and that evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of the resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve it. Detailed experimental results from an implementation of the proposed framework show that our algorithm discovers conserved interaction patterns effectively, in terms of both accuracy and computational cost.
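To make the match, mismatch, and duplication terminology concrete, the following is a minimal sketch of how such a score could be computed for one candidate alignment. It is an illustration under stated assumptions, not the scoring function of the paper: the function names (`count_events`, `score_alignment`), the representation of protein similarity as a set of unordered pairs, and the constant weights `mu`, `nu`, and `delta` are all hypothetical. Here a match is an interaction present in both networks between similar protein pairs, a mismatch is an interaction present in only one of them, and a duplication is a pair of similar proteins within the same network.

```python
from itertools import combinations


def count_events(U, V, edges_g, edges_h, similar):
    """Count (matches, mismatches, duplications) for proteins U in G and V in H.

    `similar` is a set of frozenset pairs of proteins considered similar
    (an assumed stand-in for a sequence-similarity relation).
    """
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}

    def sim(a, b):
        return frozenset((a, b)) in similar

    matches = mismatches = 0
    for u1, u2 in combinations(U, 2):
        for v1, v2 in combinations(V, 2):
            # The two pairs correspond if their endpoints are similar
            # in either orientation.
            aligned = (sim(u1, v1) and sim(u2, v2)) or (sim(u1, v2) and sim(u2, v1))
            if not aligned:
                continue
            in_g = frozenset((u1, u2)) in eg
            in_h = frozenset((v1, v2)) in eh
            if in_g and in_h:
                matches += 1      # interaction conserved in both networks
            elif in_g != in_h:
                mismatches += 1   # interaction present in only one network

    # Similar proteins inside the same network are treated as duplicates.
    duplications = sum(1 for a, b in combinations(U, 2) if sim(a, b))
    duplications += sum(1 for a, b in combinations(V, 2) if sim(a, b))
    return matches, mismatches, duplications


def score_alignment(U, V, edges_g, edges_h, similar, mu=1.0, nu=0.5, delta=0.25):
    """Reward matches and penalize mismatches and duplications (assumed weights)."""
    m, n, d = count_events(U, V, edges_g, edges_h, similar)
    return mu * m - nu * n - delta * d


# Toy example: a1-a2 is conserved as b1-b2, a2-a3 has no counterpart b2-b3,
# and a1, a3 are similar to each other within the first network.
edges_g = [("a1", "a2"), ("a2", "a3")]
edges_h = [("b1", "b2")]
similar = {frozenset(p) for p in [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a1", "a3")]}
U, V = ["a1", "a2", "a3"], ["b1", "b2", "b3"]
print(count_events(U, V, edges_g, edges_h, similar))
print(score_alignment(U, V, edges_g, edges_h, similar))
```

As with gap penalties in sequence alignment, changing the relative sizes of the assumed weights shifts which alignments score highest, which is the kind of flexibility the abstract refers to.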
