Topological network alignment uncovers biological function and phylogeny

Sequence comparison and alignment have had an enormous impact on our understanding of evolution, biology and disease. Comparison and alignment of biological networks will probably have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm, based solely on network topology, that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein–protein interaction networks of two very different species, yeast and human, indicates that even distant species share a surprising amount of network topology, suggesting broad similarities in internal cellular wiring across all life on Earth.
