Optimal Network Alignment with Graphlet Degree Vectors

Important biological information is encoded in the topology of biological networks. Comparative analyses of these networks are proving valuable because they can transfer knowledge between species and give deeper insight into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce an optimal global alignment between two networks under any cost function. We design a cost function based solely on network topology and use it in our network alignment. Because it relies only on topology, our method can be applied to any two networks, not just biological ones. We use it to align the protein-protein interaction networks of two eukaryotic species and demonstrate that the alignment exposes large and topologically complex regions of network similarity. At the same time, the alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict the function of as yet unannotated proteins, and we validate many of these predictions in the literature. We also apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to those obtained by sequence alignment. Our method detects statistically significant, topologically similar regions in large networks, and it does so independently of protein sequence or any other information external to network topology.
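
To make the central idea concrete, the following is a minimal sketch of aligning two networks by solving an assignment problem with the Hungarian algorithm. It is not the authors' implementation. The cost used here, the absolute difference in node degree, is only an illustrative stand-in for the topology-based cost function the abstract mentions; degree is the simplest entry of a graphlet degree vector, and the actual cost is not specified in this text. The function names (`degree_cost_matrix`, `hungarian`, `align`) and the two toy graphs are hypothetical.

```python
def degree_cost_matrix(g1, g2):
    """Toy topological cost: |deg(u) - deg(v)| for u in g1, v in g2.

    ASSUMPTION: a placeholder for the paper's cost function, which is
    not given in the abstract. Graphs are dicts: node -> set of neighbours.
    """
    nodes1, nodes2 = sorted(g1), sorted(g2)
    cost = [[abs(len(g1[u]) - len(g2[v])) for v in nodes2] for u in nodes1]
    return nodes1, nodes2, cost


def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    Potentials-based Hungarian (Kuhn-Munkres) algorithm, O(n^2 * m).
    Requires len(cost) <= len(cost[0]). Returns a list where entry i is
    the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)      # row potentials (1-based)
    v = [0] * (m + 1)      # column potentials (1-based)
    p = [0] * (m + 1)      # p[j] = row currently matched to column j
    way = [0] * (m + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augment
                break
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def align(g1, g2):
    """Align the smaller graph g1 to g2; return (node mapping, total cost)."""
    nodes1, nodes2, cost = degree_cost_matrix(g1, g2)
    cols = hungarian(cost)
    mapping = {nodes1[i]: nodes2[j] for i, j in enumerate(cols)}
    total = sum(cost[i][j] for i, j in enumerate(cols))
    return mapping, total


# Hypothetical toy networks: a 3-node path and a 4-node star.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
star = {"x": {"y", "z", "w"}, "y": {"x"}, "z": {"x"}, "w": {"x"}}
mapping, total = align(path, star)
print(mapping, total)
```

Because the assignment step only sees a cost matrix, any other node-to-node cost can be substituted in `degree_cost_matrix` without changing the solver, which mirrors the abstract's claim that the method works with any cost function. Several optimal assignments may exist when costs tie, so the specific mapping returned can depend on tie-breaking.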
