Automatic Parameter Learning for Multiple Local Network Alignment

We developed Graemlin 2.0, a multiple network aligner that introduces (1) a multi-stage approach to local network alignment; (2) a scoring function that can use arbitrary features of a multiple network alignment, such as protein deletions, protein duplications, protein mutations, and interaction losses; (3) a parameter learning algorithm that uses a training set of known network alignments to learn the parameters of the scoring function, thereby adapting it to any set of networks; and (4) an algorithm that uses the scoring function to find approximate multiple network alignments in linear time. We tested Graemlin 2.0's accuracy on protein interaction networks from IntAct, DIP, and the Stanford Network Database. On each of these datasets, Graemlin 2.0 has higher sensitivity and specificity than existing network aligners. Graemlin 2.0 is available under the GNU General Public License at http://graemlin.stanford.edu .
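The abstract does not spell out how the feature-based scoring function and the parameter learning fit together. The following is a minimal sketch under stated assumptions. It assumes the score of an alignment is linear in a feature vector (a weighted sum of feature values). It also assumes that learning minimizes a regularized structured hinge loss by subgradient descent, which is one standard way to fit such weights from known alignments and not necessarily Graemlin 2.0's exact procedure. The feature names, toy candidate alignments, loss values, and hyperparameters (`lr`, `reg`) are all hypothetical. A real aligner would search over alignments during loss-augmented inference rather than scan a hand-written list.

```python
from typing import List, Sequence

# Hypothetical alignment features, echoing those named in the abstract
# (deletions, duplications, mutations, interaction losses) plus a reward term.
FEATURES = ["deletions", "duplications", "mutations", "interaction_losses", "conserved_edges"]


def score(weights: Sequence[float], feats: Sequence[float]) -> float:
    """Linear scoring function: weighted sum of an alignment's feature values."""
    return sum(w * f for w, f in zip(weights, feats))


def subgradient_step(
    weights: List[float],
    true_feats: Sequence[float],
    candidates: Sequence[Sequence[float]],
    losses: Sequence[float],
    lr: float = 0.1,
    reg: float = 0.01,
) -> List[float]:
    """One subgradient step on the regularized structured hinge loss
        max_c [score(c) + loss(c)] - score(true) + (reg / 2) * ||w||^2.
    `candidates` holds feature vectors of alternative alignments (the known
    alignment included, with loss 0); `losses` measures how far each one is
    from the known alignment. All inputs here are hypothetical toy values."""
    # Loss-augmented inference: find the candidate that most violates the margin.
    worst = max(range(len(candidates)),
                key=lambda i: score(weights, candidates[i]) + losses[i])
    # Subgradient: regularizer term plus (violator's features - true features).
    grad = [reg * w + cf - tf
            for w, cf, tf in zip(weights, candidates[worst], true_feats)]
    return [w - lr * g for w, g in zip(weights, grad)]


# Toy training example (hypothetical numbers): the known alignment has no
# deletions and many conserved interactions; two decoy alignments are worse.
true_feats = [0.0, 0.0, 1.0, 0.0, 3.0]
candidates = [true_feats, [2.0, 1.0, 1.0, 2.0, 1.0], [0.0, 0.0, 3.0, 1.0, 2.0]]
losses = [0.0, 3.0, 2.0]

weights = [0.0] * len(FEATURES)
for _ in range(50):
    weights = subgradient_step(weights, true_feats, candidates, losses)

for name, w in zip(FEATURES, weights):
    print(f"{name:>20s}: {w:+.3f}")
```

On this toy data, training ends with the known alignment outscoring each decoy by at least that decoy's loss. The learned weights penalize deletions and reward conserved edges. Training never fixes these signs by hand; the data alone determines them, which is the point of learning parameters rather than setting them manually.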
