Graph algorithms for biological systems analysis

The post-genomic era has witnessed an explosion in the quality, quantity and variety of biological data: sequences, structures, and networks. When building computational models on these data, however, certain abstractions recur. In particular, graph-based computational models are a powerful, flexible and efficient way of modeling many biological systems. Graph models are used in systems biology, where the goal is to understand relationships among biological entities, and in structural bioinformatics, where a graph represents the amino acid (or atom) interactions in a protein or the secondary-structure base-pairing relationships in RNA. For many of these problems, we can develop algorithms that exploit the fact that their computational complexity is governed by the treewidth of the underlying graph, which is typically very small for a variety of biological systems. When the treewidth is large, we can still use spectral methods to find biologically sound solutions efficiently.
