Opportunities for Combinatorial Optimization in Computational Biology

This survey is written for mathematical programmers who are unfamiliar with molecular biology and want to learn what kinds of combinatorial optimization problems arise in it. After a brief introduction to the biology, we present optimization models pertaining to sequencing, evolutionary explanations, structure prediction, and recognition. Additional biology is introduced in the context of each problem, including some motivation from disease diagnosis and drug discovery. Open problems are cited, supported by an extensive bibliography, and we offer a guide to getting started in this exciting frontier.
