A branch-and-cut algorithm for multiple sequence alignment

We consider a branch-and-cut approach for solving the multiple sequence alignment problem, which is a central problem in computational biology. We propose a general model for this problem in which arbitrary gap costs are allowed. An interesting aspect of our approach is that the three (exponentially large) classes of natural valid inequalities that we consider turn out to be both facet-defining for the convex hull of integer solutions and separable in polynomial time. Neither the proofs that these classes are facet-defining nor the descriptions of the separation algorithms are trivial. Experimental results on several benchmark instances show that our method outperforms the best tools developed so far, in that it produces alignments that are better from a biological point of view. A noteworthy outcome of the experiments is that branch-and-cut restricted to a carefully selected subset of the variables is effective as a heuristic.
