Speeding Up Pairwise Sequence Alignments: A Scoring Scheme Reweighting Based Approach

We propose a general technique based on scoring scheme reweighting that can be used to speed up dynamic programming algorithms for a variety of pairwise sequence alignment problems. For the standard sequence alignment problem with an arbitrary gap penalty function, we show that a reweighted scoring scheme can be obtained by an efficient preprocessing step that computes a set of upper bounds on the score of the optimal alignment between pairs of suffixes of the sequences. A series of experiments on synthetic and biological sequences indicates that our algorithm offers a significant and robust speedup over the standard cubic-time dynamic programming algorithm. For the sequences used in our experiments, of length up to 2000, the speedup factor ranges from 4 to more than 50. With a strong upper bound, sub-cubic running time is also observed in all tested situations.
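For orientation, the cubic-time baseline mentioned above can be sketched as follows. This is a minimal illustration of the textbook dynamic program for global alignment with an arbitrary gap penalty function, not the reweighting algorithm of this paper. The names `align_score`, `sub`, and `gap` are hypothetical, and the recurrence is written over prefixes, whereas the abstract describes bounds over suffixes; the two formulations are symmetric.

```python
from typing import Callable


def align_score(
    a: str,
    b: str,
    sub: Callable[[str, str], float],
    gap: Callable[[int], float],
) -> float:
    """Optimal global alignment score of a and b.

    sub(x, y) is the score for aligning characters x and y;
    gap(k) is the (positive) penalty for a gap of length k and may be
    any function of k. Both are assumptions of this sketch.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    # D[i][j] = best score for aligning a[:i] with b[:j]
    D = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            # Case 1: align a[i-1] with b[j-1].
            if i > 0 and j > 0:
                best = D[i - 1][j - 1] + sub(a[i - 1], b[j - 1])
            # Case 2: the last k characters of a face a gap.
            for k in range(1, i + 1):
                best = max(best, D[i - k][j] - gap(k))
            # Case 3: the last k characters of b face a gap.
            for k in range(1, j + 1):
                best = max(best, D[i][j - k] - gap(k))
            D[i][j] = best
    return D[n][m]


def unit_sub(x: str, y: str) -> float:
    """Toy substitution score: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


def toy_gap(k: int) -> float:
    """Toy gap penalty; any function of the gap length would do."""
    return 2 + k
```

Each of the (n+1)(m+1) cells scans every possible gap length in both directions, which is where the cubic running time comes from for sequences of comparable length. The toy scoring functions are placeholders only and are not the schemes used in the experiments.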