A simple and space-efficient fragment-chaining algorithm for alignment of DNA and protein sequences

Abstract In the segment-based approach to sequence alignment, nucleic acid, and protein sequence alignments are constructed from fragments, i.e., from pairs of ungapped segments of the input sequences. Given a set F of candidate fragments and a weighting function w : F → R+0, the score of an alignment is defined as the sum of weights of the fragments it consists of, and the optimization problem is to find a consistent collection of pairwise disjoint fragments with maximum sum of weights. Herein, a sparse dynamic programming algorithm is described that solves the pairwise segment-alignment problem in O(L + Nmax) space where L is the maximum length of the input sequences while Nmax ≤ #F holds. With a recently introduced weighting function w, small sets F of candidate fragments are sufficient to obtain alignments of high quality. As a result, the proposed algorithm runs in essentially linear space.

[1]  Knut Reinert,et al.  A polyhedral approach to sequence alignment problems , 2000, Discret. Appl. Math..

[2]  A. Dress,et al.  Multiple DNA and protein sequence alignment based on segment-to-segment comparison. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[3]  Hans-Peter Lenhof,et al.  An exact solution for the Segment-to-Segment multiple sequence alignment problem , 1998, German Conference on Bioinformatics.

[4]  Burkhard Morgenstern,et al.  DIALIGN2: Improvement of the segment to segment approach to multiple sequence alignment , 1999, German Conference on Bioinformatics.

[5]  David Eppstein,et al.  Sparse dynamic programming I: linear cost functions , 1992, JACM.

[6]  D. Lipman,et al.  Rapid similarity searches of nucleic acid and protein data banks. , 1983, Proceedings of the National Academy of Sciences of the United States of America.

[7]  Eugene W. Myers,et al.  Optimal alignments in linear space , 1988, Comput. Appl. Biosci..

[8]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[9]  Burkhard Morgenstern,et al.  Speeding Up the DIALIGN Multiple Alignment Program by Using the 'Greedy Alignment of BIOlogical Sequences LIBrary' (GABIOS-LIB) , 2000, JOBIM.

[10]  Burkhard Morgenstern,et al.  A space-efficient algorithm for aligning large genomic sequences , 2000, Bioinform..

[11]  J. Stoye,et al.  Consistent Equivalence Relations: A Set-Theoretical Framework for Multiple Sequence Alignment , 1999 .

[12]  Olivier Poch,et al.  BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs , 1999, Bioinform..

[13]  Daniel S. Hirschberg,et al.  A linear space algorithm for computing maximal common subsequences , 1975, Commun. ACM.