Paper Information - Handling Updates of a Pairwise Sequence Alignment

Handling Updates of a Pairwise Sequence Alignment

Sequence alignment or decoding in molecular biology is mostly done via computationally expensive dynamic programming (DP) based approaches. Unfortunately, because sequencing errors are discovered frequently, researchers must repeat all previous similarity analyses for the erroneous sequence, which can take hours or days. In this work, we derive relative tolerance bounds on node distances from a root node that guarantee that partial shortest path distances remain optimal. We then propose an algorithm that uses these bounds to skip all unperturbed parts of a sequence when recomputing an alignment. We also discuss techniques to reduce the memory requirements of the algorithm by focusing on the highly conserved segments of the sequence. Experimental results establish that our proposed alignment procedure can update the alignment decisions for a modified sequence with 4.6% to 18% of the number of computations required by the standard Needleman-Wunsch algorithm, depending on sequence length. Higher computational savings are achieved with longer sequences.
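To make the cost argument concrete, the following is a minimal sketch of the Needleman-Wunsch baseline together with a much simpler update rule than the one the abstract describes. It is not the authors' tolerance-bound algorithm: it only reuses the table rows above a single substituted position, on the assumption that those rows depend solely on the unchanged prefix. The function names (`nw_table`, `fill_rows`, `update_after_substitution`) and the scoring values (match 1, mismatch -1, gap -1) are hypothetical choices made for illustration.

```python
from typing import List


def fill_rows(table: List[List[int]], a: str, b: str, first_row: int,
              match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Recompute rows first_row..len(a) in place; return the number of cells computed."""
    cells = 0
    for i in range(first_row, len(a) + 1):
        for j in range(1, len(b) + 1):
            score = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + score,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,        # gap in b
                table[i][j - 1] + gap,        # gap in a
            )
            cells += 1
    return cells


def nw_table(a: str, b: str, match: int = 1, mismatch: int = -1,
             gap: int = -1) -> List[List[int]]:
    """Fill the full Needleman-Wunsch score table (global alignment, linear gaps)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    fill_rows(table, a, b, 1, match, mismatch, gap)
    return table


def update_after_substitution(table: List[List[int]], a_new: str, b: str, pos: int,
                              match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Update the table after a substitution at 0-based index pos of the first sequence.

    Assumption: a_new has the same length as the old sequence and differs only
    at pos. Rows 0..pos depend only on a_new[:pos], so they are reused as-is.
    """
    return fill_rows(table, a_new, b, pos + 1, match, mismatch, gap)


if __name__ == "__main__":
    b = "GCATGCG"
    table = nw_table("GATTACA", b)
    recomputed = update_after_substitution(table, "GATTGCA", b, pos=4)
    print(recomputed, "of", len("GATTGCA") * len(b), "cells recomputed")
```

This prefix reuse only helps when the change lies far from the start of the sequence, and it still recomputes every row below the edit. The abstract's approach is aimed at skipping all unperturbed parts, which this sketch does not attempt.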

Ahmed H. Tewfik | Changjin Hong

[1] Saul B. Needleman, et al. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins, 1970, Journal of Molecular Biology.

[2] David Eppstein, et al. Finding the k Smallest Spanning Trees, 1990, BIT.

[3] Dan Gusfield, et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[4] Daniel S. Hirschberg, et al. Algorithms for the Longest Common Subsequence Problem, 1977, JACM.

[5] David Hung-Chang Du, et al. Handling updates of a biological sequence based on Hidden Markov Models, 2005, 2005 13th European Signal Processing Conference.

[7] Nils J. Nilsson, et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths, 1968, IEEE Trans. Syst. Sci. Cybern.

[8] Douglas R. Shier, et al. Arc tolerances in shortest path and network flow problems, 1980, Networks.

[9] A. Clark, et al. Sequencing errors and molecular evolutionary analysis, 1992, Molecular Biology and Evolution.

[10] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.