Externalizing the Multiple Sequence Alignment Problem with Affine Gap Costs

Multiple sequence alignment (MSA) is a problem in computational biology whose goal is to discover similarities between DNA or protein sequences. For larger instances, the search exhausts main memory. This paper applies disk-based heuristic search to solve MSA benchmarks. We extend iterative-deepening dynamic programming, a hybrid of dynamic programming and IDA*, which computes optimal alignments with respect to similarity metrics and affine gap costs. We achieve considerable savings in main memory with an acceptable time overhead. By scaling buffer sizes, the space-time trade-off can be adapted to the available resources.
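To make the term "affine gap cost" concrete, the following is a minimal sketch for the pairwise case only. It is not the disk-based iterative-deepening dynamic programming described above. It assumes a gap of k consecutive gap characters costs gap_open + k * gap_extend, and it uses a standard three-table dynamic program over the two sequences. The function names and the cost values (mismatch=3, gap_open=4, gap_extend=1) are hypothetical and chosen only for illustration.

```python
from math import inf


def affine_gap_cost(length, gap_open=4, gap_extend=1):
    """Cost of one gap made of `length` consecutive gap characters (assumed model)."""
    return 0 if length == 0 else gap_open + gap_extend * length


def pairwise_affine_cost(a, b, mismatch=3, gap_open=4, gap_extend=1):
    """Minimum alignment cost of strings a and b under the affine gap model above."""
    n, m = len(a), len(b)
    # M[i][j]: best cost when the last column aligns a[i-1] with b[j-1]
    # X[i][j]: best cost when the last column aligns a[i-1] with a gap
    # Y[i][j]: best cost when the last column aligns a gap with b[j-1]
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_extend * i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_extend * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an open gap costs gap_extend; opening a new one adds gap_open.
            X[i][j] = gap_extend + min(X[i - 1][j],
                                       gap_open + min(M[i - 1][j], Y[i - 1][j]))
            Y[i][j] = gap_extend + min(Y[i][j - 1],
                                       gap_open + min(M[i][j - 1], X[i][j - 1]))
    return min(M[n][m], X[n][m], Y[n][m])
```

Under these assumed costs, one gap of length two is cheaper than two separate gaps of length one, which is the behavior an affine model is meant to capture. The sketch keeps all three tables in memory, so it does not address the main-memory limits that motivate the disk-based approach.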
