Fast Approximation to the NP-hard Problem of Multiple Sequence Alignment

The study and comparison of several sequences of characters from a finite alphabet is relevant to various areas of science, in particular molecular biology. It has been shown that multiple sequence alignment with the sum-of-pairs score is NP-hard. Recently, a fast heuristic method based on a Divide-and-Conquer technique was proposed: recursively, all sequences were cut at suitable positions, and eventually the sets of subsequences were aligned optimally. In general, the time complexity of searching for good cutting points is $O(l^n)$, where $n$ is the number of sequences involved and $l$ their maximal length. A simple $O(nl)$-time technique reduced the base $l$, leading to a reasonably fast alignment algorithm for up to $n = 7$ and $l \approx 500$. We refine this base-reducing technique by spending computational time quadratic in $n$ (and still linear in $l$). This raises the number of sequences the alignment procedure can manage to $n = 9$ (of the same length $l$). Moreover, we present two natural extensions of this technique. One is an iterative application of an $O(n^2 l)$-time technique and therefore still of that complexity. The other needs time $O(n^2 l^{s+1})$, where $s$ is the number of sequences simultaneously considered during a minimization procedure.
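The abstract only names the ingredients of the divide-and-conquer approach, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes hypothetical unit costs (`pair_cost`: 0 for identical symbols, 1 otherwise), scores a candidate cut by the unweighted sum of pairwise "additional costs" (how much worse each pairwise alignment gets when forced through the cut), and always cuts the longest sequence at its midpoint. All function names are illustrative. The exhaustive loop in `best_cut` is precisely the exponential cut-point search whose base the techniques above aim to reduce.

```python
from itertools import product

GAP = "-"


def pair_cost(x: str, y: str) -> int:
    # Hypothetical unit costs: 0 for identical symbols (including gap/gap),
    # 1 for a mismatch or a symbol opposite a gap.
    return 0 if x == y else 1


def sum_of_pairs(rows: list[str]) -> int:
    # Sum-of-pairs score: add pair_cost over every pair of rows in every column.
    n = len(rows)
    return sum(
        pair_cost(col[p], col[q])
        for col in zip(*rows)
        for p in range(n)
        for q in range(p + 1, n)
    )


def prefix_costs(a: str, b: str) -> list[list[int]]:
    # F[i][j] = optimal pairwise cost of aligning a[:i] with b[:j].
    F = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            options = []
            if i > 0 and j > 0:
                options.append(F[i - 1][j - 1] + pair_cost(a[i - 1], b[j - 1]))
            if i > 0:
                options.append(F[i - 1][j] + pair_cost(a[i - 1], GAP))
            if j > 0:
                options.append(F[i][j - 1] + pair_cost(GAP, b[j - 1]))
            F[i][j] = min(options) if options else 0
    return F


def additional_cost(a: str, b: str) -> list[list[int]]:
    # C[i][j] = (best cost when a is cut after i symbols and b after j symbols)
    # minus the unconstrained optimum. C >= 0, and C[i][j] == 0 exactly when
    # some optimal pairwise alignment passes through the cut (i, j).
    m, k = len(a), len(b)
    F = prefix_costs(a, b)              # prefixes a[:i] vs b[:j]
    R = prefix_costs(a[::-1], b[::-1])  # R[m-i][k-j] scores suffixes a[i:] vs b[j:]
    opt = F[m][k]
    return [[F[i][j] + R[m - i][k - j] - opt for j in range(k + 1)]
            for i in range(m + 1)]


def best_cut(seqs: list[str]) -> tuple[int, ...]:
    # Fix the cut of the longest sequence at its midpoint, then try every
    # combination of cut positions in the other sequences (exhaustive, hence
    # exponential in n) and keep the smallest summed additional cost.
    n = len(seqs)
    C = {(p, q): additional_cost(seqs[p], seqs[q])
         for p in range(n) for q in range(p + 1, n)}
    pivot = max(range(n), key=lambda i: len(seqs[i]))
    ranges = [range(len(s) + 1) for s in seqs]
    mid = len(seqs[pivot]) // 2
    ranges[pivot] = range(mid, mid + 1)
    best_score, best_cuts = None, None
    for cuts in product(*ranges):
        score = sum(C[p, q][cuts[p]][cuts[q]] for (p, q) in C)
        if best_score is None or score < best_score:
            best_score, best_cuts = score, cuts
    return best_cuts


def divide(seqs: list[str], max_len: int = 2) -> list[list[str]]:
    # Recursively cut all sequences until every piece has at most max_len
    # symbols. Each returned block would then be aligned optimally and the
    # block alignments concatenated (that exact alignment step is not shown).
    if max_len < 1:
        raise ValueError("max_len must be at least 1 for the recursion to terminate")
    if max(len(s) for s in seqs) <= max_len:
        return [seqs]
    cuts = best_cut(seqs)
    left = [s[:c] for s, c in zip(seqs, cuts)]
    right = [s[c:] for s, c in zip(seqs, cuts)]
    return divide(left, max_len) + divide(right, max_len)


print(divide(["ACGTAC", "ACTTAC", "AGTAC"]))
```

With $n$ sequences of length about $l$, `best_cut` enumerates roughly $(l+1)^{n-1}$ cut combinations and scores each in time quadratic in $n$, which illustrates why shrinking the search space per sequence is what makes larger $n$ feasible.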