Iterative pass optimization of sequence data

The problem of determining the minimum-cost hypothetical ancestral sequences for a given cladogram is known to be NP-complete. This "tree alignment" problem has motivated considerable effort in multiple sequence alignment procedures. In 1996, Wheeler proposed a heuristic method, direct optimization, that calculates cladogram costs without the intervening step of multiple sequence alignment. Although this method is faster and yields shorter cladograms than many alignment-based procedures, it optimizes nodes greedily, using information from descendant nodes only. In their proposal of an exact multiple alignment solution, Sankoff et al. (1976) described a heuristic procedure, the iterative improvement method, that creates alignments at internal nodes by solving a series of median problems. Combining a three-sequence direct optimization with iterative improvement and a branch-length-based cladogram cost procedure yields an algorithm that frequently produces superior (i.e., lower) cladogram costs. This iterative pass optimization is both computation- and memory-intensive, but economies can be made to reduce this burden. An example from arthropod systematics is discussed.
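The iterative pass described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Sequences are plain strings rather than the ambiguity-set sequences that direct optimization works with, and every substitution, insertion, and deletion costs 1. The tree is a small hypothetical unrooted binary tree, and the cladogram cost is assumed to be the sum of per-branch edit distances. `median3` solves the three-sequence median problem by three-dimensional alignment, and `iterative_improvement` revisits each internal node, replacing its sequence with the median of its three neighbors whenever that lowers the local (and hence total) cost. The internal nodes start deliberately empty, a poor assignment chosen only to stand in for the descendant-only first pass. All function and variable names are hypothetical.

```python
from itertools import product

GAP = "-"  # gap symbol, used only inside alignment columns


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (Needleman-Wunsch-style DP, linear gap cost)."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j] + 1,                           # delete a[i-1]
                         cur[j - 1] + 1,                        # insert b[j-1]
                         prev[j - 1] + (a[i - 1] != b[j - 1]))  # match / substitute
        prev = cur
    return prev[-1]


def column_median(col):
    """Cheapest (cost, symbol) median for one 3-way alignment column.

    A symbol absent from the column always costs 3, never less than a gap,
    so only the column's own symbols plus GAP need to be tried.
    """
    return min((sum(m != c for c in col), m) for m in set(col) | {GAP})


def median3(a: str, b: str, c: str):
    """Median of three strings via 3-D alignment.

    Returns (cost, median) where cost = d(m, a) + d(m, b) + d(m, c) is minimal.
    Time and memory grow with len(a) * len(b) * len(c).
    """
    INF = float("inf")
    la, lb, lc = len(a), len(b), len(c)
    dp = [[[INF] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    back = {}
    dp[0][0][0] = 0
    moves = [mv for mv in product((0, 1), repeat=3) if any(mv)]
    for i in range(la + 1):
        for j in range(lb + 1):
            for k in range(lc + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if min(pi, pj, pk) < 0:
                        continue
                    col = (a[pi] if di else GAP, b[pj] if dj else GAP, c[pk] if dk else GAP)
                    cost, m = column_median(col)
                    if dp[pi][pj][pk] + cost < dp[i][j][k]:
                        dp[i][j][k] = dp[pi][pj][pk] + cost
                        back[(i, j, k)] = (pi, pj, pk, m)
    # Trace back through the alignment, keeping the non-gap median symbols.
    out, cell = [], (la, lb, lc)
    while cell != (0, 0, 0):
        pi, pj, pk, m = back[cell]
        if m != GAP:
            out.append(m)
        cell = (pi, pj, pk)
    return dp[la][lb][lc], "".join(reversed(out))


def tree_cost(edges, seqs) -> int:
    """Assumed cladogram cost: sum of edit distances along every branch."""
    return sum(edit_distance(seqs[u], seqs[v]) for u, v in edges)


def iterative_improvement(edges, seqs, internal):
    """Re-optimize each internal node as the median of its three neighbors
    until a full pass over the internal nodes yields no improvement."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    seqs = dict(seqs)  # do not mutate the caller's assignment
    improved = True
    while improved:
        improved = False
        for node in internal:
            x, y, z = (seqs[n] for n in nbrs[node])  # exactly three neighbors assumed
            new_local, med = median3(x, y, z)
            old_local = sum(edit_distance(seqs[node], seqs[n]) for n in nbrs[node])
            if new_local < old_local:
                seqs[node] = med
                improved = True
    return seqs


if __name__ == "__main__":
    # Hypothetical four-taxon unrooted tree; internal nodes u and v start empty.
    edges = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"), ("D", "v")]
    start = {"A": "ACGTAC", "B": "ACGTTC", "C": "AGGTC", "D": "AGTC", "u": "", "v": ""}
    final = iterative_improvement(edges, start, ["u", "v"])
    print(tree_cost(edges, start), "->", tree_cost(edges, final))
```

Each accepted update strictly lowers an integer, non-negative tree cost, so the loop terminates. The median step fills an (n+1)^3 table for sequences of length n, which illustrates why a pass built on three-sequence optimization is computation- and memory-intensive; the economies mentioned in the abstract are not modeled in this sketch.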

[1] Gonzalo Giribet, et al. Arthropod phylogeny based on eight molecular loci and morphology. 2001, Nature.

[2] W. Wheeler. Optimization Alignment: The End of Multiple Sequence Alignment in Phylogenetics? 1996.

[3] D. Sankoff, et al. Locating the vertices of a Steiner tree in arbitrary space. 1975.

[4] David Sankoff, et al. The Median Problem for Breakpoints in Comparative Genomics. 1997, COCOON.

[5] David S. Gladstein, et al. Efficient Incremental Character Optimization. 1997, Cladistics: The International Journal of the Willi Hennig Society.

[6] D. Sankoff, et al. A test for nucleotide sequence homology. 1973, Journal of Molecular Biology.

[7] W. Wheeler. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search. 2003, Cladistics: The International Journal of the Willi Hennig Society.

[8] Esko Ukkonen, et al. Finding Approximate Patterns in Strings. 1985, J. Algorithms.

[9] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins. 1970, Journal of Molecular Biology.

[10] Ward C. Wheeler, et al. Optimization Alignment: Down, Up, Error, and Improvements. 2002.

[11] David Sankoff, et al. Frequency of insertion-deletion, transversion, and transition in the evolution of 5S ribosomal RNA. 1976, Journal of Molecular Evolution.

[12] Scott E. Hudson, et al. Incremental attribute evaluation: a flexible algorithm for lazy update. 1991, TOPL.

[13] Tao Jiang, et al. On the Complexity of Multiple Sequence Alignment. 1994, J. Comput. Biol.

[14] Gonzalo Giribet, et al. DNA multiple sequence alignments. 2002, EXS.