Simultaneous phylogeny reconstruction and multiple sequence alignment

Background: A phylogeny is the evolutionary history of a group of organisms. Sequence data remain the most widely used data type for phylogenetic reconstruction. Before sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all current multiple sequence alignment programs use a guide tree to produce the alignment, and experiments have shown that good guide trees can significantly improve alignment quality.

Results: We devise a new algorithm that simultaneously aligns multiple sequences and searches for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package that handles both DNA and protein data and accepts simple cost models as well as complex substitution matrices such as PAM250 or BLOSUM62. The performance of the new method is compared with that of other popular multiple sequence alignment tools, including the widely used programs ClustalW and T-Coffee. Experimental results suggest that the method performs well in terms of both phylogeny accuracy and alignment quality.

Conclusion: We present an algorithm that aligns multiple sequences and reconstructs the phylogenies that minimize the alignment score; it is based on an efficient algorithm for solving the median problem for three sequences. Our extensive experiments suggest that this method is very promising and can produce high-quality phylogenies and alignments.
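To make the three-sequence median problem mentioned in the abstract concrete, the following is a minimal sketch of how a median score could be computed with a plain cubic-time dynamic program. It is not the authors' C implementation. The function names (`pair_cost`, `column_cost`, `median_score`), the DNA alphabet, and the cost values (mismatch 1, gap 2 per character, no affine gaps) are all assumptions chosen for illustration. Each alignment column is charged the cheapest total cost of assigning a single median character (or a gap) against the three aligned characters.

```python
from itertools import product

GAP = "-"


def pair_cost(x, y, mismatch=1, gap=2):
    """Cost of aligning character x against y (hypothetical simple cost model)."""
    if x == y:
        return 0
    if x == GAP or y == GAP:
        return gap
    return mismatch


def column_cost(col, alphabet, mismatch=1, gap=2):
    """Cheapest cost of one column: try every median character, including a gap."""
    return min(
        sum(pair_cost(m, x, mismatch, gap) for x in col)
        for m in alphabet + GAP
    )


def median_score(a, b, c, alphabet="ACGT", mismatch=1, gap=2):
    """Minimum total cost of aligning a, b and c to a common median sequence.

    D[i][j][k] holds the best score for the prefixes a[:i], b[:j], c[:k].
    Time and memory are O(len(a) * len(b) * len(c)).
    """
    na, nb, nc = len(a), len(b), len(c)
    inf = float("inf")
    D = [[[inf] * (nc + 1) for _ in range(nb + 1)] for _ in range(na + 1)]
    D[0][0][0] = 0
    # The seven ways a column can consume a character from at least one sequence.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(na + 1):
        for j in range(nb + 1):
            for k in range(nc + 1):
                if i == j == k == 0:
                    continue
                best = inf
                for di, dj, dk in moves:
                    if di > i or dj > j or dk > k:
                        continue
                    col = (
                        a[i - 1] if di else GAP,
                        b[j - 1] if dj else GAP,
                        c[k - 1] if dk else GAP,
                    )
                    cand = D[i - di][j - dj][k - dk] + column_cost(
                        col, alphabet, mismatch, gap
                    )
                    best = min(best, cand)
                D[i][j][k] = best
    return D[na][nb][nc]


if __name__ == "__main__":
    print(median_score("ACGT", "ACGT", "ACCT"))
```

In a tree search of the kind the abstract describes, a score like this would be evaluated at internal nodes with three neighbors. A practical implementation would also need a traceback to recover the median sequence and a more memory-efficient scheme than the full three-dimensional table used here.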
