RevTrans: multiple alignment of coding DNA from aligned amino acid sequences

The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit from the information that is implicit in empirical substitution matrices such as BLOSUM-62. Taken together with the generally higher rate of synonymous mutations over non-synonymous ones, this means that the phylogenetic signal disappears much more rapidly from DNA sequences than from the encoded proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans. RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA alignment by 'reverse translation' of the aligned protein sequences. In the resulting DNA alignment, gaps occur in groups of three corresponding to entire codons, and analogous codon positions are therefore always lined up. These features are useful when constructing multiple DNA alignments for phylogenetic analysis. RevTrans also accepts user-provided protein alignments for greater control of the alignment process. The RevTrans web server is freely available at http://www.cbs.dtu.dk/services/RevTrans/.

[1]  Jotun Hein,et al.  Genomic alignment , 2004, Journal of Molecular Evolution.

[2]  Burkhard Morgenstern,et al.  DIALIGN2: Improvement of the segment to segment approach to multiple sequence alignment , 1999, German Conference on Bioinformatics.

[3]  J Hein,et al.  An algorithm combining DNA and protein alignment. , 1994, Journal of theoretical biology.

[4]  Ziheng Yang,et al.  Statistical methods for detecting molecular adaptation , 2000, Trends in Ecology & Evolution.

[5]  Z. Yang,et al.  Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. , 2000, Molecular biology and evolution.

[6]  R F Doolittle,et al.  Relationships of human protein sequences to those of other organisms. , 1986, Cold Spring Harbor symposia on quantitative biology.

[7]  J Hein,et al.  Combined DNA and protein alignment. , 1996, Methods in enzymology.

[8]  S. Altschul,et al.  Improved Sensitivity of Nucleic Acid Database Searches Using Application-Specific Scoring Matrices , 1991 .

[9]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[10]  W. Pearson Effective protein sequence comparison. , 1996, Methods in enzymology.

[11]  S. Henikoff,et al.  Amino acid substitution matrices. , 2000, Advances in protein chemistry.

[12]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[13]  Christian N. S. Pedersen,et al.  Comparison of Coding DNA , 1998, CPM.

[14]  J. Thompson,et al.  The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. , 1997, Nucleic acids research.