TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations

We present TranslatorX, a web server that aligns protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. An alignment represents a statement about the homology between individual nucleotides or amino acids within homologous genes. Because protein-coding DNA sequences evolve as triplets of nucleotides (codons), and because sequence similarity degrades more rapidly at the DNA level than at the amino acid level, alignments are generally more accurate when based on amino acids than on the corresponding nucleotides. The novel features of TranslatorX include: (i) support for all documented genetic codes, with the option of assigning a different genetic code to each sequence; (ii) a choice of several multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion for cleaning nucleotide alignments with GBlocks based on protein information; and (v) rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias, and separate alignments for first, second and third codon positions. The TranslatorX server is freely available at http://translatorx.co.uk.
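The general idea of translation-guided alignment can be illustrated with a minimal Python sketch. It is not the TranslatorX implementation: the function names are hypothetical, only the standard genetic code is used, and the protein alignment (which the server delegates to an external alignment program) is supplied by hand. The handling of ambiguous codons is an assumption about what "when possible" may mean: a codon containing IUPAC ambiguity symbols is translated only if every nucleotide combination it can stand for encodes the same amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), amino acids listed in
# TCAG codon order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity symbols and the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate_codon(codon):
    """Translate one codon; return 'X' when the amino acid is undetermined."""
    try:
        options = [IUPAC[base] for base in codon]
    except KeyError:
        return "X"  # unknown symbol
    # Expand the ambiguous codon into every concrete codon it may stand for.
    amino_acids = {CODON_TABLE["".join(c)] for c in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"

def translate(nt_seq):
    """Translate a nucleotide sequence in frame 1, ignoring a trailing partial codon."""
    nt_seq = nt_seq.upper().replace("U", "T")
    usable = len(nt_seq) - len(nt_seq) % 3
    return "".join(translate_codon(nt_seq[i:i + 3]) for i in range(0, usable, 3))

def back_translate(aligned_protein, nt_seq):
    """Thread the original codons onto an aligned protein sequence.

    Each residue consumes one codon of nt_seq; each gap becomes '---'.
    """
    codons = iter([nt_seq[i:i + 3] for i in range(0, len(nt_seq), 3)])
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)

# Two toy coding sequences and a hand-made alignment of their translations.
nucleotides = {"seq1": "ATGGCTAAA", "seq2": "ATGAAA"}
proteins = {name: translate(seq) for name, seq in nucleotides.items()}
aligned_proteins = {"seq1": "MAK", "seq2": "M-K"}  # assumed aligner output
codon_alignment = {name: back_translate(aligned_proteins[name], nucleotides[name])
                   for name in nucleotides}
```

Because gaps are inserted three nucleotides at a time, the resulting nucleotide alignment keeps every codon intact and in frame, which is what makes it possible to extract alignments for individual codon positions afterwards.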
