Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures

T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to multiply align protein, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols cover T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is open-source freeware available from http://www.tcoffee.org/.
