ITS2, 18S, 16S or any other RNA - simply aligning sequences and their individual secondary structures simultaneously by an automatic approach.

Secondary structures of RNA sequences are increasingly used as additional information in reconstructing phylogenies and in distinguishing species by compensatory base change (CBC) analyses. In most cases, however, only a single secondary structure is used, either to manually correct an automatically generated multiple sequence alignment or to guide a sequence alignment that is still generated entirely by hand. With the advent of databases and tools offering individual RNA secondary structures, we re-introduce a twelve-letter code already implemented in 4SALE, a tool for synchronous sequence and secondary structure alignment and editing. This code makes it possible to align RNA sequences and their individual secondary structures synchronously and fully automatically, while dramatically increasing the phylogenetic information content. We further introduce a scaled-down non-GUI version of 4SALE designed specifically for big data analysis, available at: http://4sale.bioapps.biozentrum.uni-wuerzburg.de.
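
The idea behind a twelve-letter code can be illustrated with a minimal sketch, assuming the twelve letters arise from combining each of the four nucleotides (A, C, G, U) with one of three structural states in dot-bracket notation (unpaired `.`, opening `(`, closing `)`). The symbols `A` to `L` and the function names `encode` and `decode` below are hypothetical choices made for this illustration and are not necessarily the letters or interfaces used by 4SALE.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STATES = ".()"            # unpaired, opening pair, closing pair
SYMBOLS = "ABCDEFGHIJKL"  # hypothetical letters, not 4SALE's actual mapping

# Map each (nucleotide, structural state) pair to one letter: 4 x 3 = 12 letters.
ENCODE = dict(zip(product(NUCLEOTIDES, STATES), SYMBOLS))
DECODE = {symbol: pair for pair, symbol in ENCODE.items()}


def encode(sequence, structure):
    """Fuse a sequence and its dot-bracket structure into one 12-letter string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input as well
    return "".join(ENCODE[(nuc, state)] for nuc, state in zip(rna, structure))


def decode(encoded):
    """Split a 12-letter string back into its sequence and structure."""
    pairs = [DECODE[symbol] for symbol in encoded]
    sequence = "".join(nuc for nuc, _ in pairs)
    structure = "".join(state for _, state in pairs)
    return sequence, structure


if __name__ == "__main__":
    coded = encode("GGAUCC", "((..))")
    print(coded)          # one letter per position, carrying base and pairing state
    print(decode(coded))  # recovers the original sequence and structure
```

Because every position is represented by a single letter, the encoded strings could in principle be handled by a general-purpose alignment procedure supplied with a matching 12 x 12 scoring matrix, which is what allows sequence and structure to be aligned in one pass.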
