MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

Abstract This article describes several features of the MAFFT online service for multiple sequence alignment (MSA). Recent advances in sequencing technologies have made huge numbers of biological sequences available, and the need for MSAs of large numbers of sequences is increasing. To extract biologically relevant information from such data, sophisticated algorithms are necessary but not sufficient. Intuitive and interactive tools that let experimental biologists handle large data semiautomatically are becoming important. We are developing MAFFT in both of these directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.
