AutoMLST: an automated web server for generating multi-locus species trees highlighting natural product potential

Abstract Understanding the evolutionary background of a bacterial isolate has applications for a wide range of research. However, generating an accurate species phylogeny remains challenging. Reliance on 16S rDNA for species identification remains popular. Unfortunately, this widespread method suffers from low resolution at the species level due to high sequence conservation. There is now a wealth of genomic data that can be used to yield more accurate species designations via modern phylogenetic methods and multiple genetic loci. However, these methods often require extensive expertise and time. The Automated Multi-Locus Species Tree (autoMLST) server was thus developed to provide a rapid ‘one-click’ pipeline that simplifies this workflow, available at https://automlst.ziemertlab.com. The server uses Multi-Locus Sequence Analysis (MLSA) to produce high-resolution species trees; it does not perform multi-locus sequence typing (MLST), a related classification method. The resulting phylogenetic tree also includes helpful annotations, such as species clade designations and secondary metabolite counts, to aid natural product prospecting. Distinct from currently available web interfaces, autoMLST can automate the selection of reference genomes and out-group organisms based on one or more query genomes. This enables a wide range of researchers to perform rigorous phylogenetic analyses more rapidly than with manual MLSA workflows.
