Osiris: accessible and reproducible phylogenetic and phylogenomic analyses within the Galaxy workflow management system

BackgroundPhylogenetic tools and ‘tree-thinking’ approaches increasingly permeate all biological research. At the same time, phylogenetic data sets are expanding at breakneck pace, facilitated by increasingly economical sequencing technologies. Therefore, there is an urgent need for accessible, modular, and sharable tools for phylogenetic analysis.ResultsWe developed a suite of wrappers for new and existing phylogenetics tools for the Galaxy workflow management system that we call Osiris. Osiris and Galaxy provide a sharable, standardized, modular user interface, and the ability to easily create complex workflows using a graphical interface. Osiris enables all aspects of phylogenetic analysis within Galaxy, including de novo assembly of high throughput sequencing reads, ortholog identification, multiple sequence alignment, concatenation, phylogenetic tree estimation, and post-tree comparative analysis. The open source files are available on in the Bitbucket public repository and many of the tools are demonstrated on a public web server (http://galaxy-dev.cnsi.ucsb.edu/osiris/).ConclusionsOsiris can serve as a foundation for other phylogenomic and phylogenetic tool development within the Galaxy platform.

[1]  Kenneth S. Kosik,et al.  Reconstructing ancestral genome content based on symmetrical best alignments and Dollo parsimony , 2008, Bioinform..

[2]  M. Nei,et al.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. , 2011, Molecular biology and evolution.

[3]  D. Pearl,et al.  High-resolution species trees without concatenation , 2007, Proceedings of the National Academy of Sciences.

[4]  Todd H. Oakley,et al.  Phylotranscriptomics to bring the understudied into the fold: monophyletic ostracoda, fossil placement, and pancrustacean phylogeny. , 2013, Molecular biology and evolution.

[5]  Christian M. Zmasek,et al.  phyloXML: XML for evolutionary biology and comparative genomics , 2009, BMC Bioinformatics.

[6]  Ramón Doallo,et al.  ProtTest 3: fast selection of best-fit models of protein evolution , 2011, Bioinform..

[7]  D. Faith Conservation evaluation and phylogenetic diversity , 1992 .

[8]  Ingo Ebersberger,et al.  HaMStR: Profile hidden markov model based search for orthologs in ESTs , 2009, BMC Evolutionary Biology.

[9]  M. Suchard,et al.  Bayesian Phylogenetics with BEAUti and the BEAST 1.7 , 2012, Molecular biology and evolution.

[10]  D. Maddison,et al.  Interactive analysis of phylogeny and character evolution using the computer program MacClade. , 1989, Folia primatologica; international journal of primatology.

[11]  Joseph K. Pickrell,et al.  A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes , 2012, Science.

[12]  Liang Liu,et al.  Estimating species trees from unrooted gene trees. , 2011, Systematic biology.

[13]  Vladimir Makarenkov,et al.  Armadillo 1.1: An Original Workflow Platform for Designing and Conducting Phylogenetic Analysis and Simulations , 2012, PloS one.

[14]  Campbell O. Webb,et al.  Phylomatic: tree assembly for applied phylogenetics , 2005 .

[15]  Alexander Isaev,et al.  PyEvolve: a toolkit for statistical modelling of molecular evolution , 2004, BMC Bioinformatics.

[16]  Robert D. Finn,et al.  HMMER web server: interactive sequence similarity searching , 2011, Nucleic Acids Res..

[17]  Shane S. Sturrock,et al.  Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data , 2012, Bioinform..

[18]  Peer Bork,et al.  Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation , 2007, Bioinform..

[19]  Moustafa Ghanem,et al.  Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support , 2012, BMC Bioinformatics.

[20]  Sean R. Eddy,et al.  Pfam: multiple sequence alignments and HMM-profiles of protein domains , 1998, Nucleic Acids Res..

[21]  K. Katoh,et al.  MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. , 2002, Nucleic acids research.

[22]  Anton Nekrutenko,et al.  Galaxy CloudMan: delivering cloud compute clusters , 2010, BMC Bioinformatics.

[23]  J. Foster,et al.  Relaxed Neighbor Joining: A Fast Distance-Based Phylogenetic Tree Construction Method , 2006, Journal of Molecular Evolution.

[24]  Matthew R. Pocock,et al.  Taverna: a tool for the composition and enactment of bioinformatics workflows , 2004, Bioinform..

[25]  Daniel J. Blankenberg,et al.  Galaxy: a platform for interactive large-scale genome analysis. , 2005, Genome research.

[26]  Gerard Talavera,et al.  Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. , 2007, Systematic biology.

[27]  Ramón Doallo,et al.  ProtTest-HPC: Fast Selection of Best-Fit Models of Protein Evolution , 2010, Euro-Par Workshops.

[28]  K. Meusemann,et al.  FASconCAT: Convenient handling of data matrices. , 2010, Molecular phylogenetics and evolution.

[29]  Denis Krompass,et al.  Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood , 2011, Systematic biology.

[30]  J. Boore,et al.  Hexapod Origins: Monophyletic or Paraphyletic? , 2003, Science.

[31]  Laura Salter Kubatko,et al.  STEM: species tree estimation using maximum likelihood for gene trees under coalescence , 2009, Bioinform..

[32]  D. Posada jModelTest: phylogenetic model averaging. , 2008, Molecular biology and evolution.

[33]  Ari Löytynoja,et al.  A model of evolution and structure for multiple sequence alignment , 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.

[34]  Anton Nekrutenko,et al.  Harnessing cloud computing with Galaxy Cloud , 2011, Nature Biotechnology.

[35]  Alexandros Stamatakis,et al.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models , 2006, Bioinform..

[36]  Katharina Misof,et al.  A Monte Carlo approach successfully identifies randomness in multiple sequence alignments: a more objective means of data exclusion. , 2009, Systematic biology.

[37]  Casey W. Dunn,et al.  Phyutility: a phyloinformatics tool for trees, alignments and molecular data , 2008, Bioinform..

[38]  D. Pearl,et al.  Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. , 2007, Systematic biology.

[39]  Chris Sander,et al.  MView: a web-compatible database search or multiple alignment viewer , 1998, Bioinform..

[40]  David Posada,et al.  ProtTest: selection of best-fit models of protein evolution , 2005, Bioinform..

[41]  Scott V Edwards,et al.  Coalescent methods for estimating phylogenetic trees. , 2009, Molecular phylogenetics and evolution.

[42]  Hilmar Lapp,et al.  NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata , 2012, Systematic biology.

[43]  Robert C. Edgar,et al.  MUSCLE: a multiple sequence alignment method with reduced time and space complexity , 2004, BMC Bioinformatics.

[44]  D. Maddison,et al.  Mesquite: a modular system for evolutionary analysis. Version 2.6 , 2009 .

[45]  Hidetoshi Shimodaira,et al.  Multiple Comparisons of Log-Likelihoods with Applications to Phylogenetic Inference , 1999, Molecular Biology and Evolution.

[46]  Edward A. Lee,et al.  Scientific workflow management and the Kepler system , 2006, Concurr. Comput. Pract. Exp..