ASaiM: a Galaxy-based framework to analyze microbiota data

Abstract

Background: New generations of sequencing platforms, coupled with numerous bioinformatics tools, have led to rapid technological progress in metagenomics and metatranscriptomics for investigating complex microorganism communities. Nevertheless, a combination of different bioinformatics tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies.

Findings: We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. Several customizable workflows are included, supported by tutorials and Galaxy interactive tours that guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It scales to thousands of datasets but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework, and documentation can be found online (http://asaim.readthedocs.io).

Conclusions: Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes the analysis and exploration of microbiota data easy, quick, transparent, reproducible, and shareable.
