GAIA: an integrated metagenomics suite

Identifying the biological diversity of a microbial population is of fundamental importance due to its implications in industrial processes, environmental studies and clinical applications. Today, there is still an outstanding need to develop new, easy-to-use bioinformatics tools to analyze both amplicon and shotgun metagenomics, including both prokaryotic and eukaryotic organisms, with the highest accuracy and the lowest running time. With the aim of overcoming this need, we introduce GAIA, an online software solution that has been designed to provide users with the maximum information whether it be 16S, 18S, ITS, or shotgun analysis. GAIA is able to obtain a comprehensive and detailed overview at any taxonomic level of microbiomes of different origins: human (e.g. stomach or skin), agricultural and environmental (e.g. land, water or organic waste). By using recently published benchmark datasets from shotgun and 16S experiments we compared GAIA against several available pipelines. Our results show that for shotgun metagenomics, GAIA obtained the highest F-measures at species level above all tested pipelines (CLARK, Kraken, LMAT, BlastMegan, DiamondMegan and NBC). For 16S metagenomics, GAIA also obtained excellent F-measures comparable to QIIME at family level. The overall objective of GAIA is to provide both the academic and industrial sectors with an integrated metagenomics suite that will allow to perform metagenomics data analysis easily, quickly and affordably with the highest accuracy.

[1]  William A. Walters,et al.  QIIME allows analysis of high-throughput community sequencing data , 2010, Nature Methods.

[2]  Gail L. Rosen,et al.  NBC: the Naïve Bayes Classification tool webserver for taxonomic classification of metagenomic reads , 2010, Bioinform..

[3]  S. Lynch,et al.  The Human Intestinal Microbiome in Health and Disease. , 2016, The New England journal of medicine.

[4]  R. Franklin,et al.  MinION TM nanopore sequencing of environmental metagenomes: a synthetic approach , 2017 .

[5]  Ryan J. Newton,et al.  The microbiome of urban waters. , 2015, International microbiology : the official journal of the Spanish Society for Microbiology.

[6]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[7]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[8]  Maya Gokhale,et al.  Scalable metagenomic taxonomy classification using a reference genome database , 2013, Bioinform..

[9]  J. McPherson,et al.  Coming of age: ten years of next-generation sequencing technologies , 2016, Nature Reviews Genetics.

[10]  S. Lonardi,et al.  CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers , 2015, BMC Genomics.

[11]  Ahmed A. Metwally,et al.  Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing. , 2016, Biochemical and biophysical research communications.

[12]  S. Oliver,et al.  Estimating the total number of phosphoproteins and phosphorylation sites in eukaryotic proteomes , 2017, GigaScience.

[13]  J. Venter,et al.  The Human Microbiome and Cancer , 2017, Cancer Prevention Research.

[14]  Jo Handelsman,et al.  Metagenomics for studying unculturable microorganisms: cutting the Gordian knot , 2005, Genome Biology.

[15]  P. Hirsch,et al.  Data analysis for 16S microbial profiling from different benchtop sequencing platforms. , 2014, Journal of microbiological methods.

[16]  Stefano Lonardi,et al.  Comprehensive benchmarking and ensemble approaches for metagenomic classifiers , 2017, Genome Biology.

[17]  Martin Hartmann,et al.  Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities , 2009, Applied and Environmental Microbiology.

[18]  Derrick E. Wood,et al.  Kraken: ultrafast metagenomic sequence classification using exact alignments , 2014, Genome Biology.

[19]  Richard Durbin,et al.  Fast and accurate long-read alignment with Burrows–Wheeler transform , 2010, Bioinform..

[20]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[21]  Hélène Touzet,et al.  Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics , 2017, PloS one.

[22]  Susan Holmes,et al.  phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data , 2013, PloS one.

[23]  D. Raoult,et al.  Repertoire of human gut microbes. , 2017, Microbial pathogenesis.