To assemble or not to resemble—A validated Comparative Metatranscriptomics Workflow (CoMW)

Abstract

Background
Metatranscriptomics has been widely used to investigate and quantify the activity of microbial communities in response to external stimuli. By assessing which genes are expressed, metatranscriptomics provides insight into the interactions between major functional guilds and their environment. Metatranscriptomics typically relies on short sequence reads, which can either be aligned directly to external reference databases (“assembly-free approach”) or first assembled into contigs and then aligned (“assembly-based approach”). Here, we present a de novo assembly-based Comparative Metatranscriptomics Workflow (CoMW) implemented in a modular, reproducible structure. We also compare CoMW (the assembly-based implementation) with an assembly-free alternative workflow, using simulated and real-world metatranscriptomes from Arctic and temperate terrestrial environments. We evaluate the accuracy of both approaches in terms of precision and recall, using generic and specialized hierarchical protein databases.

Results
CoMW produced significantly fewer false-positive results, enabling more precise identification and quantification of functional genes in metatranscriptomes. Using the comprehensive database M5nr, the assembly-based approach identified genes with only 0.6% false-positive results at thresholds ranging from inclusive to stringent, whereas the assembly-free approach yielded up to 15% false-positive results. Using specialized databases (carbohydrate-active enzymes and nitrogen cycle), the assembly-based approach identified and quantified genes with 3–5 times fewer false-positive results. We also evaluated the impact of both approaches on real-world datasets.

Conclusions
We present an open-source, de novo assembly-based CoMW. Our benchmarking findings support assembling short reads into contigs before alignment to a reference database, because this provides higher precision and minimizes false-positive results.
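The precision and recall evaluation on simulated metatranscriptomes can be sketched as follows. This is a minimal illustration, not CoMW's own code: it assumes set-based scoring of gene identifiers against the genes known to be present in the simulated data, treats “false-positive results” as the fraction of reported genes that were never simulated, and uses a percent-identity cut-off as a stand-in for the inclusive-to-stringent thresholds. The gene names, identity values, and the helper names `filter_hits` and `score_annotations` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float                # TP / (TP + FP)
    recall: float                   # TP / (TP + FN)
    false_positive_fraction: float  # FP / all reported genes (= 1 - precision)


def filter_hits(hits: dict[str, float], min_identity: float) -> set[str]:
    """Keep gene IDs whose best-hit percent identity meets the threshold."""
    return {gene for gene, identity in hits.items() if identity >= min_identity}


def score_annotations(predicted: set[str], truth: set[str]) -> Scores:
    """Set-based comparison of reported genes against the simulated truth.

    Assumption: a reported gene is a true positive if its identifier is among
    the simulated genes; every other reported gene is a false positive.
    """
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    fp_fraction = fp / len(predicted) if predicted else 0.0
    return Scores(precision, recall, fp_fraction)


# Hypothetical toy data: genes present in a simulated metatranscriptome and
# best-hit percent identities reported by some annotation run.
TRUTH = {"amoA", "nifH", "nirK", "nosZ"}
HITS = {"amoA": 98.0, "nifH": 91.5, "nirK": 72.0, "GH5": 65.0, "GH13": 55.0}

if __name__ == "__main__":
    # Sweep from an inclusive to a stringent identity threshold.
    for threshold in (50.0, 70.0, 90.0):
        s = score_annotations(filter_hits(HITS, threshold), TRUTH)
        print(f"{threshold:>5.1f}%  precision={s.precision:.2f}  "
              f"recall={s.recall:.2f}  FP={s.false_positive_fraction:.0%}")
```

In this toy run, the two unsimulated hits (`GH5`, `GH13`) make 40% of the reported genes false positives at the 50% threshold; raising the threshold removes them, and at 90% recall also drops because `nirK` is lost. Quantification (read counts per gene) would need an additional abundance comparison that this sketch does not attempt.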
