Reconstructing ribosomal genes from large scale total RNA meta-transcriptomic data

Abstract

Motivation: Technological advances in meta-transcriptomics have enabled a deeper understanding of the structure and function of microbial communities. 'Total RNA' meta-transcriptomics, the sequencing of total reverse-transcribed RNA, provides a unique opportunity to investigate both the structure and function of active microbial communities from all three domains of life simultaneously. A major step of this approach is the reconstruction of full-length taxonomic marker genes such as the small subunit ribosomal RNA. However, current tools for this purpose mainly target amplicon and metagenomic data and therefore cannot handle the massive and complex datasets that total RNA experiments typically produce.

Results: In this work, we introduce MetaRib, a new tool for reconstructing ribosomal gene sequences from total RNA meta-transcriptomic data. MetaRib builds on the popular rRNA assembly program EMIRGE and adds several improvements. We address the challenge posed by large, complex datasets by integrating sub-assembly, dereplication and mapping in an iterative approach, followed by additional post-processing steps. We applied the method to both simulated and real-world datasets. Our results show that MetaRib can handle larger datasets and recover more rRNA genes, achieving around a 60-fold speedup and a higher F1 score than EMIRGE on simulated datasets. On the real-world dataset, it shows similar trends but recovers more contigs than a previous analysis based on random sub-sampling, and it enables the comparison of individual contig abundances across samples for the first time.

Availability and implementation: The source code of MetaRib is freely available at https://github.com/yxxue/MetaRib.

Contact: yaxin.xue@uib.no or Inge.Jonassen@uib.no

Supplementary information: Supplementary data are available at Bioinformatics online.
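
The iterative strategy summarised in the abstract (sub-assembly, dereplication and mapping, repeated on the reads that remain unexplained) can be illustrated with a minimal Python sketch. This is not MetaRib's implementation: the function names `dereplicate` and `iterative_reconstruct`, the `assemble` and `maps_to` callbacks, the substring-based dereplication and the stopping rule are all assumptions made for illustration. In a real pipeline the assembler would be a tool such as EMIRGE and the mapping step a read aligner.

```python
import random


def dereplicate(contigs):
    """Toy dereplication (assumption): drop exact duplicates and any contig
    that is fully contained in a longer one."""
    kept = []
    # Visit longest contigs first so that shorter, contained ones are dropped.
    for contig in sorted(set(contigs), key=lambda c: (-len(c), c)):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


def iterative_reconstruct(reads, assemble, maps_to, sample_size,
                          max_iter=10, seed=0):
    """Hypothetical sketch of an iterative sub-assembly loop.

    assemble(subsample) -> list of contigs (stand-in for an rRNA assembler)
    maps_to(read, contig) -> bool        (stand-in for a read mapper)
    """
    rng = random.Random(seed)
    contigs = []
    unmapped = list(reads)
    for _ in range(max_iter):
        if not unmapped:
            break
        # 1. Sub-assembly: assemble only a random subsample of unexplained reads.
        subsample = rng.sample(unmapped, min(sample_size, len(unmapped)))
        new_contigs = assemble(subsample)
        # 2. Dereplication: merge new contigs with those found so far.
        contigs = dereplicate(contigs + new_contigs)
        # 3. Mapping: keep only reads that no current contig explains.
        remaining = [r for r in unmapped
                     if not any(maps_to(r, c) for c in contigs)]
        if len(remaining) == len(unmapped):
            break  # no progress in this round, so stop iterating
        unmapped = remaining
    return contigs, unmapped
```

Because each round only assembles a bounded subsample, the cost of a single assembly stays fixed regardless of dataset size, while the mapping step shrinks the pool of reads that still need explaining.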
