VITCOMIC2: visualization tool for the phylogenetic composition of microbial communities based on 16S rRNA gene amplicons and metagenomic shotgun sequencing

Background: The 16S rRNA gene-based amplicon sequencing analysis is widely used to determine the taxonomic composition of microbial communities. Once the taxonomic composition of each community is obtained, evolutionary relationships among taxa are inferred by a phylogenetic tree. Thus, the combined representation of taxonomic composition and phylogenetic relationships among taxa is a powerful method for understanding microbial community structure; however, applying phylogenetic tree-based representation with information on the abundance of thousands or more taxa in each community is a difficult task. For this purpose, we previously developed the tool VITCOMIC (VIsualization tool for Taxonomic COmpositions of MIcrobial Community), which is based on the genome-sequenced microbes’ phylogenetic information. Here, we introduce VITCOMIC2, which incorporates substantive improvements over VITCOMIC that were necessary to address several issues associated with 16S rRNA gene-based analysis of microbial communities.

Results: We developed VITCOMIC2 to provide (i) sequence identity searches against broad reference taxa including uncultured taxa; (ii) normalization of 16S rRNA gene copy number differences among taxa; (iii) rapid sequence identity searches by applying the graphics processing unit-based sequence identity search tool CLAST; (iv) accurate taxonomic composition inference and nearly full-length 16S rRNA gene sequence reconstructions for metagenomic shotgun sequencing; and (v) an interactive user interface for simultaneous representation of the taxonomic composition of microbial communities and phylogenetic relationships among taxa. We validated the accuracy of processes (ii) and (iv) by using metagenomic shotgun sequencing data from a mock microbial community.

Conclusions: The improvements incorporated into VITCOMIC2 enable users to acquire an intuitive understanding of microbial community composition based on the 16S rRNA gene sequence data obtained from both metagenomic shotgun and amplicon sequencing.
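To illustrate the idea behind item (ii), the following is a minimal sketch of copy number normalization: each taxon's read count is divided by its 16S rRNA gene copy number, and the adjusted values are rescaled to relative abundances. This is not VITCOMIC2's actual implementation; the function name, the taxon labels, the copy numbers, and the fallback value for taxa with unknown copy number are all assumptions made for illustration.

```python
from typing import Dict


def normalize_by_copy_number(
    read_counts: Dict[str, float],
    copy_numbers: Dict[str, float],
    default_copy_number: float = 1.0,  # assumed fallback for taxa with unknown copy number
) -> Dict[str, float]:
    """Return relative abundances after dividing read counts by 16S copy number."""
    adjusted = {}
    for taxon, count in read_counts.items():
        copies = copy_numbers.get(taxon, default_copy_number)
        if copies <= 0:
            raise ValueError(f"copy number for {taxon!r} must be positive")
        # A taxon with more 16S copies per genome yields proportionally more reads,
        # so dividing by the copy number approximates a per-genome count.
        adjusted[taxon] = count / copies

    total = sum(adjusted.values())
    if total == 0:
        return {taxon: 0.0 for taxon in adjusted}
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical example: taxon_A carries 7 copies of the gene, taxon_B carries 1.
reads = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}
abundances = normalize_by_copy_number(reads, copies)
print(abundances)
```

In this hypothetical case, the raw read counts suggest taxon_A is seven times more abundant than taxon_B, while the copy number adjusted values put the two taxa at equal abundance.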
