CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction

Background: Culture-independent molecular surveys that target conserved marker genes, most notably 16S rRNA, to assess microbial diversity remain semi-quantitative because the number of gene copies varies between species.

Results: Based on 2,900 sequenced reference genomes, we show that 16S rRNA gene copy number (GCN) is strongly linked to microbial phylogenetic taxonomy, potentially under-representing Archaea in amplicon microbial profiles. Using this relationship, we inferred the GCN of all bacterial and archaeal lineages in the Greengenes database within a phylogenetic framework. We created CopyRighter, new software that uses these estimates to correct 16S rRNA amplicon microbial profiles and the associated quantitative PCR (qPCR) total abundance. CopyRighter parses microbial profiles and, because GCN estimates are pre-computed for all taxa in the reference taxonomy, rapidly corrects GCN bias. Software validation with in silico and in vitro mock communities indicated that GCN correction yields more accurate estimates of microbial relative abundance and improves the agreement between metagenomic and amplicon profiles. Analyses of human-associated and anaerobic digester microbiomes illustrate that correction makes tangible changes to estimates of qPCR total abundance and of α and β diversity, and can significantly change biological interpretation. For example, human gut microbiomes from twins were reclassified into three rather than two enterotypes after GCN correction.

Conclusions: The CopyRighter bioinformatic tool permits rapid correction of GCN in microbial surveys, resulting in improved estimates of microbial abundance and of α and β diversity.
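The general idea behind GCN correction can be illustrated with a minimal Python sketch. It is not CopyRighter's actual code or interface: the function names, the dictionary-based profile, and the example taxa and copy numbers are all hypothetical, and the sketch assumes a per-taxon GCN estimate is already available from a pre-computed lookup. Each taxon's read count is divided by its GCN and the profile is renormalized; a qPCR gene-copy total is then divided by the abundance-weighted mean GCN to approximate a cell count.

```python
def correct_profile(read_counts, gcn):
    """Return GCN-corrected relative abundances.

    read_counts: dict mapping taxon -> amplicon read count
    gcn: dict mapping taxon -> estimated 16S rRNA gene copy number
    (both structures are illustrative assumptions)
    """
    corrected = {}
    for taxon, count in read_counts.items():
        copies = gcn[taxon]
        if copies <= 0:
            raise ValueError(f"GCN for {taxon} must be positive")
        # A taxon with more gene copies contributes more reads per cell,
        # so dividing by GCN converts reads to a cell-proportional quantity.
        corrected[taxon] = count / copies
    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}


def correct_total_abundance(qpcr_gene_copies, corrected_rel, gcn):
    """Convert a qPCR 16S gene-copy total into an approximate cell total."""
    # Community mean GCN, weighted by corrected (cell-based) abundances.
    mean_gcn = sum(rel * gcn[taxon] for taxon, rel in corrected_rel.items())
    if mean_gcn <= 0:
        raise ValueError("community mean GCN must be positive")
    return qpcr_gene_copies / mean_gcn


# Hypothetical two-taxon community.
reads = {"taxon_A": 600, "taxon_B": 400}
copy_numbers = {"taxon_A": 6, "taxon_B": 2}

rel = correct_profile(reads, copy_numbers)
cells = correct_total_abundance(1_000_000, rel, copy_numbers)
```

In this toy example taxon_A dominates the raw reads (60%) but, because it carries three times as many gene copies per genome, it accounts for only one third of the corrected profile.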
