Reanalyze unassigned reads in Sanger-based metagenomic data using conserved gene adjacency

Background

Investigation of metagenomes provides greater insight into uncultured microbial communities. Improvements in sequencing technology, which yield large amounts of sequence data, have led to major breakthroughs in the field. However, current taxonomic binning tools for metagenomes discard 30-40% of Sanger sequencing data because of stringent BLAST cut-offs. To provide a more comprehensive overview of metagenomic data, we re-analyzed the discarded sequences using less stringent cut-offs. We also introduced a new criterion: the evolutionary conservation of adjacency between neighboring genes. To evaluate the feasibility of this approach, we re-analyzed discarded contigs and singletons from several environments with different levels of complexity, and we compared the consistency of our taxonomic binning with the binning reported in the original studies.

Results

Among the discarded data, 23.7 ± 3.9% of singletons and 14.1 ± 1.0% of contigs were assigned to taxa, so the recovery rate was higher for singletons than for contigs. Pearson correlation coefficients indicated a high degree of similarity between the taxonomic binning produced by the proposed approach and the binning reported in the original studies (0.94 ± 0.03 at the phylum rank and 0.80 ± 0.11 at the family rank). In addition, an evaluation using simulated data demonstrated the reliability of the proposed approach.

Conclusions

Our findings suggest that taking conserved neighboring-gene adjacency into account improves taxonomic assignment when analyzing metagenomes obtained by Sanger sequencing. In other words, using conserved gene order as a criterion can reduce the amount of data discarded when analyzing metagenomes.
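The two ideas in the abstract, assigning a read by the conserved adjacency of its neighboring genes under a relaxed BLAST cut-off, and comparing taxonomic profiles with the Pearson correlation coefficient, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hit` record, the functions `assign_by_adjacency` and `pearson`, the rule "neighboring genes on the read must hit genes with consecutive positions in the same reference genome", and the e-value cut-off `max_evalue=1e-3` are all hypothetical choices made for illustration.

```python
from math import sqrt
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """One BLAST hit of a predicted gene (hypothetical record layout)."""
    genome: str      # reference genome identifier
    gene_index: int  # position of the matched gene in that genome's gene order
    taxon: str       # taxon label of the reference genome
    evalue: float


def assign_by_adjacency(read_genes: List[List[Hit]],
                        max_evalue: float = 1e-3) -> Optional[str]:
    """Assign a read to a taxon if two neighboring genes on the read hit
    adjacent genes of the same reference genome.

    ASSUMPTION: both the adjacency rule and the default cut-off are
    illustrative, not the thresholds used in the original study.
    """
    supported = set()
    # Walk over each pair of consecutive predicted genes on the read.
    for left, right in zip(read_genes, read_genes[1:]):
        # Relaxed cut-off: keep hits weaker than a strict filter would allow.
        left_ok = [h for h in left if h.evalue <= max_evalue]
        right_ok = [h for h in right if h.evalue <= max_evalue]
        for a in left_ok:
            for b in right_ok:
                # Conserved adjacency: same genome, consecutive gene positions.
                if a.genome == b.genome and abs(a.gene_index - b.gene_index) == 1:
                    supported.add(a.taxon)
    # Report only a single unambiguous taxon; conflicting support stays unassigned.
    return next(iter(supported)) if len(supported) == 1 else None


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles,
    e.g. per-phylum relative abundances from two binning methods."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    read = [
        [Hit("genomeA", 10, "Proteobacteria", 1e-5),
         Hit("genomeB", 3, "Firmicutes", 1e-4)],
        [Hit("genomeA", 11, "Proteobacteria", 5e-4)],
    ]
    print(assign_by_adjacency(read))
    print(pearson([0.5, 0.3, 0.2], [0.45, 0.35, 0.2]))
```

In the usage example, the second gene's only hit (e-value 5e-4) would be rejected by a stricter cut-off such as 1e-4, and the read would stay unassigned. Under the relaxed cut-off, the hit is kept, and because it lands next to the first gene's hit in `genomeA`, the read is assigned to Proteobacteria. Leaving reads with conflicting support unassigned is a conservative design choice for this sketch. The original method may resolve such conflicts differently, for example with a lowest-common-ancestor rule.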
