GPA: A Microbial Genetic Polymorphisms Assignments Tool in Metagenomic Analysis by Bayesian Estimation

Identifying antimicrobial-resistant (AMR) bacteria in metagenomic samples is essential for public health and food safety. Next-generation sequencing (NGS) technology has provided a powerful means of identifying genetic variation and establishing correlations between genotype and phenotype in humans and other species. However, for complex bacterial samples, a powerful bioinformatic tool for identifying genetic polymorphisms or copy number variations (CNVs) in given genes is lacking. Here we provide a Bayesian framework for genotype estimation in mixtures of multiple bacteria, named Genetic Polymorphisms Assignments (GPA). In simulations, GPA reduced the false discovery rate (FDR) and mean absolute error (MAE) in CNV and single nucleotide variant (SNV) identification. The framework was validated with whole-genome sequencing and Pool-seq data from Klebsiella pneumoniae using multiple bacterial mixture models, and it showed high accuracy in detecting the allele fractions of CNVs and SNVs in AMR genes between two populations. A quantitative analysis of the changes in AMR gene fractions between two samples was consistent with the AMR patterns observed in the individual strains. The framework, together with genome annotation and population comparison tools, has been integrated into an application that could provide a complete solution for AMR gene identification and quantification in unculturable clinical samples. The GPA package is available at https://github.com/IID-DTH/GPA-package.
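
To make the idea of Bayesian allele-fraction estimation in a pooled sample concrete, the following is a minimal illustrative sketch. It is not the GPA model or the GPA package API: it assumes a simple conjugate Beta-binomial update on read counts at a single site, and the function names (`allele_fraction_posterior`, `mean_absolute_error`), the uniform Beta(1, 1) prior, and the example counts are all hypothetical choices made for illustration.

```python
def allele_fraction_posterior(alt_reads, total_reads, prior_a=1.0, prior_b=1.0):
    """Conjugate Beta-binomial update for the allele fraction at one site.

    Assumption (not from the paper): reads supporting the variant allele are
    binomial draws with success probability equal to the allele fraction in
    the bacterial mixture, and the prior on that fraction is Beta(prior_a, prior_b).
    Returns the posterior Beta parameters and the posterior mean.
    """
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    post_a = prior_a + alt_reads
    post_b = prior_b + (total_reads - alt_reads)
    post_mean = post_a / (post_a + post_b)
    return post_a, post_b, post_mean


def mean_absolute_error(estimates, truths):
    """Mean absolute error between estimated and true allele fractions."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(estimates)


if __name__ == "__main__":
    # Hypothetical site: 30 of 100 reads carry the variant allele.
    a, b, mean = allele_fraction_posterior(30, 100)
    print(f"posterior Beta({a:.0f}, {b:.0f}), mean allele fraction {mean:.4f}")
    # Hypothetical comparison of estimated against known mixture fractions.
    print(f"MAE: {mean_absolute_error([0.30, 0.50], [0.25, 0.55]):.4f}")
```

The posterior mean shrinks the raw read fraction slightly toward the prior, which matters most at low coverage; a full framework for mixtures of multiple bacteria would need to model considerably more than this single-site update.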