Assessing the heterogeneity of in silico plasmid predictions based on whole-genome-sequenced clinical isolates

High-throughput next-generation shotgun sequencing of pathogenic bacteria is growing in clinical relevance, especially for chromosomal DNA-based taxonomic identification and for antibiotic resistance prediction. Genetic exchange is facilitated for extrachromosomal DNA, such as plasmid-borne antibiotic resistance genes. Consequently, accurate identification of plasmids from whole-genome sequencing (WGS) data remains one of the major challenges for sequencing-based precision medicine in infectious diseases. Here, we assess the heterogeneity of four state-of-the-art tools (cBar, PlasmidFinder, plasmidSPAdes and Recycler) for the in silico prediction of plasmid-derived sequences from WGS data. Heterogeneity, sensitivity and precision were evaluated by reference-independent and reference-dependent benchmarking using 846 Gram-negative clinical isolates. In the reference-independent assessment, the majority of predicted sequences were tool-specific, resulting in pronounced heterogeneity across tools. In the reference-dependent assessment, sensitivity and precision varied substantially between tools and across taxa, with cBar exhibiting the highest median sensitivity (87.45%) but a low median precision (27.05%). Furthermore, integrating the individual tools into an ensemble approach increased sensitivity (95.55%) but reduced precision (25.62%). cBar and plasmidSPAdes exhibited the strongest concordance with respect to identified antibiotic resistance factors. Moreover, false-positive plasmid predictions typically contained few antibiotic resistance factors. In conclusion, while high degrees of heterogeneity and variation in sensitivity and precision were observed across the different tools and taxa, existing tools are valuable for investigating the plasmid-borne resistome. Nevertheless, additional studies on representative clinical data sets will be necessary to translate in silico plasmid prediction approaches from research to clinical application.
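
The trade-off reported for the ensemble approach (higher sensitivity, lower precision) can be illustrated with a minimal Python sketch. It assumes that predictions and the reference are represented as sets of sequence identifiers and that the ensemble is a simple union of the per-tool predictions. The abstract does not specify the unit of evaluation or how the tools were combined, so both are assumptions, and the tool names, identifiers and helper functions below are hypothetical toy values rather than data from the study.

```python
from typing import Dict, Set


def sensitivity(predicted: Set[str], truth: Set[str]) -> float:
    """Fraction of true plasmid sequences that were predicted: TP / (TP + FN)."""
    if not truth:
        return float("nan")
    return len(predicted & truth) / len(truth)


def precision(predicted: Set[str], truth: Set[str]) -> float:
    """Fraction of predicted sequences that are true plasmid sequences: TP / (TP + FP)."""
    if not predicted:
        return float("nan")
    return len(predicted & truth) / len(predicted)


def ensemble_union(predictions: Dict[str, Set[str]]) -> Set[str]:
    """Assumed ensemble rule: a sequence is called plasmid if any tool predicts it."""
    return set().union(*predictions.values())


# Hypothetical toy data: identifiers of reference plasmid sequences
# and the sequences each (made-up) tool predicted as plasmid-derived.
truth = {"c1", "c2", "c3", "c4"}
tool_predictions = {
    "tool_a": {"c1", "c2", "c5"},
    "tool_b": {"c3", "c6", "c7"},
}

ensemble = ensemble_union(tool_predictions)

for name, predicted in {**tool_predictions, "ensemble": ensemble}.items():
    print(name, sensitivity(predicted, truth), precision(predicted, truth))
```

Under a union rule, sensitivity can never fall below that of the best individual tool, because every true positive found by any tool is retained. Precision can fall below that of the most precise tool, because the false positives of all tools accumulate; this matches the direction of the effect described above.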
