Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data

Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such models, how they might apply to and vary across clinical populations, and what the implications might be in the clinical setting. Here, we performed a meta-analysis of seven large Neisseria gonorrhoeae datasets, as well as Klebsiella pneumoniae and Acinetobacter baumannii datasets, with whole genome sequence data and antibiotic susceptibility phenotypes using set covering machine classification, random forest classification, and random forest regression models to predict resistance phenotypes from genotype. We demonstrate how model performance varies by drug, dataset, resistance metric, and species, reflecting the complexities of generating clinically relevant conclusions from machine learning-derived models. Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modeling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.

Author Summary

Machine learning-based prediction of antibiotic resistance from bacterial genome sequences represents a promising tool to rapidly determine the antibiotic susceptibility profile of clinical isolates and reduce the morbidity and mortality resulting from inappropriate and ineffective treatment. However, while there has been much focus on demonstrating the diagnostic potential of these modeling approaches, there has been little assessment of potential caveats and prerequisites associated with implementing predictive models of drug resistance in the clinical setting. Our results highlight significant biological and technical challenges facing the application of machine learning-based prediction of antibiotic resistance as a diagnostic tool. By outlining specific factors affecting model performance, our findings provide a framework for future work on modeling drug resistance and underscore the necessity of continued comprehensive sampling and reporting of treatment outcome data for building reliable and sustainable diagnostics.
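
The point that reported performance depends on dataset composition and on the resistance metric can be illustrated with a minimal sketch. It assumes binary labels (1 = resistant, 0 = susceptible) for the classification case and measured versus predicted MICs on a two-fold dilution scale for the regression case. The function names, the toy data, and the choice of metrics are illustrative assumptions and are not taken from the study itself.

```python
import math


def raw_accuracy(y_true, y_pred):
    """Fraction of isolates whose predicted label matches the measured label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = resistant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both resistant and susceptible isolates are required")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def within_one_dilution(true_mics, pred_mics):
    """Fraction of predicted MICs within one two-fold dilution of the measured MIC."""
    hits = sum(
        abs(math.log2(p) - math.log2(t)) <= 1.0
        for t, p in zip(true_mics, pred_mics)
    )
    return hits / len(true_mics)


# Hypothetical dataset with 5% resistance and a model that always predicts "susceptible".
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(raw_accuracy(y_true, y_pred))       # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5

# Hypothetical measured and predicted MICs (same units, two-fold dilution series).
true_mics = [0.25, 0.5, 1.0, 2.0]
pred_mics = [0.5, 0.5, 4.0, 1.0]
print(within_one_dilution(true_mics, pred_mics))  # 0.75
```

In this toy case the same uninformative classifier scores 0.95 on raw accuracy but only 0.5 on balanced accuracy, which is one way the apparent performance of a model can shift with the resistance prevalence of the dataset and with the metric used to summarize it.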
