Systematic analysis of supervised machine learning as an effective approach to predict β-lactam resistance phenotype in Streptococcus pneumoniae

Streptococcus pneumoniae is the most common human respiratory pathogen, and β-lactam antibiotics have been used for decades to treat the infections it causes. β-lactam resistance is steadily increasing in pneumococci and is mainly associated with alterations in penicillin-binding proteins (PBPs) that reduce the binding affinity of antibiotics to PBPs. However, the high variability of PBPs in clinical isolates and their mosaic gene structure hamper the prediction of resistance levels from PBP gene sequences. In this study, we developed a systematic strategy for applying supervised machine learning to predict S. pneumoniae antimicrobial susceptibility to β-lactam antibiotics. We combined published PBP sequences with minimum inhibitory concentration (MIC) values as labelled data and sequences from the NCBI database without MIC values as unlabelled data to develop an approach that uses only a fragment from pbp2x (750 bp) and a fragment from pbp2b (750 bp) to predict cefuroxime and amoxicillin resistance. We further validated the performance of the supervised learning model by constructing mutants containing randomly selected pbps and by testing additional clinical strains isolated from a Chinese hospital. In addition, we used our approach to establish the association between resistance phenotypes and the serotypes and sequence types of S. pneumoniae, which facilitates the understanding of the worldwide epidemiology of S. pneumoniae.
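
To make the idea of predicting a resistance phenotype from aligned gene fragments concrete, the following is a minimal sketch and not the model developed in this study. It assumes a simple nearest-neighbour classifier over Hamming distance, a technique chosen here only for illustration. The function names `hamming` and `predict_knn`, the 12-nucleotide toy sequences (stand-ins for 750 bp fragments), and the "susceptible"/"resistant" labels are all hypothetical, and the sketch ignores the unlabelled data mentioned above.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_knn(query, labelled, k=1):
    """Predict a label by majority vote among the k nearest labelled sequences."""
    # Rank labelled examples by their Hamming distance to the query.
    ranked = sorted(labelled, key=lambda item: hamming(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy data: short aligned fragments with made-up phenotype labels.
labelled = [
    ("ATGGCTAGCTAA", "susceptible"),
    ("ATGGCTAGCTAG", "susceptible"),
    ("ATGACTTGCGAA", "resistant"),
    ("ATGACTTGCGAT", "resistant"),
]

# The query differs from the third labelled fragment at a single position.
prediction = predict_knn("ATGACTTGCTAA", labelled, k=1)
print(prediction)
```

In practice the labels would come from measured MIC values, and a real model would need held-out validation, as the abstract describes for the constructed mutants and the additional clinical strains.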
