The prevalence of antibiotic resistance in pathogens is far outpacing our ability to develop new antibiotics. This necessitates the development of diagnostic tests that can determine bacterial susceptibility. For Mycobacterium tuberculosis (MTB), this is particularly urgent given that current methods for testing susceptibility take up to two months. The decreasing cost and time required for whole genome sequencing (WGS) offer the possibility of using genome-wide mutational patterns in bacterial DNA to determine drug susceptibility. However, the computational framework for taking advantage of these data has not yet been developed. This paper describes a machine-learning approach for predicting bacterial susceptibility from genomic data. The presence or absence of each of over 500 single nucleotide polymorphisms (SNPs), found in a dataset of 652 bacterial isolates from the Oxford University Hospitals NHS Trust and elsewhere in the UK, was used as a feature for several classification algorithms. Susceptibility and resistance were defined based upon phenotypic growth patterns, and the results from the proposed machine learning method were compared to predictions based upon the presence of a set of known resistance-conferring mutations. Misclassified isolates were also examined for commonalities, revealing eleven potentially new resistance-conferring mutations. The proposed approach predicted drug susceptibility well: a classification accuracy of 93% was obtained for predicting resistance to isoniazid, a key first-line antibiotic drug for MTB. The proposed method was capable of particularly high sensitivity, ranging from 95% to 100% across the four drugs examined. There is great potential to further develop this framework to find new resistance-conferring mutations.
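To make the comparison baseline and the evaluation metrics concrete, the following is a minimal sketch, not the study's implementation. It assumes each isolate is represented as the set of SNPs it carries, calls an isolate resistant if it carries any SNP from a known resistance set, and then tallies SNPs seen in the false negatives as candidates for follow-up. All SNP names, isolates, and labels are hypothetical toy values, not data from the paper.

```python
from collections import Counter
from typing import Dict, List, Set


def predict_from_known_mutations(snps: Set[str], known: Set[str]) -> bool:
    """Call an isolate resistant if it carries any known resistance SNP."""
    return bool(snps & known)


def evaluate(y_true: List[bool], y_pred: List[bool]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy, with 'resistant' as positive.

    Assumes both classes are present, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: SNP identifiers and phenotypes are made up.
KNOWN_RESISTANCE_SNPS = {"snp_001", "snp_002"}
isolates = [
    {"snp_001", "snp_007"},
    {"snp_002"},
    {"snp_007"},
    {"snp_009"},
    set(),
    {"snp_002", "snp_009"},
]
phenotype_resistant = [True, True, True, False, False, False]

predictions = [
    predict_from_known_mutations(s, KNOWN_RESISTANCE_SNPS) for s in isolates
]
metrics = evaluate(phenotype_resistant, predictions)

# Tally SNPs carried by false negatives (phenotypically resistant isolates
# that the known-mutation rule missed); these are only candidates.
candidates = Counter(
    snp
    for snps, truth, pred in zip(isolates, phenotype_resistant, predictions)
    if truth and not pred
    for snp in snps - KNOWN_RESISTANCE_SNPS
)

print(metrics)
print(candidates.most_common())
```

A trained classifier would replace `predict_from_known_mutations` with a model fitted on the binary SNP presence/absence features; the same `evaluate` function could then score both predictors against the phenotypic labels.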