Machine learning for the prediction of antibacterial susceptibility in Mycobacterium tuberculosis

The prevalence of antibiotic resistance in pathogens is far outpacing our ability to develop new antibiotics. This necessitates the development of diagnostic tests that can determine bacterial susceptibility. For Mycobacterium tuberculosis (MTB), this is particularly urgent given that current methods for testing susceptibility take up to two months. The decreasing cost and time required for whole genome sequencing (WGS) offer the possibility of using genome-wide mutational patterns in bacterial DNA to determine drug susceptibility. However, the computational framework for taking advantage of these data has not yet been developed. This paper describes a machine-learning approach for predicting bacterial susceptibility from genomic data. The presence or absence of each of over 500 single nucleotide polymorphisms (SNPs), found in a dataset of 652 bacterial isolates from the Oxford University Hospitals NHS Trust and elsewhere in the UK, was used as a feature for a number of classification algorithms. Susceptibility and resistance were defined based upon phenotypic growth patterns, and the results from the proposed machine-learning method were compared to predictions based upon the presence of a set of known resistance-conferring mutations. Misclassified isolates were also examined for commonalities, revealing eleven potentially new resistance-conferring mutations. The proposed approach achieved a classification accuracy of 93% for predicting resistance to isoniazid, a key first-line antibiotic drug for MTB, and a sensitivity of between 95% and 100% across the four drugs examined. There is great potential to further develop this framework to find new resistance-conferring mutations.
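
The comparison baseline and the evaluation measures described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes each isolate is represented as the set of SNPs it carries, that the baseline calls an isolate resistant if it carries any mutation from a known list, and that the phenotype is coded 1 for resistant and 0 for susceptible. The function names, SNP labels, and toy isolates are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Set


def predict_from_known_mutations(isolate_snps: Set[str], known: Set[str]) -> int:
    """Baseline rule: call resistant (1) if any known mutation is present, else 0."""
    return int(bool(isolate_snps & known))


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy, with resistant (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


# Hypothetical toy data, not drawn from the study.
isolates: List[Set[str]] = [
    {"snp_A", "snp_C"},  # resistant, carries a listed mutation
    {"snp_B"},           # resistant, carries no listed mutation
    {"snp_C"},           # susceptible
    set(),               # susceptible
]
phenotype = [1, 1, 0, 0]
known_mutations = {"snp_A"}

baseline_pred = [predict_from_known_mutations(s, known_mutations) for s in isolates]
metrics = confusion_metrics(phenotype, baseline_pred)

# Indices of misclassified isolates, which could then be inspected for shared SNPs.
misclassified = [i for i, (t, p) in enumerate(zip(phenotype, baseline_pred)) if t != p]
```

The same `confusion_metrics` function could score the output of any classifier trained on the presence/absence features, so that the baseline and the learned model are compared on identical terms.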