An interpretable classification method for predicting drug resistance in M. tuberculosis

Motivation: Predicting drug resistance and identifying its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, remain challenging problems. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. Machine learning techniques, on the other hand, have demonstrated high predictive accuracy, but many of them lack the interpretability needed to identify the specific mutations that lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time.

Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of antibiotics commonly used to treat tuberculosis and find that its accuracy is comparable to that of standard machine learning models, and that it points to several genes with previously identified associations with drug resistance.

Availability: https://github.com/hoomanzabeti/TB_Resistance_RuleBasedClassifier

Contact: hooman_zabeti@sfu.ca
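To make the idea behind the Results concrete, the following is a minimal sketch of a Boolean compressed sensing style rule learner on a toy problem. It is not the authors' implementation: the data, the function names (predict, fit_rule), the weight lam, and the cost (number of selected mutations plus lam times the number of misclassified isolates) are all assumptions made for illustration, and an exhaustive search over mutation subsets stands in for an integer linear programming solver, so it only works for a handful of features.

```python
from itertools import combinations

# Hypothetical toy data: rows are isolates, columns are mutations (1 = present).
A = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
# Hypothetical phenotypes: 1 = resistant, 0 = susceptible.
y = [1, 1, 1, 0, 0, 0]


def predict(A, w):
    """Boolean OR rule: an isolate is called resistant if it carries
    at least one of the selected mutations (w[j] == 1)."""
    return [int(any(a and s for a, s in zip(row, w))) for row in A]


def fit_rule(A, y, lam=1.0):
    """Exhaustively search for the mutation subset minimizing
    (number of selected mutations) + lam * (number of misclassified isolates).
    This brute-force search replaces an ILP solver and is only
    feasible for very small feature sets."""
    n_features = len(A[0])
    best_w, best_cost = None, float("inf")
    for k in range(n_features + 1):
        for chosen in combinations(range(n_features), k):
            w = [int(j in chosen) for j in range(n_features)]
            errors = sum(p != t for p, t in zip(predict(A, w), y))
            cost = k + lam * errors
            if cost < best_cost:  # ties keep the earlier, smaller subset
                best_w, best_cost = w, cost
    return best_w, best_cost


w, cost = fit_rule(A, y, lam=2.0)
print("selected mutations:", [j for j, s in enumerate(w) if s])
print("cost:", cost)
```

The selected subset is itself the interpretable output: it reads directly as "resistant if any of these mutations is present". Raising lam penalizes misclassification more heavily relative to rule size, which is one simple way to explore the trade-off between a short rule and an accurate one.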
