Predicting drug resistance in M. tuberculosis using a long-term recurrent convolutional network

Motivation: Drug resistance in Mycobacterium tuberculosis (MTB) is a growing threat to human health worldwide. One way to mitigate this risk is to help clinicians prescribe the right antibiotics to each patient, using methods that predict drug resistance in MTB from whole-genome sequencing (WGS) data. Existing machine learning methods for this task typically convert the WGS data from a given bacterial isolate into features corresponding to single-nucleotide polymorphisms (SNPs) or short sequence segments of a fixed length K (K-mers). Here, we introduce a gene burden-based method for predicting drug resistance in MTB. We define one numerical feature per gene, equal to the number of mutations in that gene in a given isolate. This representation greatly reduces the number of model parameters. We further propose a Long-term Recurrent Convolutional Network (LRCN) architecture, which combines convolutional and recurrent layers to account for both gene order and locality structure.

Results: We find that these strategies yield a substantial, statistically significant improvement over state-of-the-art methods on a large dataset of M. tuberculosis isolates. We suggest that this improvement is driven by our method's ability to account for the order of the genes in the genome and their organization into operons.

Availability: The implementations of our feature preprocessing pipeline and our LRCN model are publicly available, as is our complete dataset.

Supplementary information: Additional data are available in the Supplementary Materials document.
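As an illustration of the gene burden representation described under Motivation, the following minimal sketch counts mutations per gene for a single isolate and returns the counts in genome order. It is not the authors' preprocessing pipeline: the function name `gene_burden`, the gene names, and the coordinates are hypothetical, and genes are assumed not to overlap, which is a simplification.

```python
from bisect import bisect_right


def gene_burden(genes, mutation_positions):
    """Count mutations per gene and return the counts in genome order.

    genes: list of (name, start, end) tuples, 1-based inclusive coordinates.
           Assumed non-overlapping (a simplification for this sketch).
    mutation_positions: iterable of 1-based genome positions for one isolate.
    """
    # Sort genes by start coordinate so the feature vector preserves gene order.
    ordered = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in ordered]
    counts = [0] * len(ordered)
    for pos in mutation_positions:
        # Index of the last gene whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        # Count the mutation only if it falls inside that gene.
        if i >= 0 and pos <= ordered[i][2]:
            counts[i] += 1
    return [g[0] for g in ordered], counts


# Hypothetical annotation and mutation calls, for illustration only.
genes = [("geneB", 500, 900), ("geneA", 100, 400), ("geneC", 1000, 1200)]
mutations = [150, 399, 450, 500, 1100, 1300]
names, burden = gene_burden(genes, mutations)
print(names, burden)
```

Each isolate is reduced to one integer per gene, so the feature count equals the number of genes rather than the number of SNPs or K-mers. Keeping the vector in genome order is what would let a downstream convolutional or recurrent layer exploit adjacency between genes.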
