QuantTB – a method to classify mixed Mycobacterium tuberculosis infections within whole genome sequencing data

Background Mixed infections of Mycobacterium tuberculosis and antibiotic heteroresistance continue to complicate tuberculosis (TB) diagnosis and treatment. Detection of mixed infections has been limited to molecular genotyping techniques, which lack the sensitivity and resolution to accurately estimate the multiplicity of TB infections. In contrast, whole genome sequencing offers a sensitive view of the genetic differences between strains of M. tuberculosis within a sample. Although metagenomic tools exist to classify strains in a metagenomic sample, most were developed for more divergent species and therefore cannot provide the sensitivity required to disentangle strains within closely related bacterial species such as M. tuberculosis. Here we present QuantTB, a method to identify and quantify individual M. tuberculosis strains in whole genome sequencing data. QuantTB uses SNP markers to determine the combination of strains that best explains the allelic variation observed in a sample. QuantTB outputs a list of identified strains, their corresponding relative abundances, and a list of drugs for which resistance-conferring mutations (or heteroresistance) have been predicted within the sample.
Results We show that QuantTB has a high degree of resolution: it can differentiate communities differing by fewer than 25 SNPs and identify strains down to 1× coverage. Using simulated data, we found that QuantTB outperformed other metagenomic strain identification tools at detecting strains and quantifying strain multiplicity. In a real-world scenario, using a dataset of paired clinical isolates from a study of patients with either reinfections or relapses, we found that QuantTB could detect mixed infections and reinfections at rates concordant with a manually curated approach.
Conclusion QuantTB can determine infection multiplicity, identify heteroresistance patterns, enable differentiation between relapse and reinfection, and clarify transmission events across seemingly unrelated patients, even in low-coverage (1×) samples. QuantTB outperforms existing tools and promises to serve as a valuable resource for both clinicians and researchers working with clinical TB samples.
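
To make the marker-based idea in the Background concrete, the sketch below shows one way a combination of strains could be chosen to explain the alleles observed in a sample. It is an illustrative greedy procedure, not QuantTB's actual algorithm, which the abstract does not specify. The function name `classify_strains`, the parameters `min_fraction`, `min_depth`, and `max_strains`, the input layout, and the toy strains are all assumptions made for this example.

```python
from statistics import median

def classify_strains(sample_depths, reference_markers,
                     min_fraction=0.8, min_depth=1.0, max_strains=5):
    """Greedy sketch (hypothetical, not the published method).

    sample_depths:     {(position, allele): read depth seen in the sample}
    reference_markers: {strain name: set of (position, allele) SNP markers}
    Returns {strain name: relative abundance}.
    """
    remaining = dict(sample_depths)      # depth not yet explained by a strain
    candidates = dict(reference_markers)
    found = []                           # (strain name, estimated depth)

    while candidates and len(found) < max_strains:
        best_name, best_score, best_hits = None, 0.0, []
        for name, markers in candidates.items():
            # Markers of this strain still supported by unexplained reads.
            hits = [m for m in markers if remaining.get(m, 0.0) >= min_depth]
            score = len(hits) / len(markers) if markers else 0.0
            if score > best_score:
                best_name, best_score, best_hits = name, score, hits
        if best_name is None or best_score < min_fraction:
            break                        # no remaining strain is well supported

        # Estimate the strain's depth, then remove its share of the signal so
        # markers shared with other strains keep only their residual depth.
        depth = median(remaining[m] for m in best_hits)
        found.append((best_name, depth))
        for m in best_hits:
            remaining[m] = max(0.0, remaining[m] - depth)
        del candidates[best_name]

    total = sum(depth for _, depth in found)
    return {name: depth / total for name, depth in found}

# Toy reference database: strain_A and strain_B share the marker (100, "T").
reference = {
    "strain_A": {(100, "T"), (200, "G"), (300, "C"), (400, "A")},
    "strain_B": {(100, "T"), (500, "G"), (600, "C"), (700, "A")},
    "strain_C": {(800, "T"), (900, "G"), (1000, "C"), (1100, "A")},
}
# Toy sample: strain_A at 30x mixed with strain_B at 10x.
sample = {
    (100, "T"): 40.0, (200, "G"): 30.0, (300, "C"): 30.0, (400, "A"): 30.0,
    (500, "G"): 10.0, (600, "C"): 10.0, (700, "A"): 10.0,
}
result = classify_strains(sample, reference)
print(result)  # {'strain_A': 0.75, 'strain_B': 0.25}
```

In this toy mixture the shared marker carries the combined depth of both strains, which is why the sketch subtracts each accepted strain's estimated depth before scoring the next candidate. A real implementation would also have to handle sequencing error, uneven coverage, and strains missing from the reference database.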

[1]  S. Salzberg,et al.  Versatile and open software for comparing large genomes , 2004, Genome Biology.

[2]  Dennis A. Benson,et al.  GenBank , 2010, Nucleic Acids Res..

[3]  Joanne R. Winter,et al.  Interpreting whole genome sequencing for investigating tuberculosis transmission: a systematic review , 2016, BMC Medicine.

[4]  G. Cochrane,et al.  The International Nucleotide Sequence Database Collaboration , 2011, Nucleic Acids Res..

[5]  P. V. van Helden,et al.  Reinfection and mixed infection cause changing Mycobacterium tuberculosis drug-resistance patterns. , 2005, American journal of respiratory and critical care medicine.

[6]  Reidar Andreson,et al.  StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees , 2017, PeerJ.

[7]  Hilde van der Togt,et al.  Publisher's Note , 2003, J. Netw. Comput. Appl..

[8]  Thomas C Victor,et al.  Patients with active tuberculosis often have different strains in the same sputum specimen. , 2004, American journal of respiratory and critical care medicine.

[9]  Samuel A. Assefa,et al.  Elucidating Emergence and Transmission of Multidrug-Resistant Tuberculosis in Treatment Experienced Patients by Whole Genome Sequencing , 2013, PloS one.

[10]  B. Kana,et al.  Relapse, re-infection and mixed infections in tuberculosis disease. , 2017, Pathogens and disease.

[11]  R. Gie,et al.  Multiple Mycobacterium tuberculosis Strains in Early Cultures from Patients in a High-Incidence Community Setting , 2002, Journal of Clinical Microbiology.

[12]  Phelim Bradley,et al.  Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis , 2015, Nature Communications.

[13]  David L. Wheeler,et al.  GenBank , 2015, Nucleic Acids Res..

[14]  D. van Soolingen,et al.  DNA fingerprinting of Mycobacterium tuberculosis: from phage typing to whole-genome sequencing. , 2012, Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases.

[15]  Julian Parkhill,et al.  Whole-genome sequencing to establish relapse or re-infection with Mycobacterium tuberculosis: a retrospective observational study , 2013, The Lancet. Respiratory medicine.

[16]  T. Cohen,et al.  Latent Coinfection and the Maintenance of Strain Diversity , 2009, Bulletin of mathematical biology.

[17]  S. Gagneux,et al.  Consequences of genomic diversity in Mycobacterium tuberculosis. , 2014, Seminars in immunology.

[18]  Wen J. Li,et al.  Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation , 2015, Nucleic Acids Res..

[19]  S. Sampson,et al.  Mycobacterial PE/PPE Proteins at the Host-Pathogen Interface , 2011, Clinical & developmental immunology.

[20]  Davide Albanese,et al.  Strain profiling and epidemiology of bacterial species from metagenomic sequencing , 2017, Nature Communications.

[21]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[22]  D. Engelthaler,et al.  Mixed Mycobacterium tuberculosis–Strain Infections Are Associated With Poor Treatment Outcomes Among Patients With Newly Diagnosed Tuberculosis, Independent of Pretreatment Heteroresistance , 2018, The Journal of infectious diseases.

[23]  A. Fateh,et al.  Mixed infections in tuberculosis: The missing part in a puzzle. , 2017, Tuberculosis.

[24]  Jukka Corander,et al.  Bayesian identification of bacterial strains from sequencing data , 2015, Microbial genomics.

[25]  Leping Li,et al.  ART: a next-generation sequencing read simulator , 2012, Bioinform..

[26]  M. Möls,et al.  StrainSeeker: fast identification of bacterial strains from unassembled sequencing reads using user-provided guide trees , 2016, bioRxiv.

[27]  张锺儒,et al.  对急性哮喘加剧的成年患者两种不同教育干预的评价[英]/Co^^té J…∥Am J Respir Crit Care Med , 2002 .

[28]  T. Cohen,et al.  Mixed-Strain Mycobacterium tuberculosis Infections and the Implications for Tuberculosis Treatment and Control , 2012, Clinical Microbiology Reviews.

[29]  Phelim Bradley,et al.  Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study , 2015, The Lancet. Infectious diseases.

[30]  Heng Li Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM , 2013, 1303.3997.

[31]  T. Clark,et al.  Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data , 2018, BMC Genomics.

[32]  C. Dolea,et al.  World Health Organization , 1949, International Organization.

[33]  Adam P. Arkin,et al.  FastTree: Computing Large Minimum Evolution Trees with Profiles instead of a Distance Matrix , 2009, Molecular biology and evolution.

[34]  Q. Gao,et al.  Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis , 2016, PloS one.

[35]  R. Warren,et al.  Accuracy of whole genome sequencing versus phenotypic (MGIT) and commercial molecular tests for detection of drug-resistant Mycobacterium tuberculosis isolated from patients in Brazil and Mozambique. , 2018, Tuberculosis.

[36]  Changjin Hong,et al.  PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples , 2014, Microbiome.

[37]  J. Klausner,et al.  Mixed Mycobacterium tuberculosis Complex Infections and False-Negative Results for Rifampin Resistance by GeneXpert MTB/RIF Are Associated with Poor Clinical Outcomes , 2014, Journal of Clinical Microbiology.

[38]  Christina A. Cuomo,et al.  Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement , 2014, PloS one.

[39]  S. Harris,et al.  Using whole genome sequencing to investigate transmission in a multi-host system: bovine tuberculosis in New Zealand , 2017, BMC Genomics.

[40]  T. Clark,et al.  Recurrence due to Relapse or Reinfection With Mycobacterium tuberculosis: A Whole-Genome Sequencing Approach in a Large, Population-Based Cohort With a High HIV Infection Prevalence and Active Follow-up , 2014, The Journal of infectious diseases.

[41]  Thomas Abeel,et al.  Genomic analysis of globally diverse Mycobacterium tuberculosis strains provides insights into emergence and spread of multidrug resistance , 2017, Nature Genetics.

[42]  Dennis Andersson,et al.  A retrospective cohort study , 2018 .

[43]  Y. Long,et al.  Genotyping analysis using an RFLP assay. , 2015, Methods in molecular biology.

[44]  Chongle Pan,et al.  Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance , 2014, Bioinform..

[45]  Laura Pérez-Lago,et al.  Whole genome sequencing analysis of intrapatient microevolution in Mycobacterium tuberculosis: potential impact on the inference of tuberculosis transmission. , 2014, The Journal of infectious diseases.