Computational pan-genome mapping and pairwise SNP-distance improve detection of Mycobacterium tuberculosis transmission clusters

Base-by-base distance measures derived from next-generation sequencing have become an integral complement to the epidemiological investigation of infectious disease outbreaks. This study introduces PANPASCO, a pairwise distance method based on computational pan-genome mapping that is highly sensitive to differences between cases, even when these are located in regions of lineage-specific reference genomes. We show that our approach is superior to previously published methods in several datasets and across different Mycobacterium tuberculosis lineages, as its characteristics allow the comparison of a large number of diverse samples in one analysis, a scenario that becomes increasingly likely with the growing use of whole-genome sequencing in transmission surveillance.

Author summary

Tuberculosis remains a threat to global health. It is essential to detect and interrupt transmission to stop the spread of this infectious disease. With the rising use of next-generation sequencing methods, their application in the surveillance of Mycobacterium tuberculosis has become increasingly important in recent years. The main goals of molecular surveillance are the identification of patient-to-patient transmission and the detection of clusters. The mutation rate of M. tuberculosis is very low and stable. Therefore, many existing methods for comparative analysis of isolates provide inadequate results because their resolution is too limited. There is a need for a method that takes every detectable difference into account. We developed PANPASCO, a novel approach for comparing pairs of isolates using all genomic information available for each pair. We combine improved SNP-distance calculation with the use of a pan-genome incorporating more than 100 M. tuberculosis reference genomes for read mapping prior to variant detection. We thereby enable the collective analysis and comparison of similar and diverse isolates associated with different M. tuberculosis strains.
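To illustrate what "using all genomic information available for each pair" can mean in practice, the following minimal Python sketch contrasts two ways of handling positions without a reliable base call. It is not the PANPASCO implementation. The function names, the toy sequences, and the use of "N" as a marker for missing positions are assumptions made purely for illustration. The first function ignores a position only when it is missing in one of the two samples being compared; the second ignores a position whenever it is missing in any sample of the dataset.

```python
from itertools import combinations

MISSING = "N"  # assumed marker for uncovered or low-quality positions


def pairwise_distance(a, b):
    """Count differing positions, skipping sites missing in either sample of this pair."""
    return sum(
        1 for x, y in zip(a, b)
        if x != MISSING and y != MISSING and x != y
    )


def complete_deletion_distance(a, b, all_samples):
    """Count differing positions, skipping sites missing in any sample of the dataset."""
    keep = [
        i for i in range(len(a))
        if all(s[i] != MISSING for s in all_samples)
    ]
    return sum(1 for i in keep if a[i] != b[i])


# Toy aligned sequences (hypothetical); s3 has no call at the position
# where s1 and s2 differ.
samples = {
    "s1": "ACGTACGT",
    "s2": "ACGAACGT",
    "s3": "ACGNACTT",
}

# Distance per pair, ignoring only sites missing within that pair.
pairwise = {
    (p, q): pairwise_distance(samples[p], samples[q])
    for p, q in combinations(samples, 2)
}

# Distance per pair, ignoring every site missing anywhere in the dataset.
complete = {
    (p, q): complete_deletion_distance(
        samples[p], samples[q], list(samples.values())
    )
    for p, q in combinations(samples, 2)
}

print(pairwise)
print(complete)
```

In this toy example the single difference between s1 and s2 is counted by the pair-specific distance but lost under dataset-wide deletion, because s3 lacks a call at that position. With slowly mutating organisms and distances of only a few SNPs, losing such positions can change whether two isolates fall into the same cluster.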
