A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation

Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A wide range of nonparametric clustering methods has been developed for this purpose. These methods are generally intuitive, rapid to compute, and scale readily to large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis, in which individuals are sampled sooner post-infection, rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method assigned about half (46%) as many individuals to clusters as the other methods. Furthermore, the mean internal branch lengths, which approximate transmission rates, were significantly shorter in clusters extracted using MMPP, but not in clusters extracted by the other methods. We determined that the computing time for the MMPP method scaled linearly with tree size, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer.
This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where it is critical to robustly and accurately identify clusters for the most cost-effective deployment of outbreak management and prevention resources.
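
To make the notion of a Markov-modulated Poisson process concrete, the following is a minimal sketch of a generic two-state MMPP simulator. It is not the authors' fitting procedure, and it does not operate on a tree: it only illustrates how a hidden state that switches at random can modulate the rate at which events (here standing in for branching events) occur over time. The function name `simulate_mmpp` and all parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_mmpp(duration, rates, switch, state=0, rng=None):
    """Simulate a two-state MMPP over `duration` time units (illustrative only).

    rates[i]  : Poisson event rate while the hidden state is i
    switch[i] : rate of leaving hidden state i for the other state
    Returns (event_times, state_path), where state_path lists
    (time, state) pairs marking each entry into a hidden state.
    """
    rng = rng or random.Random()
    t = 0.0
    events = []
    path = [(0.0, state)]
    while True:
        total = rates[state] + switch[state]
        if total <= 0:
            break  # nothing can happen in this state
        # Waiting time to the next occurrence of either kind
        t += rng.expovariate(total)
        if t >= duration:
            break
        # Competing exponentials: choose which kind of occurrence it was
        if rng.random() < rates[state] / total:
            events.append(t)          # an observed event
        else:
            state = 1 - state         # hidden state switches
            path.append((t, state))
    return events, path


if __name__ == "__main__":
    # Hypothetical parameters: state 1 is a "rapid" regime with a
    # tenfold higher event rate than the background state 0.
    events, path = simulate_mmpp(
        duration=10.0, rates=(0.5, 5.0), switch=(0.2, 0.2),
        rng=random.Random(1),
    )
    print(len(events), "events;", len(path) - 1, "state switches")
```

In a simulation like this, events tend to bunch together during periods spent in the high-rate state, which is the kind of signal a model-based clustering method can try to detect from the pattern of short branches in a tree.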
