SharpTNI: Counting and Sampling Parsimonious Transmission Networks under a Weak Bottleneck

Background: Technological advances in genomic sequencing are facilitating the reconstruction of transmission histories during infectious disease outbreaks. However, accurate transmission inference from these data is hindered by within-host pathogen diversity and by weak transmission bottlenecks, in which multiple genetically distinct pathogenic strains co-transmit.

Results: We formulate a combinatorial optimization problem for transmission network inference under a weak bottleneck from a given timed phylogeny and establish hardness results. We present SharpTNI, a method to approximately count and almost uniformly sample from the solution space. Using simulated data, we show that SharpTNI accurately quantifies and uniformly samples from the solution space of parsimonious transmission networks, and that it scales to large datasets. We demonstrate that SharpTNI identifies co-transmissions during the 2014 Ebola outbreak that are corroborated by epidemiological information collected in previous studies.

Conclusions: Accounting for weak transmission bottlenecks is crucial for accurate inference of transmission histories during outbreaks. SharpTNI is a parsimony-based method that, given timed phylogenies, reconstructs transmission networks for diseases with long incubation times and large inocula. The model and theoretical work of this paper pave the way for novel maximum likelihood methods to co-estimate timed phylogenies and transmission networks under a weak bottleneck.
