On the Identifiability of Transmission Dynamic Models for Infectious Diseases

Understanding the transmission dynamics of infectious diseases is important for both biological research and public health applications. It has been widely demonstrated that statistical modeling provides a firm basis for inferring relevant epidemiological quantities from incidence and molecular data. However, the complexity of transmission dynamic models presents two challenges: (1) the likelihood function of the models is generally not computable, so computationally intensive simulation-based inference methods must be employed, and (2) the model may not be fully identifiable from the available data. While the first difficulty can be tackled by computational and algorithmic advances, the second obstacle is more fundamental. Identifiability issues may lead to inferences that are driven more by prior assumptions than by the data themselves. We consider a popular and relatively simple yet analytically intractable model for the spread of tuberculosis based on classical IS6110 fingerprinting data. We report on the identifiability of the model and present some methodological advances for the inference. Using likelihood approximations, we show that the reproductive value cannot be identified from the available data and that the posterior distributions obtained in previous work have likely been substantially dominated by the assumed prior distribution. Further, we show that the inferences are influenced by the assumed infectious population size, which has generally been kept fixed in previous work. We demonstrate that the infectious population size can be inferred if the remaining epidemiological parameters are already known with sufficient precision.
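
As a rough illustration of what a prior-dominated inference looks like, the following minimal sketch runs rejection-based approximate Bayesian computation on a toy simulator. This is not the tuberculosis model studied here: the simulator, the parameters `a` and `b`, the uniform priors, and the tolerance `eps` are all hypothetical choices made for illustration. The simulated data depend only on the product `a * b`, so the product is identifiable while `a` alone is not, and the marginal posterior of `a` is shaped largely by the prior bounds.

```python
import random
import statistics

def simulate(a, b, rng, n=50):
    """Toy simulator (hypothetical): mean of n noisy draws centered on a * b."""
    return statistics.fmean([rng.gauss(a * b, 1.0) for _ in range(n)])

def abc_rejection(observed, n_draws, eps, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    prior, accepted = [], []
    for _ in range(n_draws):
        a = rng.uniform(0.5, 2.0)  # assumed uniform prior on a
        b = rng.uniform(0.5, 2.0)  # assumed uniform prior on b
        prior.append((a, b))
        if abs(simulate(a, b, rng) - observed) < eps:
            accepted.append((a, b))
    return prior, accepted

def variance_ratio(accepted, prior, f):
    """Posterior variance of f(a, b) divided by its prior variance."""
    post_var = statistics.pvariance([f(a, b) for a, b in accepted])
    prior_var = statistics.pvariance([f(a, b) for a, b in prior])
    return post_var / prior_var

rng = random.Random(0)
observed = simulate(1.0, 1.5, rng)  # "data" generated with a = 1.0, b = 1.5
prior, accepted = abc_rejection(observed, n_draws=10000, eps=0.1, rng=rng)

# A ratio near 0 means the data are informative; a ratio near 1 means the
# posterior is about as spread out as the prior.
ratio_product = variance_ratio(accepted, prior, lambda a, b: a * b)
ratio_a = variance_ratio(accepted, prior, lambda a, b: a)
print("accepted draws:", len(accepted))
print("variance ratio for a * b:", round(ratio_product, 3))
print("variance ratio for a alone:", round(ratio_a, 3))
```

In this toy setting the variance ratio for the product shrinks far more than the ratio for `a` alone, and the remaining uncertainty in `a` reflects the prior range rather than the data. Comparing posterior spread with prior spread in this way is one simple diagnostic; it is not the likelihood approximation used in this work.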
