The Limits to Learning a Diffusion Model

This paper provides the first sample complexity lower bounds for the estimation of simple diffusion models that seek to explain the diffusion of an epidemic in a network. The Susceptible-Infected-Recovered (SIR) model, proposed nearly a century ago [2], is a classic example and remains a cornerstone of epidemic forecasting. The so-called Bass model [1] likewise remains a basic building block in forecasting consumer adoption of new products and services. The durability of these models arises from their excellent fit to data in numerous studies spanning both the epidemiology and marketing literatures. Somewhat paradoxically, using these same models as reliable forecasting tools presents a challenge. While we are ultimately motivated by the problem of forecasting a diffusion model, this paper asks a more basic question that is surprisingly unanswered: What are the limits to learning a diffusion model? We answer this question by characterizing sample complexity lower bounds for a class of stochastic diffusion models that encompasses both the Bass model and the SIR model. We show that the time needed to collect a number of observations exceeding these lower bounds is too large to allow for accurate forecasts early in the process. In the context of the Bass model, our results imply that when adoption is driven by imitation, one cannot hope to predict the eventual number of adopting customers until one is at least two-thirds of the way to the time at which the rate of new adopters is at its peak. In a similar vein, our results imply that in the case of an SIR model, one cannot hope to predict the eventual number of infections until one is approximately two-thirds of the way to the time at which the infection rate peaks. Our analysis is conceptually simple and relies on the Cramér-Rao bound.
The core technical difficulty in our analysis lies in characterizing the Fisher information in the available observations, which have a non-trivial correlation structure. Maximum likelihood estimation of diffusion models on product adoption datasets (for products on Amazon.com) and epidemic data (from the ongoing COVID-19 epidemic) illustrates precisely the behavior predicted by our theory. As a byproduct of our analysis, we see that the difficulty in learning a diffusion model stems solely from uncertainty in a single unknown 'effective population size' parameter. In particular, other parameters, including those related to the 'rate of imitation' (in the Bass model) or the 'reproduction number' (in the SIR model), are easy to learn. This suggests that estimators that rely on an (informative) bias in this population size parameter can in fact overcome the limitations presented by our analysis. Although not a primary contribution of the present work, we describe a heuristic procedure for constructing such a biased estimator, which yielded one of the first US county-level forecasters available for COVID-19. The full paper is available at https://arxiv.org/abs/2006.06373.
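
To make the "two-thirds of the way to the peak" statement concrete, the following is a minimal sketch based on the textbook deterministic Bass model, dF/dt = (p + qF)(1 - F) with F(0) = 0, rather than the stochastic model class analyzed in the paper. The function names and the parameter values p = 0.01 and q = 0.4 are assumptions chosen purely for illustration. The sketch computes the time at which the adoption rate peaks and the fraction of eventual adopters observed two-thirds of the way to that time.

```python
import math


def bass_fraction(t, p, q):
    """Cumulative adopted fraction F(t) solving dF/dt = (p + q F)(1 - F), F(0) = 0."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)


def bass_rate(t, p, q):
    """Adoption rate dF/dt at time t."""
    f = bass_fraction(t, p, q)
    return (p + q * f) * (1.0 - f)


def bass_peak_time(p, q):
    """Time at which the adoption rate peaks; a peak at t > 0 requires q > p."""
    if q <= p:
        raise ValueError("an interior peak requires q > p (imitation-driven adoption)")
    return math.log(q / p) / (p + q)


# Hypothetical parameters (assumed for illustration, not taken from the paper):
# p is the innovation coefficient, q the imitation coefficient.
p, q = 0.01, 0.4

t_peak = bass_peak_time(p, q)
t_two_thirds = 2.0 * t_peak / 3.0

print(f"peak time: {t_peak:.2f}")
print(f"fraction adopted at peak: {bass_fraction(t_peak, p, q):.3f}")
print(f"fraction adopted at 2/3 of peak time: {bass_fraction(t_two_thirds, p, q):.3f}")
```

In this deterministic picture the adopted fraction at the peak is (q - p) / (2q), so when imitation dominates (q much larger than p) only a small share of eventual adopters has been observed two-thirds of the way to the peak. This matches the regime in which the abstract says the eventual total cannot yet be predicted.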

[1] Shaghayegh Haghjooy Javanmard, et al. Inefficiency of SIR models in forecasting COVID-19 epidemic: a case study of Isfahan, 2021, Scientific Reports.

[2] C. Campèse, et al. Underdetection of cases of COVID-19 in France threatens epidemic control, 2020, Nature.

[3] IHME COVID-19 Forecasting Team. Modeling COVID-19 scenarios for the United States, 2020, Nature Medicine.

[4] George Turabelidze, et al. Seroprevalence of Antibodies to SARS-CoV-2 in 10 Sites in the United States, March 23-May 12, 2020, 2020, JAMA Internal Medicine.

[5] H. Ichii, et al. Evaluating the massive underreporting and undertesting of COVID-19 cases in multiple global epicenters, 2020, Pulmonology.

[6] M. R. Ferrández, et al. Mathematical modeling of the spread of the coronavirus disease 2019 (COVID-19) taking into account the undetected infections. The case of China, 2020, Communications in Nonlinear Science and Numerical Simulation.

[7] Rajesh Sharma, et al. Mobility Based SIR Model For Pandemics - With Case Study Of COVID-19, 2020, 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[8] J. Cuesta, et al. Predictability: Can the turning point and end of an expanding epidemic be precisely forecast?, 2020, arXiv:2004.08842.

[9] J. Ioannidis, et al. COVID-19 antibody seroprevalence in Santa Clara County, California, 2020, medRxiv.

[10] Elisa Franco, et al. The challenges of modeling and forecasting the spread of COVID-19, 2020, Proceedings of the National Academy of Sciences.

[11] Wesley Pegden, et al. Modeling strict age-targeted mitigation strategies for COVID-19, 2020, PLoS ONE.

[12] Giuseppe C. Calafiore, et al. A Modified SIR Model for the COVID-19 Contagion in Italy, 2020, 2020 59th IEEE Conference on Decision and Control (CDC).

[13] Franco Blanchini, et al. Modelling the COVID-19 epidemic and implementation of population-wide interventions in Italy, 2020, Nature Medicine.

[14] G. Gaeta. A simple SIR model with a large set of asymptomatic infectives, 2020, Mathematics in Engineering.

[15] Fairoza Amira Hamzah, et al. CoronaTracker: World-wide COVID-19 Outbreak Data Analysis and Prediction, 2020.

[16] Jonathan Le Roux, et al. COVID-19: Forecasting short term hospital needs in France, 2020, medRxiv.

[17] P. Sen, et al. Covid-19 spread: Reproduction of data and prediction using a SIR model on Euclidean network, 2020, arXiv:2003.07063.

[18] Ruiyun Li, et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2), 2020, Science.

[19] E. Dong, et al. An interactive web-based dashboard to track COVID-19 in real time, 2020, The Lancet Infectious Diseases.

[20] C. Anastassopoulou, et al. Data-based analysis, modelling and forecasting of the COVID-19 outbreak, 2020, medRxiv.

[21] Yang Liu, et al. Early dynamics of transmission and control of COVID-19: a mathematical modelling study, 2020, The Lancet Infectious Diseases.

[22] G. Leung, et al. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study, 2020, The Lancet.

[23] Jianmo Ni, et al. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects, 2019, EMNLP.

[24] Gerardo Chowell, et al. Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models, 2019, Theoretical Biology and Medical Modelling.

[25] S. Janson. Tail bounds for sums of geometric and exponential variables, 2017, arXiv:1709.08157.

[26] Joel C. Miller, et al. Mathematical models of SIR disease spread with combined non-sexual and sexual transmission routes, 2016, bioRxiv.

[27] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[28] Pilsung Kang, et al. Pre-launch new product demand forecasting using the Bass model: A statistical and machine learning-based approach, 2014.

[29] Nakul Chitnis, et al. Mathematical models of contact patterns between age groups for predicting the spread of infectious diseases, 2013, Mathematical Biosciences and Engineering: MBE.

[30] Joel C. Miller, et al. A Note on the Derivation of Epidemic Final Sizes, 2012, Bulletin of Mathematical Biology.

[31] Jhagvaral Hasbold, et al. Activation-Induced B Cell Fates Are Selected by Intracellular Stochastic Competition, 2012, Science.

[32] Duncan J. Watts, et al. Everyone's an influencer: quantifying influence on twitter, 2011, WSDM '11.

[33] Yonina C. Eldar. Rethinking Biased Estimation: Improving Maximum Likelihood and the Cramér-Rao Bound, 2008, Found. Trends Signal Process.

[34] J. Norris, et al. Differential equation approximations for Markov chains, 2007, arXiv:0710.3269.

[35] Herbert W. Hethcote, et al. Mixing patterns between age groups in social networks, 2007, Soc. Networks.

[36] Mark A. Miller, et al. Seasonal influenza in the United States, France, and Australia: transmission and prospects for control, 2007, Epidemiology and Infection.

[37] Yogesh V. Joshi, et al. New Product Diffusion with Influentials and Imitators, 2007.

[38] P. Hodgkin, et al. A model of immune regulation as a consequence of randomized lymphocyte division and death times, 2007, Proceedings of the National Academy of Sciences.

[39] G. Tellis, et al. Research on Innovation: A Review and Agenda for Marketing Science, 2006.

[40] Jean Jacod, et al. The approximate Euler method for Lévy driven stochastic differential equations, 2005.

[41] M. J. Chapman, et al. The structural identifiability of the susceptible infected recovered model with seasonal forcing, 2005, Mathematical Biosciences.

[42] Frank M. Bass, et al. A New Product Growth for Model Consumer Durables, 2004, Manag. Sci.

[43] Frank M. Bass, et al. Comments on "A New Product Growth for Model Consumer Durables The Bass Model", 2004, Manag. Sci.

[44] Philip Hans Franses, et al. The Econometrics Of The Bass Diffusion Model, 2002.

[45] Peter S. Fader, et al. An Empirical Comparison of New Product Trial Forecasting Models, 1998.

[46] G. Lilien, et al. Bias and Systematic Change in the Parameter Estimates of Macro-Level Diffusion Models, 1997.

[47] N. Wormald. Differential Equations for Random Processes and Random Graphs, 1995.

[48] V. Mahajan, et al. Diffusion of New Products: Empirical Generalizations and Managerial Uses, 1995.

[49] Dipak C. Jain, et al. Why the Bass Model Fits without Decision Variables, 1994.

[50] D. Jain, et al. Effect of Price on the Demand for Durables: Modeling, Estimation, and Findings, 1990.

[51] Donald R. Lehmann, et al. A Meta-Analysis of Applications of Diffusion Models, 1990.

[52] M. S. Bartlett, et al. Some Evolutionary Stochastic Processes, 1949.

[53] W. O. Kermack, et al. A contribution to the mathematical theory of epidemics, 1927.

[54] Howard M. Weiss. The SIR model and the Foundations of Public Health, 2013.

[55] HighWire Press. Proceedings of the Royal Society of London. Series A, Containing papers of a mathematical and physical character, 1934.

[56] G. Ke. A contribution to the mathematical theory of epidemics, 2022.