Prediction of Infectious Disease Spread Using Twitter: A Case of Influenza

Detecting disasters and predicting their final stage have become very important from the viewpoint of risk analysis. Statistical methods provide accurate parameter estimates when the data are complete. When the data are incomplete, however, the accuracy of the estimates is poor, so statistical methods are weak at predicting future trends. SIR methods for predicting the spread of infectious disease, which use differential equations, can sometimes provide accurate estimates of the final stage. These methods, however, require some inspection time, which delays the analysis by at least a week or so when we want to predict future trends. To detect disasters and predict future trends much earlier, we can use social networking services (SNS). In this paper, we propose a method to predict the future trend of influenza using Twitter. We analyze the possibility of building a regression model that combines Twitter messages with the CDC's Influenza-Like Illness (ILI) data, and we find that the multiple linear regression model with ridge regularization outperforms the single linear regression model and other unregularized least-squares methods. Multiple linear regression with ridge regularization can notably improve prediction accuracy.
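
As an illustration of the regression step described above, the following is a minimal sketch of multiple linear regression with ridge regularization. It is not the paper's implementation: the function names `ridge_fit` and `ridge_predict`, the penalty value `lam`, and the synthetic data standing in for weekly Twitter keyword frequencies and CDC ILI rates are all assumptions made for this example.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge fit. Centering keeps the intercept unpenalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + lam * I) w = Xc^T yc for the coefficient vector w.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def ridge_predict(X, w, b):
    """Predict the response for the rows of X."""
    return np.asarray(X, dtype=float) @ w + b


# Synthetic placeholder data (an assumption, not real Twitter or CDC data):
# five strongly correlated "keyword frequency" columns over 30 weeks.
rng = np.random.default_rng(0)
n_weeks, n_features = 30, 5
base = rng.random(n_weeks)
X = base[:, None] + 0.01 * rng.standard_normal((n_weeks, n_features))
y = 2.0 * base + 0.05 * rng.standard_normal(n_weeks)

w_ols, b_ols = ridge_fit(X, y, lam=0.0)      # unregularized least squares
w_ridge, b_ridge = ridge_fit(X, y, lam=1.0)  # ridge-regularized fit
y_hat = ridge_predict(X, w_ridge, b_ridge)
```

Setting `lam=0.0` recovers ordinary least squares, so the same function can be used to compare regularized and unregularized fits. With correlated predictors such as overlapping keyword counts, the ridge penalty shrinks the coefficient vector, which is the usual motivation for preferring it over an unregularized fit.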
