Syndromic surveillance models using Web data: The case of scarlet fever in the UK

Recent research has shown the potential of Web queries as a source for syndromic surveillance: existing studies show that query volumes can be used to estimate and predict the course of a disease such as influenza using log-linear (logit) statistical models. In this paper, we examine whether statistical methods can relate search engine queries to scarlet fever cases in the UK, taking advantage of tools that acquire the appropriate data from Google. We apply two alternative models to the relationship between cases and Web queries: the logit model and an alternative statistical method based on gamma distributions. The results show that, with logit models, the Pearson correlation coefficient between Web queries and the data obtained from the official agencies must exceed 0.90; otherwise, the predicted peak and spread of the distribution deviate significantly. We describe the gamma distribution model and show that gamma transformations give better results in all cases, especially in those with a lower correlation coefficient.
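
As a rough illustration of the logit approach mentioned above, the following sketch computes the Pearson correlation coefficient between a query series and a case series, then fits a linear model on the logit scale. It rests on several assumptions: the model form logit(cases) = b0 + b1 * logit(queries) is the one commonly used in the query-surveillance literature and is not confirmed as this paper's exact specification; the weekly data are synthetic; and the names `logit`, `inv_logit`, and `fit_logit_model` are hypothetical. The gamma-based model is not sketched because the abstract does not specify it.

```python
import numpy as np

def logit(p):
    """Log-odds transform; p must lie strictly between 0 and 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of the log-odds transform."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

def fit_logit_model(query_frac, case_frac):
    """Least-squares fit of logit(case_frac) = b0 + b1 * logit(query_frac)."""
    b1, b0 = np.polyfit(logit(query_frac), logit(case_frac), 1)
    return b0, b1

# Synthetic weekly data (an assumption for illustration, not real surveillance data):
# a query fraction with a single seasonal peak at week 14, and a case fraction
# generated from it on the logit scale with Gaussian noise.
rng = np.random.default_rng(0)
weeks = np.arange(52)
query_frac = 1e-4 + 4e-4 * np.exp(-0.5 * ((weeks - 14) / 5.0) ** 2)
noise = rng.normal(0.0, 0.1, size=weeks.size)
case_frac = inv_logit(-1.0 + 0.9 * logit(query_frac) + noise)

# Pearson correlation coefficient between the two series.
r = np.corrcoef(query_frac, case_frac)[0, 1]

# Fit the model and compare the predicted peak week with the observed one.
b0, b1 = fit_logit_model(query_frac, case_frac)
predicted = inv_logit(b0 + b1 * logit(query_frac))
peak_week_observed = int(np.argmax(case_frac))
peak_week_predicted = int(np.argmax(predicted))

print(f"Pearson r = {r:.3f}, b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"observed peak week = {peak_week_observed}, predicted = {peak_week_predicted}")
```

Because a model of this form maps the query series monotonically onto the prediction, the predicted peak always coincides with the query peak when b1 is positive. That is one way to see why such a model can misplace the peak when the correlation between queries and cases is weak.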
