Detecting Spatial Clusters of Disease Infection Risk Using Sparsely Sampled Social Media Mobility Patterns

Standard spatial cluster detection methods used in public health surveillance assign each disease case to a single location (typically the patient's home address), aggregate locations to small areas, and monitor the number of cases in each area over time. Such methods, however, cannot detect clusters of disease that result from visits to non-residential locations, such as a park or a university campus. We therefore develop two new spatial scan methods, the unconditional and conditional spatial logistic models, to search for spatial clusters of increased infection risk. We use mobility data from two sets of individuals, disease cases and healthy individuals (controls), where each individual is represented by a sparse sample of geographical locations (e.g., from geo-tagged social media data). The methods account for individuals being observed at multiple locations, with the number of observed locations varying from one individual to the next, either by non-parametric estimation of the odds of being a case, or by matching case and control individuals with similar numbers of observed locations. Applying our methods to synthetic and real-world scenarios, we demonstrate robust performance in detecting spatial clusters of infection risk from mobility data, outperforming competing baselines.
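The matched (conditional) idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices in it are assumptions made only for illustration: cases and controls are assumed to be already paired 1:1 on their number of observed locations; candidate regions are circles centered at observed points with user-supplied radii; an individual counts as "exposed" to a region if at least one of their sampled locations falls inside it; and significance is assessed with a Monte Carlo test that randomly swaps labels within pairs. Under 1:1 matching with a binary exposure, the conditional logistic likelihood depends only on discordant pairs: with n10 pairs where only the case is exposed and n01 where only the control is, the odds ratio is estimated as n10/n01 and the log-likelihood ratio against an odds ratio of 1 is n10 log(n10/n) + n01 log(n01/n) + n log 2, with n = n10 + n01. All function names (`visited`, `matched_pair_llr`, `scan`, `permutation_pvalue`) are hypothetical.

```python
import math
import random

# Hypothetical data layout (an assumption, not from the paper): each matched
# pair is (case_points, control_points), each a list of (x, y) locations, and
# both members of a pair have a similar number of observed locations.


def visited(points, center, radius):
    """True if at least one observed location lies inside the circle."""
    cx, cy = center
    return any((x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for x, y in points)


def matched_pair_llr(n10, n01):
    """One-sided log-likelihood ratio of the 1:1 conditional logistic model
    with a binary exposure. Only discordant pairs carry information:
    n10 = case exposed, control not; n01 = control exposed, case not.
    Returns 0 unless the estimated odds ratio n10 / n01 exceeds 1."""
    if n10 <= n01:
        return 0.0
    n = n10 + n01
    llr = n10 * math.log(n10 / n) + n * math.log(2)
    if n01 > 0:
        llr += n01 * math.log(n01 / n)
    return llr


def scan(pairs, radii):
    """Return (best_llr, center, radius) over circles centered on observed points."""
    centers = sorted({p for case, ctrl in pairs for p in case + ctrl})
    best = (0.0, None, None)
    for center in centers:
        for r in radii:
            n10 = n01 = 0
            for case_pts, ctrl_pts in pairs:
                case_in = visited(case_pts, center, r)
                ctrl_in = visited(ctrl_pts, center, r)
                if case_in and not ctrl_in:
                    n10 += 1
                elif ctrl_in and not case_in:
                    n01 += 1
            llr = matched_pair_llr(n10, n01)
            if llr > best[0]:  # keep the first region reaching the maximum
                best = (llr, center, r)
    return best


def permutation_pvalue(pairs, radii, observed, n_perm=99, seed=0):
    """Monte Carlo p-value: under the null, case/control labels within each
    matched pair are exchangeable, so swap them at random and rescan."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = [(a, b) if rng.random() < 0.5 else (b, a) for a, b in pairs]
        if scan(shuffled, radii)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: three of the four cases pass near the origin; no control does.
    pairs = [
        ([(0.1, 0.0), (5.0, 5.0)], [(4.0, 4.0), (6.0, 5.0)]),
        ([(0.0, 0.2), (7.0, 1.0)], [(7.5, 1.0), (3.0, 3.0)]),
        ([(-0.1, 0.1), (2.0, 8.0)], [(2.5, 8.0), (9.0, 9.0)]),
        ([(6.0, 6.0), (1.0, 9.0)], [(6.1, 6.0), (1.0, 9.2)]),
    ]
    llr, center, radius = scan(pairs, radii=[0.5, 1.0])
    p = permutation_pvalue(pairs, [0.5, 1.0], llr)
    print(f"LLR={llr:.3f} center={center} radius={radius} p={p:.2f}")
```

On the toy data, the scan should select the circle of radius 0.5 centered at (-0.1, 0.1), with n10 = 3, n01 = 0 and LLR = 3 ln 2 ≈ 2.079. Comparing only within pairs matched on the number of observed locations keeps sampling intensity from being mistaken for risk, since an individual with more sampled points has more chances of falling inside any given region.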
