Complexity-Based Spatial Hierarchical Clustering for Malaria Prediction

Targeted intervention and resource allocation are essential for effective control of infectious diseases, particularly those like malaria that tend to occur in remote areas. Disease prediction models can support targeted intervention, particularly if they have fine spatial resolution. However, choosing an appropriate resolution is difficult, since the choice of spatial scale can have a significant impact on the accuracy of predictive models. In this paper, we introduce a new approach to spatial clustering for disease prediction that we call complexity-based spatial hierarchical clustering. The technique seeks spatially compact clusters whose time series can be well characterized by models of low complexity. We evaluate our approach with two years of malaria case data from Tak Province in northern Thailand. We show that using reduction in the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) as clustering criteria leads to rapid improvement in predictability, and to significantly better predictability than clustering based only on minimizing spatial intra-cluster distance. This holds across the entire range of cluster sizes and over a variety of predictive models and prediction horizons.
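As a rough illustration of the criteria named above, the following sketch computes AIC and BIC for a simple model fitted to one cluster's aggregated case series. The standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L are used. Everything else is an assumption made for illustration: the AR(1) model, the Gaussian error model, and the helper names (`ar1_residuals`, `series_criteria`) are hypothetical, and the abstract does not state which models or which exact merge rule the method uses.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood given model residuals."""
    n = len(residuals)
    # Maximum-likelihood variance estimate; floored to avoid log(0).
    sigma2 = max(sum(r * r for r in residuals) / n, 1e-12)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def ar1_residuals(series):
    """Residuals of a least-squares AR(1) fit with intercept (illustrative model)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def series_criteria(series):
    """Return (AIC, BIC) of the AR(1) fit to a cluster's case-count series."""
    residuals = ar1_residuals(series)
    log_lik = gaussian_log_likelihood(residuals)
    k = 3  # slope, intercept, and error variance
    return aic(log_lik, k), bic(log_lik, k, len(residuals))


# Hypothetical weekly case counts for one candidate cluster.
weekly_cases = [3, 5, 4, 6, 8, 7, 9, 12, 10, 11]
cluster_aic, cluster_bic = series_criteria(weekly_cases)
```

Lower values indicate a series that a low-complexity model describes well. A clustering procedure in this spirit would evaluate such a score for candidate merges of spatially adjacent units, though how the reduction is measured in the method itself is not specified here.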
