SPODT: An R Package to Perform Spatial Partitioning

Spatial cluster detection is a classical question in epidemiology: are cases located near other cases? To classify a study area into zones of different risk and to determine their boundaries, we have developed a spatial partitioning method based on oblique decision trees, called the spatial oblique decision tree (SpODT). This non-parametric method builds on the classification and regression tree (CART) approach introduced by Leo Breiman. Applied to epidemiological spatial data, the algorithm recursively searches among the coordinates for a threshold, that is, a boundary between zones, such that the risks estimated in these zones are as different as possible. The CART algorithm produces rectangular zones because its splits are perpendicular to the longitude or latitude axis. The SpODT algorithm instead splits the study area along oblique lines, which is more appropriate and accurate for spatial epidemiology. Oblique decision trees can be regarded as non-parametric regression models. Beyond the basic function, we have developed a set of functions that enable extended analyses of spatial data: inference, graphical representations, spatio-temporal analysis, adjustment for covariates, spatially weighted partitioning, and the merging of similar adjacent final classes. In this paper, we present a new R package, SPODT, which provides an extensible set of functions for partitioning spatial and spatio-temporal data. The implementation and the extensions of the algorithm are described. Usage examples are given, looking for clusters of malaria episodes in Bandiagara, Mali, and in samples showing three different cluster shapes.
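To make the contrast between perpendicular and oblique splits concrete, the following is a minimal Python sketch of a single split search. It is not the SPODT implementation: the function name `best_oblique_split`, the fixed grid of candidate angles, and the between-zone sum of squares used as the splitting criterion are all assumptions made for illustration. Each candidate direction projects the points onto one axis, and every threshold along that axis is scored by how different the mean values of the two resulting zones are.

```python
import math

def best_oblique_split(points, n_angles=36):
    """Search one oblique split over a grid of directions (illustrative only).

    points: list of (x, y, z) tuples, z being the value observed at (x, y).
    Returns (score, theta, threshold): the split puts on one side the points
    with x*cos(theta) + y*sin(theta) <= threshold. The score is the
    between-zone sum of squares (an assumed criterion).
    """
    n = len(points)
    total = sum(z for _, _, z in points)
    mean = total / n
    best = (0.0, None, None)
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        c, s = math.cos(theta), math.sin(theta)
        # Project every point on the direction theta and sort along it.
        proj = sorted((x * c + y * s, z) for x, y, z in points)
        left_sum = 0.0
        for i in range(1, n):
            left_sum += proj[i - 1][1]
            if proj[i][0] - proj[i - 1][0] < 1e-12:
                continue  # tied projections cannot be separated
            left_mean = left_sum / i
            right_mean = (total - left_sum) / (n - i)
            score = (i * (left_mean - mean) ** 2
                     + (n - i) * (right_mean - mean) ** 2)
            if score > best[0]:
                threshold = (proj[i - 1][0] + proj[i][0]) / 2
                best = (score, theta, threshold)
    return best

# Toy data (made up): a 5 x 5 grid whose high-value zone lies above
# the diagonal line x + y = 4.
grid = [(x, y, 1.0 if x + y > 4 else 0.0)
        for x in range(5) for y in range(5)]

# Four directions: 0, pi/4, pi/2, 3*pi/4 (two of them oblique).
score, theta, threshold = best_oblique_split(grid, n_angles=4)

# Two directions only: 0 and pi/2, i.e. axis-parallel splits as in CART.
axis_score, _, _ = best_oblique_split(grid, n_angles=2)
```

On this toy grid the oblique search recovers the diagonal boundary exactly, while the search restricted to axis-parallel directions reaches a lower score. A recursive partition would apply the same search again inside each of the two zones, subject to stopping rules that this sketch does not cover.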
