Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling

Abstract Machine learning algorithms such as Random Forest (RF) are increasingly applied to traditionally geographical topics such as population estimation. Even though RF is a well-performing and generalizable algorithm, the vast majority of its implementations are still ‘aspatial’ and may not address spatially heterogeneous processes. At the same time, remote sensing (RS) data, which are commonly used to model population, can be highly spatially heterogeneous. With this in mind, we present a novel geographical implementation of RF, named Geographical Random Forest (GRF), as both a predictive and an exploratory tool for modelling population as a function of RS covariates. GRF is a disaggregation of RF into geographical space in the form of local sub-models. From the first empirical results, we conclude that GRF can be more predictive when an appropriate spatial scale is selected to model the data, with reduced residual autocorrelation and lower Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) values. Finally, and of equal importance, GRF can be used as an effective exploratory tool to visualize the relationship between dependent and independent variables, highlighting local variations and allowing for a better understanding of the processes that may be causing the observed spatial heterogeneity.
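
The idea of disaggregating a global model into local sub-models, and of comparing the two by RMSE and MAE, can be illustrated with a minimal sketch. It is not the GRF implementation: the learner is a one-covariate least-squares line standing in for a random forest, the neighbourhood is assumed to be the k nearest observations, and the data are synthetic. All function names (`fit_line`, `local_predictions`, `run_demo`) and parameter values are hypothetical.

```python
import math
import random
from statistics import mean


def rmse(observed, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def mae(observed, predicted):
    """Mean Absolute Error."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def fit_line(x, y):
    """Stand-in learner (NOT a random forest): one-covariate least squares.

    Returns a function that predicts y for a new covariate value.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda value: intercept + slope * value


def local_predictions(coords, x, y, k, fit=fit_line):
    """Fit one sub-model per location on its k nearest observations.

    Assumption: the neighbourhood is defined by Euclidean distance and
    includes the target observation itself, so predictions are in-sample.
    """
    predictions = []
    for target, value in zip(coords, x):
        nearest = sorted(
            range(len(coords)), key=lambda i: math.dist(coords[i], target)
        )[:k]
        model = fit([x[i] for i in nearest], [y[i] for i in nearest])
        predictions.append(model(value))
    return predictions


def run_demo(n=200, k=20, seed=0):
    """Compare a global and a local fit on synthetic, heterogeneous data."""
    rng = random.Random(seed)
    coords = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(n)]
    x = [rng.uniform(0, 1) for _ in range(n)]
    # Synthetic heterogeneity: the effect of x grows from west to east.
    y = [(1 + cx) * xi + rng.gauss(0, 0.1) for (cx, _), xi in zip(coords, x)]

    global_model = fit_line(x, y)
    global_pred = [global_model(xi) for xi in x]
    local_pred = local_predictions(coords, x, y, k)
    return {
        "global": (rmse(y, global_pred), mae(y, global_pred)),
        "local": (rmse(y, local_pred), mae(y, local_pred)),
    }


if __name__ == "__main__":
    for name, (r, m) in run_demo().items():
        print(f"{name}: RMSE={r:.3f} MAE={m:.3f}")
```

Because `fit` is passed in as a parameter, a random forest learner could be substituted for the stand-in line without changing the neighbourhood logic. The choice of `k` plays the role of the spatial scale mentioned above: a very large `k` reproduces the global fit, while a very small one leaves each sub-model with too few observations.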
