Performance Analysis of Machine Learning Algorithms for Regression of Spatial Variables: A Case Study in the Real Estate Industry

Machine learning is a computational technology widely used in regression and classification tasks. One drawback of its use in the analysis of spatial variables is that machine learning algorithms are, in general, not designed to deal with spatially autocorrelated data. This often causes the residuals to exhibit clustering, in clear violation of the assumption of independent and identically distributed random variables. In this work, we analyze the performance of several well-established machine learning algorithms and one spatial algorithm in predicting the average rent price of certain real estate units in the Miami-Fort Lauderdale-West Palm Beach metropolitan area in Florida, USA. We define “performance” as the goodness of fit achieved by an algorithm in conjunction with the degree of spatial association of its residuals. We identify significant differences between the machine learning algorithms in their sensitivity to spatial autocorrelation and in the goodness of fit they achieve. We also find that the machine learning algorithms outperform generalized least squares on both goodness of fit and residual spatial autocorrelation. Finally, we show preliminary evidence that blending ensemble learning can be used to optimize a regression problem. Our findings can be useful in designing a strategy for regression of spatial variables.
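
The two-part notion of performance described above (goodness of fit together with the spatial association of the residuals) can be illustrated with a minimal sketch. The abstract does not say which statistics were used, so everything here is an assumption made for illustration: R² for goodness of fit, global Moran's I for residual clustering, a row-standardized k-nearest-neighbour weights matrix, and synthetic grid data. The function names are hypothetical.

```python
import numpy as np

def knn_weights(coords, k=5):
    """Row-standardized k-nearest-neighbour spatial weights (an assumed scheme)."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def morans_i(values, W):
    """Global Moran's I: positive values indicate spatial clustering."""
    z = values - values.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)

def r_squared(y, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (not from the paper): a smooth spatial trend plus noise.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.linspace(0, 1, 15), np.linspace(0, 1, 15))
coords = np.column_stack([gx.ravel(), gy.ravel()])
trend = 3.0 * coords[:, 0] + 2.0 * coords[:, 1]
y = trend + rng.normal(scale=0.1, size=len(trend))
W = knn_weights(coords, k=5)

# A model that ignores location leaves the trend in its residuals.
naive_pred = np.full_like(y, y.mean())
r2_naive = r_squared(y, naive_pred)
i_naive = morans_i(y - naive_pred, W)

# A model that captures the trend leaves residuals close to pure noise.
r2_trend = r_squared(y, trend)
i_trend = morans_i(y - trend, W)

print(f"naive model: R2={r2_naive:.3f}, residual Moran's I={i_naive:.3f}")
print(f"trend model: R2={r2_trend:.3f}, residual Moran's I={i_trend:.3f}")
```

Under these assumptions, a model would be judged well-performing when its R² is high and the Moran's I of its residuals is close to zero; a high R² with strongly clustered residuals would still signal unmodelled spatial structure.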
