Inferring the contiguity matrix for spatial autoregressive analysis with applications to house price prediction

Inference methods in traditional statistics, machine learning and data mining assume that data is generated from an independent and identically distributed (iid) process. Spatial data exhibits behavior for which the iid assumption must be relaxed. For example, the standard approach in spatial regression is to assume the existence of a contiguity matrix that captures the spatial autoregressive properties of the data. However, all spatial methods to date have assumed that the contiguity matrix is given a priori or can be estimated using a spatial similarity function. In this paper we propose a convex optimization formulation to solve the spatial autoregressive (SAR) model in which both the contiguity matrix and the non-spatial regression parameters are unknown and inferred from the data. We solve the problem using the alternating direction method of multipliers (ADMM), which provides a solution that is both robust and efficient. While our approach is general, we use data from the housing markets of Boston and Sydney to both guide the analysis and validate our results. A novel side effect of our approach is the automatic discovery of spatial clusters, which translate to submarkets in the housing data sets.
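
For orientation, the SAR model referred to above is commonly written in the following textbook form. The symbols are assumptions introduced here for illustration and do not appear in the abstract: y is the n-vector of responses (for example, house prices), X the n x p matrix of non-spatial features, beta the regression coefficients, W the n x n contiguity matrix, rho a scalar spatial autoregressive coefficient, and epsilon iid noise. The formulation actually optimized in the paper may differ, for instance by absorbing rho into W or by adding constraints and regularizers on W.

```latex
% Textbook SAR model (assumed notation, not taken from the abstract)
\[
  y = \rho W y + X\beta + \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, \sigma^{2} I_{n}).
\]
% Reduced form, valid whenever I_n - \rho W is invertible
\[
  y = (I_{n} - \rho W)^{-1}\,(X\beta + \varepsilon).
\]
```

In the classical setting W is fixed in advance and only rho and beta are estimated; the setting described above instead treats W itself as an unknown to be inferred jointly with the regression parameters.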