A Machine Learning Approach to Delineating Neighborhoods from Geocoded Appraisal Data

Identifying neighborhoods is an important, financially driven problem in real estate. The industry commonly uses ZIP (postal) codes and Census tracts to demarcate land and to group properties by price. These boundaries are static: they do not adapt to shifts in the real estate market and fail to capture its dynamics, for example when an up-and-coming residential project changes an area. Delineated neighborhoods are also used in socioeconomic and demographic analyses, where statistics are computed at the neighborhood level. Current delineation practices have mostly ignored the information that can be extracted from property appraisals. This paper demonstrates the potential of using only the distance between subject properties and their comparable properties, as identified in an appraisal, to delineate neighborhoods composed of properties with similar prices and features. Using spatial filters, we first identify the regions with the most appraisal activity; we then apply a spatial clustering algorithm to generate neighborhoods of properties that share similar characteristics. Using bootstrapped linear regression, we find that neighborhoods delineated from the geolocation of subjects and comparable properties explain more of the variation in a property’s features, such as valuation, square footage, and price per square foot, than ZIP codes or Census tracts do. We also discuss how these neighborhoods can grow and shrink over the years as each housing submarket shifts.
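
The comparison step can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, and the names `r_squared_by_group`, `bootstrap_r_squared`, `fine`, and `coarse` are hypothetical. It assumes the regression uses only area-membership indicators as predictors, in which case each fitted value is the area mean and R² reduces to one minus the within-area share of the variance. Bootstrapping resamples properties with replacement and recomputes R² for each candidate delineation.

```python
import numpy as np

def r_squared_by_group(values, labels):
    """R^2 from regressing `values` on indicator variables for `labels`.

    With only group dummies as regressors, the fitted value of each
    observation is its group mean, so R^2 = 1 - SS_within / SS_total.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    ss_total = np.sum((values - values.mean()) ** 2)
    ss_within = 0.0
    for g in np.unique(labels):
        group = values[labels == g]
        ss_within += np.sum((group - group.mean()) ** 2)
    return 1.0 - ss_within / ss_total

def bootstrap_r_squared(values, labels, n_boot=200, seed=0):
    """Resample observations with replacement and return R^2 per replicate."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    n = len(values)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of row indices
        scores[b] = r_squared_by_group(values[idx], labels[idx])
    return scores

# Synthetic example (assumption): 8 "fine" areas with distinct price levels,
# and a "coarse" delineation that merges them into 2 larger areas.
rng = np.random.default_rng(42)
n = 400
fine = rng.integers(0, 8, size=n)
coarse = fine // 4
price = 100.0 + 20.0 * fine + rng.normal(0.0, 5.0, size=n)

fine_scores = bootstrap_r_squared(price, fine)
coarse_scores = bootstrap_r_squared(price, coarse)
print("mean R^2, fine areas:  ", fine_scores.mean())
print("mean R^2, coarse areas:", coarse_scores.mean())
```

A delineation with more areas can never lower the in-sample R², so a fair comparison between schemes with different numbers of areas would likely need an adjusted R² or a held-out evaluation; the sketch omits that correction for brevity.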
