A Fully Automated Adjustment of Ensemble Methods in Machine Learning for Modeling Complex Real Estate Systems

The close relationship between collateral value and bank stability has created considerable demand for rapid and economical real estate appraisal. The greater availability of information on the housing stock has prompted the use of so-called big data and machine learning to estimate property prices. Although this methodology has already been applied to the real estate market to identify which variables influence dwelling prices, it is less frequently used to estimate the prices themselves. Its application has become more sophisticated over time, moving from simple methods to so-called ensemble methods, and while estimation capacity has improved, it has only been applied to specific geographical areas. The main contribution of this article is an application for the entire Spanish market that fully automatically provides the best model for each municipality. Real estate prices in 433 municipalities are estimated from a sample of 790,631 dwellings, using different ensemble methods based on decision trees, such as bagging, boosting, and random forest. The results show that the techniques developed perform well in terms of the error measures, with the best results achieved by bagging and random forest.
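The idea of automatically choosing the best model for each municipality can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the single predictor (floor area), the municipality names, the 75/25 holdout split, and the use of mean absolute error are all assumptions made for illustration, and the candidates are limited to a one-split regression tree and a bagged ensemble of such trees written with the Python standard library only. Boosting and random forest would enter the same selection loop as further candidates.

```python
import random
from statistics import mean


def fit_stump(rows):
    """Fit a one-split regression tree on (area, price) pairs."""
    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        if rows[i][0] == rows[i - 1][0]:
            continue  # no threshold fits between identical areas
        thr = (rows[i - 1][0] + rows[i][0]) / 2
        left = mean(p for _, p in rows[:i])
        right = mean(p for _, p in rows[i:])
        sse = sum((p - left) ** 2 for _, p in rows[:i])
        sse += sum((p - right) ** 2 for _, p in rows[i:])
        if best is None or sse < best[0]:
            best = (sse, thr, left, right)
    if best is None:  # every area identical: predict the mean price
        m = mean(p for _, p in rows)
        return lambda x: m
    _, thr, left, right = best
    return lambda x: left if x <= thr else right


def fit_bagging(rows, n_estimators=25, seed=0):
    """Bagging: average stumps fitted on bootstrap resamples of the data."""
    rng = random.Random(seed)
    stumps = [fit_stump(rng.choices(rows, k=len(rows)))
              for _ in range(n_estimators)]
    return lambda x: mean(s(x) for s in stumps)


def mae(model, rows):
    """Mean absolute error of a fitted model on (area, price) pairs."""
    return mean(abs(model(x) - p) for x, p in rows)


def best_model_per_municipality(data, candidates, test_share=0.25, seed=0):
    """For each municipality, keep the candidate with the lowest holdout MAE."""
    rng = random.Random(seed)
    chosen = {}
    for town, rows in data.items():
        rows = rows[:]
        rng.shuffle(rows)
        cut = int(len(rows) * (1 - test_share))
        train, test = rows[:cut], rows[cut:]
        errors = {name: mae(fit(train), test)
                  for name, fit in candidates.items()}
        best = min(errors, key=errors.get)
        chosen[town] = (best, errors[best])
    return chosen


def make_toy_data(seed=1, n=40):
    """Synthetic (area in m2, price) pairs for two hypothetical municipalities."""
    rng = random.Random(seed)
    data = {}
    for town, price_per_m2 in {"town_a": 1500.0, "town_b": 2400.0}.items():
        rows = []
        for _ in range(n):
            area = rng.uniform(40.0, 150.0)
            rows.append((area, price_per_m2 * area + rng.gauss(0.0, 15000.0)))
        data[town] = rows
    return data


# Hypothetical candidate set; further ensemble methods would be added here.
CANDIDATES = {"single_tree": fit_stump, "bagging": fit_bagging}

for town, (name, err) in best_model_per_municipality(
        make_toy_data(), CANDIDATES).items():
    print(f"{town}: best={name}, holdout MAE={err:.0f}")
```

The selection step is independent of how each candidate is fitted, which is what allows it to run unattended across many municipalities: each one is split, every candidate is trained and scored on the held-out dwellings, and the lowest-error model is retained.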
