Location-Centered House Price Prediction: A Multi-Task Learning Approach

Accurate house price prediction is of great significance to real estate stakeholders such as house owners, buyers, investors, and agents. We propose a location-centered prediction framework that differs from existing work in both data profiling and the prediction model. Regarding data profiling, we define and capture a fine-grained location profile drawn from a diverse range of location data sources, including a transportation profile (e.g., distance to the nearest train station), an education profile (e.g., school zones and rankings), a suburb profile based on census data, and a facility profile (e.g., nearby hospitals and supermarkets). Regarding the choice of prediction model, we observe that existing approaches either fit a single model to the entire house data, or split the data and model each partition independently. However, such modeling ignores the relatedness between partitions, and the latter approach may not have sufficient training samples in every partition. We address this problem through a careful study of how to exploit Multi-Task Learning (MTL). Specifically, we map the strategies for splitting the house data to the ways tasks are defined in MTL, so that each resulting partition is aligned with a task. Furthermore, we select specific MTL-based methods with different regularization terms to capture and exploit the relatedness between tasks. Based on real-world house transaction data collected in Melbourne, Australia, we design extensive experimental evaluations, and the results indicate a significant superiority of MTL-based methods over state-of-the-art approaches. We also conduct an in-depth analysis of how task definitions and method selections in MTL affect prediction performance, and demonstrate that the impact of task definitions far exceeds that of method selections.
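
To make the partition-as-task idea concrete, the following is a minimal sketch of one well-known regularized MTL formulation, in which each task's linear weights are pulled toward the mean of all task weights. It is an illustration under stated assumptions, not the specific methods evaluated in this work: the abstract does not name the regularization terms used, and the function name, the synthetic data, and the parameter `lam` are all hypothetical. Each partition (for example, one suburb or one property type) supplies its own feature matrix and price vector.

```python
import numpy as np


def fit_mean_regularized_mtl(tasks, lam=1.0, n_iters=50):
    """Fit one linear model per task, coupled through the mean weight vector.

    Assumed objective (illustrative, not taken from the paper):
        sum_t [ ||X_t w_t - y_t||^2 / (2 n_t) + (lam / 2) ||w_t - w_bar||^2 ]
    minimized by alternating between the per-task weights w_t and w_bar.

    tasks: list of (X_t, y_t) pairs, one per partition of the house data.
    Returns (W, w_bar), where W[t] holds the weights of task t.
    """
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))
    w_bar = np.zeros(d)
    for _ in range(n_iters):
        for t, (X, y) in enumerate(tasks):
            n = X.shape[0]
            # With w_bar fixed, each task is a ridge problem centered at w_bar.
            A = X.T @ X / n + lam * np.eye(d)
            b = X.T @ y / n + lam * w_bar
            W[t] = np.linalg.solve(A, b)
        # With the task weights fixed, the optimal w_bar is their mean.
        w_bar = W.mean(axis=0)
    return W, w_bar


# Synthetic stand-in for partitioned house data (hypothetical values).
rng = np.random.default_rng(0)
shared = np.array([1.0, -2.0, 0.5])          # structure common to all tasks
tasks = []
for _ in range(3):                           # three partitions = three tasks
    X = rng.normal(size=(40, 3))             # standardized location features
    w_true = shared + 0.5 * rng.normal(size=3)
    y = X @ w_true + 0.1 * rng.normal(size=40)
    tasks.append((X, y))

W, w_bar = fit_mean_regularized_mtl(tasks, lam=1.0)
print("per-task weights:\n", W)
print("mean weights:", w_bar)
```

A small `lam` makes the tasks nearly independent, which corresponds to modeling each partition on its own, while a large `lam` forces the task weights toward a common vector, which approaches a single model for the entire data. Intermediate values let partitions with few samples borrow strength from related ones.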