Predicting construction cost using multiple regression techniques

This paper describes the development of linear regression models to predict the construction cost of buildings, based on 286 sets of data collected in the United Kingdom. Raw cost is rejected as a suitable dependent variable, and models are developed for cost/m², log of cost, and log of cost/m². Both forward and backward stepwise analyses were performed on each of these three dependent variables, giving a total of six models. Forty-one potential independent variables were identified. Five variables appeared in each of the six models: gross internal floor area (GIFA), function, duration, mechanical installations, and piling, suggesting that they are the key linear cost drivers in the data. The best regression model is the log of cost backward model, which gives an R² of 0.661 and a mean absolute percentage error (MAPE) of 19.3%; these results compare favorably with past research, which has shown that traditional methods of cost estimation typically have MAPE values on the order of 25%.
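
As a rough illustration of how a log-of-cost regression and its MAPE might be computed, the following is a minimal sketch only. It assumes an ordinary least-squares fit with NumPy and a simple exponential back-transform of the predicted log cost; the abstract does not state how the study computed its predictions or its MAPE. The function names (`mape`, `fit_log_cost`, `predict_cost`) and the toy figures are hypothetical and are not taken from the study's data. Stepwise variable selection is not reproduced here.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


def _design_matrix(X):
    """Prepend an intercept column to the predictor matrix."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_log_cost(X, cost):
    """Least-squares fit of log(cost) on the predictors in X (plus intercept)."""
    log_cost = np.log(np.asarray(cost, dtype=float))
    coef, *_ = np.linalg.lstsq(_design_matrix(X), log_cost, rcond=None)
    return coef


def predict_cost(coef, X):
    """Back-transform predicted log(cost) to the original cost scale."""
    return np.exp(_design_matrix(X) @ coef)


if __name__ == "__main__":
    # Hypothetical toy values (NOT from the paper): GIFA in m2, duration in weeks.
    X = np.array([[500, 30], [1200, 45], [3000, 70], [800, 36], [2000, 60]])
    cost = np.array([0.6e6, 1.5e6, 4.1e6, 0.9e6, 2.4e6])

    coef = fit_log_cost(X, cost)
    print("MAPE (%):", mape(cost, predict_cost(coef, X)))
```

Because the error is measured on the back-transformed cost rather than on the log scale, a MAPE computed this way is directly comparable with percentage errors quoted for traditional estimating methods.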