Early prediction of the performance of green building projects using pre-project planning variables: data mining approaches

Abstract: Early prediction of the success of green building projects is an important and challenging issue. The aim of this study was to develop a model that predicts the cost and schedule performance of green building projects from the level of project definition achieved during the pre-project planning phase. To this end, a three-step process was proposed: pre-processing, variable selection, and prediction model construction. Data from 53 certified green buildings were used to develop the models. In the pre-processing step, the data set was balanced with respect to the proportion of cases in each outcome category. The ReliefF-W variable selection method then reduced the number of input variables from 64 to 13 for cost performance prediction and to 7 for schedule performance prediction. Cost and schedule performance prediction models were constructed using the selected variables and four different classifiers: a support vector machine (SVM), a back-propagation neural network (BPNN), a C4.5 decision tree algorithm (C4.5), and a logistic regression (LR). The classification performance of the four models was compared to assess their applicability. The SVM models exhibited the highest accuracy, sensitivity, and specificity in predicting both the cost and schedule performance of green building projects. The results of this study provide empirical evidence that the cost and schedule performance of green building projects is highly dependent on the quality of definition in the pre-project planning phase.
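The three measures used to compare the classifiers can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions of accuracy, sensitivity, and specificity for a two-class outcome; the abstract does not state the exact definitions or label coding used in the study. The function name `classification_metrics`, the coding of 1 as the positive class, and the example labels are all hypothetical and are not taken from the study's data.

```python
import math


def classification_metrics(actual, predicted, positive=1):
    """Return accuracy, sensitivity, and specificity for two-class labels.

    Assumption: `positive` marks the outcome class of interest; every
    other label is treated as the negative class.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p != positive:
            tn += 1
        elif a != positive and p == positive:
            fp += 1
        else:
            fn += 1

    total = tp + tn + fp + fn
    nan = math.nan
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
        # Share of positive cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of negative cases that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical labels for eight projects (illustrative only).
example_actual = [1, 1, 1, 0, 0, 0, 0, 1]
example_predicted = [1, 1, 0, 0, 0, 1, 0, 1]
example_metrics = classification_metrics(example_actual, example_predicted)
print(example_metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when outcome categories are unevenly represented, because accuracy alone can look high for a model that mostly predicts the majority class.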
