Safety leading indicators for construction sites: A machine learning approach

Abstract

The construction industry is one of the most dangerous industries in many countries. To improve the situation, senior managers overseeing portfolios of construction projects need to understand the safety risk levels of their projects so that interventions can be implemented proactively. Safety leading indicators are one way to flag higher-risk sites. However, there is a lack of validated leading indicators that can reliably classify sites according to their safety risk levels. Moreover, despite the success of machine learning (ML) approaches in other domains, ML is not widely used in the construction industry, especially in the development of safety leading indicators. This paper presents an ML approach to developing leading indicators that classify construction sites according to their safety risk. The study was guided by the industry-recognized Cross-Industry Standard Process for Data Mining (CRISP-DM) framework, and the key types of data used were safety inspection records, accident cases and project-related data. These data were obtained from a large contractor in Singapore and cover the years 2010 to 2016. Of 33 input variables (also known as features or independent variables), 13 were selected using a combination of the Boruta feature selection technique and a decision tree. Of these 13 selected input variables, six are project-related (project type, project ownership, contract sum, percent completed, magnitude of delay and project manpower) and seven are items in the contractor's safety inspection checklists (crane/lifting operations, scaffold, mechanical-elevated working platform, falling hazards/openings, environmental management, good practices and weighted safety inspection score). Five popular ML algorithms were then used to train models that predict accident occurrence and severity.
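The abstract names the Boruta technique without describing it. Boruta compares each real feature's random-forest importance against "shadow" copies of the features whose values have been randomly permuted; a feature is worth keeping only if it repeatedly outperforms the best shadow. The Python sketch below is a simplified illustration under stated assumptions: it uses scikit-learn's `RandomForestClassifier` impurity importances, replaces Boruta's statistical test with a fixed hit-fraction threshold, and uses a hypothetical function name, `shadow_feature_selection`. It does not reproduce the decision-tree step or the study's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_selection(X, y, n_rounds=20, hit_fraction=0.8, seed=0):
    """Simplified Boruta-style feature selection (illustration, not the Boruta package).

    Returns the indices of columns of X that beat the best shadow feature
    in at least `hit_fraction` of the rounds.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for r in range(n_rounds):
        # Shadow features: each column permuted independently, which keeps its
        # distribution but destroys any relationship with y.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        X_ext = np.hstack([X, shadows])
        forest = RandomForestClassifier(n_estimators=100, random_state=seed + r)
        forest.fit(X_ext, y)
        importances = forest.feature_importances_
        # A "hit" means a real feature is more important than the best shadow.
        best_shadow = importances[n_features:].max()
        hits += importances[:n_features] > best_shadow
    # Assumption: a fixed hit-fraction threshold stands in for Boruta's
    # statistical test on the hit counts.
    return [j for j in range(n_features) if hits[j] >= hit_fraction * n_rounds]


if __name__ == "__main__":
    # Synthetic data: only columns 0 and 1 influence the label.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    print(shadow_feature_selection(X, y, n_rounds=10))
```

On real inspection and project data, the categorical variables (e.g. project type, ownership) would first need to be encoded numerically before fitting the forest.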
During validation, random forest (RF) provided the best prediction performance, with an accuracy of 0.78 and a weighted kappa statistic of 0.70, which indicates substantial agreement. Compared with similar studies, this result is promising. The RF model's prediction (i.e. the output variable) can serve as a safety leading indicator of a site's risk level. It is recommended that the predictive RF model be deployed in construction organizations, especially large public and private developers, contractors and industry associations, to provide monthly forecasts of project safety performance so that pre-emptive inspections and interventions can be targeted more effectively.
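Accuracy counts only exact matches, whereas weighted kappa also corrects for chance agreement and penalizes a prediction that misses by one ordinal level less than one that misses by two. The sketch below computes both for a hypothetical three-level outcome (0 = no accident, 1 = minor, 2 = major). These class labels, the linear weighting default and the helper names `weighted_kappa` and `landis_koch` are assumptions, since the abstract does not state the output classes or the weighting scheme used. The hand-written `weighted_kappa` can be checked against scikit-learn's `cohen_kappa_score`, and `landis_koch` maps a kappa value to the Landis and Koch (1977) agreement bands, under which 0.70 falls in the "substantial" band.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score


def weighted_kappa(y_true, y_pred, n_classes, weights="linear"):
    """Cohen's weighted kappa for ordinal labels 0..n_classes-1.

    kappa_w = 1 - sum(w * O) / sum(w * E), where O is the observed joint
    distribution of (true, predicted) labels, E is the joint distribution
    expected by chance, and w grows with the distance between classes.
    """
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    observed /= observed.sum()
    # Chance agreement: product of the two marginal label distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((n_classes, n_classes))
    if weights == "linear":
        w = np.abs(i - j)
    elif weights == "quadratic":
        w = (i - j) ** 2
    else:
        raise ValueError("weights must be 'linear' or 'quadratic'")
    return 1.0 - (w * observed).sum() / (w * expected).sum()


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical site labels: 0 = no accident, 1 = minor, 2 = major.
    y_true = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
    y_pred = [0, 1, 1, 1, 2, 1, 0, 0, 2, 0]
    kw = weighted_kappa(y_true, y_pred, n_classes=3)
    print("accuracy:", accuracy_score(y_true, y_pred))
    print("weighted kappa:", kw, landis_koch(kw))
    print("sklearn check:", cohen_kappa_score(y_true, y_pred, weights="linear"))
```

Because the weight matrix is symmetric, the result does not depend on which argument is treated as the reference labels.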
