Machine learning predictive model based on national data for fatal accidents of construction workers

Abstract: The purpose of this study is to develop a prediction model that identifies the potential risk of fatal accidents at construction sites by applying machine learning to industrial accident data collected by the Ministry of Employment and Labor (MOEL) of the Republic of Korea from 2011 to 2016. The data cover 137,323 injuries and 2,846 deaths and include the age, sex, and length of service of each accident victim, as well as the type of construction, employer scale, and date of the accident. After the distribution of the dataset was described, machine learning methods such as logistic regression, decision tree, random forest, and AdaBoost were applied, and the major variables influencing classification were derived for each algorithm. A comparison of model performance showed that the area under the receiver operating characteristic curve (AUROC) was highest for the random forest method, at 0.9198; that is, when classifying workers who could face a high fatality risk, the model ranks a randomly chosen fatal case above a randomly chosen non-fatal case about 92% of the time. Based on mean decrease Gini values, the random forest analysis indicates that month (season) and employment size are the most influential factors for predicting the likelihood of a fatal accident, followed by age, weekday, and service length. Moreover, this analysis generated ensemble predictions based on all the factors contained in the dataset. Hence, this study demonstrates the feasibility of machine learning in construction safety management. The results can contribute to accident prevention by raising awareness of potential safety risks, by quantitatively predicting fatal accidents, and by integrating the findings with a manpower control system at a construction site.
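The AUROC figure reported in the abstract can be read as a ranking probability: the chance that a randomly chosen fatal case receives a higher predicted score than a randomly chosen non-fatal case. The following is a minimal sketch of that definition using the pairwise (Mann-Whitney) formulation. The function name `auroc` and the toy labels and scores are illustrative assumptions; they are not the study's code or data.

```python
from itertools import product


def auroc(labels, scores):
    """Pairwise AUROC: P(positive score > negative score), ties count as 0.5.

    labels: iterable of 0/1 (1 = fatal case in this illustration)
    scores: iterable of predicted risk scores, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    # Compare every positive score with every negative score.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not from the MOEL dataset).
toy_labels = [1, 1, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
print(f"{toy_auc:.4f}")  # 8 of 9 positive/negative pairs are ordered correctly: 0.8889
```

The pairwise loop is quadratic in the number of cases, so it is suitable only for small illustrations; on a dataset of the size described in the abstract, a rank-based or library implementation would be the practical choice.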
