Identifying low-quality patterns in accident reports from textual data

Accident investigation reports provide useful knowledge that helps companies propose preventive and mitigative measures. However, accident report databases are normally large and complex, contain errors, and have missing and/or redundant data. In this article, we apply text mining and natural language processing techniques to investigate low-quality accident reports. We adopted machine learning (ML) to detect and investigate inconsistencies in accident reports. The methodology was applied to 626 documents collected from an actual hydroelectric power company. The initial ML performance indicated data divergences and concerns related to the report structure. The accident database was then restructured into a more suitable form, which confirmed the supposition about the quality of the reports investigated. The proposed approach can be used as a diagnostic tool to improve the design of accident investigation reports, making them a more useful source of knowledge to support safety decisions.
