Text Mining Applications in the Construction Industry: Current Status, Research Gaps, and Prospects

With the advent of the Industry 4.0 era, information technology has been widely developed and applied in construction engineering. Text mining (TM) techniques can extract interesting and important data hidden in plain text, which may help address problems in the construction field. Although TM techniques have been used in construction for many years, recent reviews that examine their development and application from a literature analysis perspective are lacking; this review aims to fill that gap. We combined bibliometric and manual literature analyses to systematically review the TM-based literature related to the construction field from 1997 to 2022. Specifically, publication analysis, collaboration analysis, co-citation analysis, and keyword analysis were conducted on 185 articles collected from the Scopus database. Based on a read-through of the 185 papers, the current research topics in TM were manually identified and sorted, including tasks and methods, application areas, and core methods and algorithms. The results provide a comprehensive picture of the current state of TM techniques and thereby contribute to their further development in the construction industry.
