A critical review of text-based research in construction: Data source, analysis method, and implications

Abstract

Advances in natural language processing (NLP) and text mining techniques facilitate the automatic extraction of non-trivial patterns and the discovery of knowledge from text data. However, text-based research has received less attention than image- and sensor-based research in the construction industry. Hence, this paper performs a comprehensive review of text analytics in construction, focusing on data sources and analysis methods, to understand its current state and future prospects. This study identifies various kinds of text data sources, ranging from project documents to open data on websites. In addition, the review finds that ontology- and rule-based approaches have been dominant, while recent research has attempted to apply state-of-the-art machine learning methods. It is envisioned that the latest text analysis methods, together with the richer data enabled by digital transformation, offer potential advancements in construction engineering and management.