Extraction of Construction Regulatory Requirements from Textual Documents Using Natural Language Processing Techniques

Automated regulatory compliance checking requires automated information extraction (IE) from regulatory textual documents (e.g., building codes). Automated IE is a challenging task because it requires complex processing of text. Natural Language Processing (NLP) aims to enable computers to process natural language text in a human-like manner using a variety of text processing techniques, such as phrase-structure parsing and dependency parsing. This paper proposes a hybrid syntactic (syntax/grammar-related) and semantic (meaning/context-related) NLP approach for automated IE from construction regulatory documents, and explores the use of two techniques, phrase-structure grammar and dependency grammar, for extracting information from complex sentences. IE rules were developed based on Chapter 12 of the 2006 International Building Code, and the approach was tested on Chapter 12 of the 2009 International Fire Code. Initial experimental results are presented, evaluated empirically in terms of precision and recall, and discussed.
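To make the dependency-grammar side of this idea concrete, the following is a minimal sketch, not the authors' IE rules. It assumes a simplified, made-up sentence in the style of a building-code requirement ("Habitable spaces shall have a ceiling height of not less than 7 feet"), a hand-written parse that only loosely follows Stanford typed dependencies (a real parser may assign different relations), and a hypothetical rule that pulls out a (subject, attribute, comparator, value, unit) tuple. All function names (`child`, `phrase`, `comparator`, `extract`, `precision_recall`) are illustrative. The semantic features and phrase-structure rules of the proposed approach are not shown. Precision and recall are computed by exact tuple matching against a toy gold standard.

```python
from typing import List, Optional, Set, Tuple

# A typed dependency: (relation, head_index, dependent_index); index 0 is ROOT.
Dependency = Tuple[str, int, int]

# Hypothetical sentence in the style of a code requirement (not quoted from the IBC).
TOKENS = ["ROOT", "Habitable", "spaces", "shall", "have", "a", "ceiling",
          "height", "of", "not", "less", "than", "7", "feet"]

# Hand-written, illustrative parse loosely following Stanford typed dependencies;
# an actual parser may produce different relations for this sentence.
DEPS: List[Dependency] = [
    ("root", 0, 4), ("amod", 2, 1), ("nsubj", 4, 2), ("aux", 4, 3),
    ("det", 7, 5), ("nn", 7, 6), ("dobj", 4, 7), ("prep_of", 7, 13),
    ("num", 13, 12), ("quantmod", 12, 10), ("neg", 10, 9),
]


def child(deps: List[Dependency], head: int, rel: str) -> Optional[int]:
    """Return the first dependent of `head` attached by relation `rel`, if any."""
    for r, h, d in deps:
        if h == head and r == rel:
            return d
    return None


def phrase(deps: List[Dependency], tokens: List[str], head: int) -> str:
    """Join a head word with its adjective/noun-compound modifiers, in sentence order."""
    idx = [d for r, h, d in deps if h == head and r in ("amod", "nn")] + [head]
    return " ".join(tokens[i] for i in sorted(idx))


def comparator(deps: List[Dependency], tokens: List[str], num_idx: int) -> str:
    """Map a quantity modifier such as 'not less than' to a relational operator."""
    q = child(deps, num_idx, "quantmod")
    if q is None:
        return "="
    word = tokens[q].lower()
    negated = child(deps, q, "neg") is not None
    if word == "less":
        return ">=" if negated else "<"
    if word in ("more", "greater"):
        return "<=" if negated else ">"
    return "="


def extract(deps: List[Dependency], tokens: List[str]) -> Optional[Tuple[str, ...]]:
    """Hypothetical rule: subject = nsubj, attribute = dobj, quantity via prep_of + num."""
    root = child(deps, 0, "root")
    if root is None:
        return None
    subj = child(deps, root, "nsubj")
    attr = child(deps, root, "dobj")
    if subj is None or attr is None:
        return None
    unit = child(deps, attr, "prep_of")
    num = child(deps, unit, "num") if unit is not None else None
    if unit is None or num is None:
        return None
    return (phrase(deps, tokens, subj), phrase(deps, tokens, attr),
            comparator(deps, tokens, num), tokens[num], tokens[unit])


def precision_recall(extracted: Set[tuple], gold: Set[tuple]) -> Tuple[float, float]:
    """Precision = TP / |extracted|, recall = TP / |gold|, using exact tuple matches."""
    tp = len(extracted & gold)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    tup = extract(DEPS, TOKENS)
    print(tup)  # ('Habitable spaces', 'ceiling height', '>=', '7', 'feet')
    # Toy gold standard containing one requirement the rule did not extract (made-up data).
    gold = {tup, ("Corridors", "ceiling height", ">=", "7", "feet")}
    print(precision_recall({tup}, gold))  # (1.0, 0.5)
```

Handling negation explicitly matters here: "not less than 7 feet" and "less than 7 feet" differ only by one `neg` dependency, yet they express opposite constraints (">=" versus "<"), which is one reason dependency relations are attractive for extracting requirements from complex sentences.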