AUTOMATED CLASSIFICATION OF CONSTRUCTION PROJECT DOCUMENTS

Construction projects generate a large number of documents, which are stored in interorganizational information systems. Because a large percentage of these project documents are generated in text format, methods for organizing and improving access to the information they contain are essential to construction information management. Information classification schemes can serve this purpose: they provide a common framework for document organization and information exchange among project members. Current document management systems rely on manual classification by human experts. With the widespread use of information technologies in construction, the increasing availability of electronic documents, and the development of systems based on project object models, manual classification becomes infeasible. This paper presents a method for improving information organization and access in interorganizational systems through automated classification of construction project documents according to their related project components. Machine-learning methods were used for this purpose. A prototype document classification system was developed to make the classification process easy to deploy and scalable.
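
To make the idea of classifying text documents by project component concrete, the following is a minimal sketch only. The abstract does not state which learning algorithm or features were used, so TF-IDF term weighting with a nearest-centroid classifier is assumed here purely for brevity. The component labels, sample texts, and all function names are hypothetical and are not taken from the paper or its prototype.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vector(tokens, idf):
    """L2-normalized TF-IDF vector restricted to terms seen in training."""
    counts = Counter(tokens)
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train(labeled_docs):
    """Build IDF weights and one summed TF-IDF centroid per label."""
    tokenized = [(tokenize(text), label) for text, label in labeled_docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens, _ in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    centroids = defaultdict(Counter)
    for tokens, label in tokenized:
        for term, weight in tfidf_vector(tokens, idf).items():
            centroids[label][term] += weight
    return idf, dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, idf, centroids):
    """Return the label of the most similar centroid, or None if the
    document shares no vocabulary with the training data."""
    vec = tfidf_vector(tokenize(text), idf)
    if not vec:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical training documents, each labeled with a project component.
EXAMPLES = [
    ("concrete footing rebar pour schedule", "foundation"),
    ("footing excavation and concrete formwork inspection", "foundation"),
    ("roof truss delivery and shingle installation", "roof"),
    ("roof membrane leak repair and flashing", "roof"),
]

idf, centroids = train(EXAMPLES)
print(classify("rebar inspection for concrete footing", idf, centroids))
print(classify("shingle and flashing repair on roof", idf, centroids))
```

In this sketch a new document is assigned to the component whose training documents it most resembles. A realistic system would need far more training data, proper text preprocessing, and an evaluated choice of learner.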
