Enabling the creation of domain-specific reference collections to support text-based information retrieval experiments in the architecture, engineering and construction industries

The growing importance of text-based information retrieval (IR) in the architecture, engineering, and construction (AEC) industries, together with the lack of sharable testing resources to support its development, calls for an approach to generating domain-specific reference collections. To address this need, the authors investigated the characteristics of the testing environment in AEC and ways to adapt dominant collection preparation methods to the domain. This paper presents the authors' collection generation approach through the preparation of the Taiwanese National Center for Research on Earthquake Engineering (NCREE) collection. The collection's Chinese-to-English translation instruments are also discussed, because matching semantic and linguistic resources are highly valued in AEC's text-based IR developments. The paper also includes a use case for the NCREE collection to show how a collection generated by the proposed approach can support research experiments and validation. The direct outputs, the NCREE collection and its translation instruments, are sharable and reusable testing resources, while mechanisms for seeking collections from other researchers are part of the extended research endeavors.
