Data Preparation Process for Construction Knowledge Generation through Knowledge Discovery in Databases

As the construction industry adapts to new computer technologies in terms of hardware and software, computerized construction data are becoming increasingly available. The explosive growth of many business, government, and scientific databases has begun to far outpace our ability to interpret and digest the data. Such volumes of data clearly overwhelm traditional methods of data analysis such as spreadsheets and ad hoc queries. The traditional methods can create informative reports from data, but cannot analyze the contents of those reports. A significant need exists for a new generation of techniques and tools with the ability to automatically assist humans in analyzing the mountains of data for useful knowledge. Knowledge discovery in databases (KDD) and data mining (DM) are tools that allow identification of valid, useful, and previously unknown patterns so that the construction manager may analyze the large amount of construction project data. These technologies combine techniques from machine learning, artificial intelligence, pattern recognition, statistics, databases, and visualization to automatically extract concepts, interrelationships, and patterns of interest from large databases. This paper presents the steps required for the implementation of KDD: (1) identification of problems, (2) data preparation, (3) data mining, (4) data analysis, and (5) refinement process. To test the feasibility of the proposed approach, a prototype of the KDD system was developed and tested with a construction management database, RMS (Resident Management System), provided by the U.S. Corps of Engineers. In this paper, the KDD process was applied to identify the cause(s) of construction activity delays. However, its possible applications can be extended to identify the cause(s) of cost overrun and quality control/assurance problems, among other construction problems. Predictable patterns may be revealed in construction data that were previously thought to be chaotic.
DOI: 10.1061/(ASCE)0887-3801(2002)16:1(39)
CE Database keywords: Data processing; Databases; Neural networks; Construction industry; Data analysis.
