Knowledge Discovery in a Facility Condition Assessment Database Using Text Clustering

Knowledge discovery in databases (KDD) has been applied in many areas of study, including DNA sequence analysis, pattern discovery, document classification, image recognition, and speech recognition. This paper presents an application of KDD to the analysis of a facility condition assessment (FCA) database. The FCA database contains information on facilities located at three campuses within a statewide university system. The case study uses cluster analysis for text mining. Cluster analysis groups objects so that those in the same cluster are similar to one another and dissimilar to objects in other clusters. In this analysis, the objects being grouped are deficiency descriptions from a university’s FCA database. Deficiency descriptions were gathered from 15 housing facilities and 15 academic facilities located at three campuses. The results show how some clusters of facility deficiencies are unique with respect to the type of facility and the influence of location on deficiencies of aca...
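To make the idea of clustering deficiency descriptions concrete, the following is a minimal sketch only. It assumes a TF-IDF bag-of-words representation and a simple k-means variant based on cosine similarity; the abstract does not state which representation or algorithm the authors used, and the sample descriptions, function names, and parameters below are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical deficiency descriptions (invented; not taken from the FCA database).
DESCRIPTIONS = [
    "roof membrane leaking near drain",
    "roof flashing leaking at parapet",
    "replace leaking roof drain",
    "corridor carpet worn and torn",
    "carpet worn in lounge",
    "replace torn carpet in corridor",
]


def tfidf_vectors(docs):
    """Turn each description into an L2-normalized TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of descriptions containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(doc_freq)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so every weight stays positive.
        vec = [
            counts[t] * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t in vocab
        ]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cluster(docs, k, max_iter=20):
    """Assign each description to one of k clusters (assumed k-means variant)."""
    vectors = tfidf_vectors(docs)
    # Deterministic seeding: start with the first vector, then repeatedly add
    # the vector least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(farthest)
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the most similar centroid for each vector.
        new_labels = [
            max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the normalized mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            mean = [sum(col) / len(members) for col in zip(*members)]
            norm = math.sqrt(sum(x * x for x in mean)) or 1.0
            centroids[j] = [x / norm for x in mean]
    return labels


labels = cluster(DESCRIPTIONS, k=2)
for label, text in zip(labels, DESCRIPTIONS):
    print(label, text)
```

On this toy input the roof-related and carpet-related descriptions fall into separate clusters. Real deficiency text would likely also need preprocessing such as stop-word removal and stemming, which this sketch omits.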
