A progressive refinement approach to spatial data mining

The goal of this thesis is to analyze methods for mining spatial data and to determine environments in which efficient spatial data mining methods can be implemented. In the spatial data mining process, we use (1) non-spatial properties of the spatial objects and (2) attributes, predicates, and functions describing spatial relations between the described objects and other features located in their spatial proximity. The descriptions are generalized and transformed into predicates, and the discovered knowledge is presented at multiple concept levels. We introduce the concept of spatial association rules and present efficient algorithms for mining spatial associations and for classifying objects stored in geographic information databases. A spatial association rule describes how one feature (or predicate), or a set of them, is implied by another set of features in a spatial database. Spatial classification assigns a set of spatial objects to a number of given classes based on a set of spatial and non-spatial features (predicates). The developed algorithms are based on the progressive refinement approach, which allows efficient discovery of knowledge in large spatial databases. Using this approach, a complete set of spatial association rules can be discovered and accurate decision trees can be constructed. Theoretical analysis and experimental results demonstrate the efficiency of the algorithms. The theoretical analysis shows that the set of discovered spatial association rules is complete, and the experiments show that the proposed spatial classification algorithm achieves better classification accuracy than the algorithm proposed in previous work [37]. The results of the research have been incorporated into GeoMiner, a prototype spatial data mining system.
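The progressive refinement idea can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes that every object is a finite list of points, that a cheap coarse test on minimum bounding rectangles (MBRs) selects candidate pairs, and that an exact distance test then refines only the survivors. Because the distance between two MBRs never exceeds the exact distance between the objects inside them, the coarse step yields no false negatives. Because coarse support is an upper bound on refined support, predicates below the minimum support can be pruned before the expensive step without losing any frequent predicate. The function `mine_close_to`, the `close_to` predicate, and the sample data are hypothetical.

```python
from itertools import combinations
from math import hypot


def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def mbr_distance(a, b):
    """Gap between two rectangles; never larger than the exact distance."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return hypot(dx, dy)


def exact_distance(p, q):
    """Minimum point-to-point distance (the expensive, exact step)."""
    return min(hypot(x1 - x2, y1 - y2) for x1, y1 in p for x2, y2 in q)


def mine_close_to(targets, features, dist, min_sup, min_conf):
    """Mine close_to(X, type) patterns for target objects X (hypothetical API).

    targets:  {target_id: [(x, y), ...]}
    features: [(feature_type, [(x, y), ...]), ...]
    Returns ({predicate tuple: support}, [(lhs, rhs, support, confidence)]).
    """
    n = len(targets)
    boxes = {t: mbr(p) for t, p in targets.items()}
    feats = [(ftype, pts, mbr(pts)) for ftype, pts in features]

    def support(holds, items):
        # Fraction of targets for which every predicate in `items` holds.
        return sum(1 for s in holds.values() if items <= s) / n

    # Step 1 (coarse): keep feature pairs whose MBRs are within `dist`.
    candidates = {t: [] for t in targets}
    coarse = {t: set() for t in targets}
    for t in targets:
        for ftype, pts, box in feats:
            if mbr_distance(boxes[t], box) <= dist:
                candidates[t].append((ftype, pts))
                coarse[t].add(ftype)

    # Coarse support bounds refined support from above, so this pruning
    # cannot discard a predicate that would be frequent after refinement.
    kept = {f for s in coarse.values() for f in s if support(coarse, {f}) >= min_sup}

    # Step 2 (fine): exact distance only for surviving candidate pairs.
    fine = {t: set() for t in targets}
    for t, pairs in candidates.items():
        for ftype, pts in pairs:
            if ftype in kept and ftype not in fine[t]:
                if exact_distance(targets[t], pts) <= dist:
                    fine[t].add(ftype)

    # Frequent 1- and 2-predicate sets over the refined predicates.
    frequent = {}
    for size in (1, 2):
        for combo in combinations(sorted(kept), size):
            s = support(fine, set(combo))
            if s >= min_sup:
                frequent[combo] = s

    # Rules close_to(X, lhs) -> close_to(X, rhs); every subset of a frequent
    # pair is frequent too, so frequent[(lhs,)] always exists here.
    rules = []
    for pair, s in frequent.items():
        if len(pair) == 2:
            for lhs, rhs in (pair, pair[::-1]):
                conf = s / frequent[(lhs,)]
                if conf >= min_conf:
                    rules.append((lhs, rhs, s, conf))
    return frequent, rules


# Hypothetical sample: three towns plus a few water bodies and roads.
towns = {"A": [(0, 0), (1, 1)], "B": [(10, 10), (11, 11)], "C": [(20, 0), (21, 1)]}
features = [
    ("water", [(2, 0), (3, 0)]),
    ("road", [(1.5, 1), (1.5, 5)]),
    ("water", [(12, 10), (13, 10)]),
    ("road", [(11, 12), (11, 13)]),
    ("water", [(22, 5), (30, 1)]),  # its MBR is near town C, its points are not
]
frequent, rules = mine_close_to(towns, features, dist=2.0, min_sup=0.5, min_conf=0.8)
print(frequent)
print(rules)
```

In the sample data, town C passes the MBR filter for `water` but fails the exact test, so the refined support of `close_to(X, water)` is 2/3 rather than the coarse estimate of 1. A practical implementation would replace the nested loops with a spatial index such as an R-tree [7].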
GeoMiner includes five spatial data mining modules: a characterizer, a comparator, an associator, a cluster analyzer, and a classifier. In addition to these mining modules, the system includes a spatial data cube construction module and a spatial on-line analytical processing (OLAP) module. The SAND (Spatial And Nonspatial Data) architecture has been applied to model the spatial databases. A spatial data mining language, GMQL (Geo-Mining Query Language), has been designed and implemented as an extension of Spatial SQL. Moreover, an interactive, user-friendly data mining interface has been constructed, and tools have been implemented for visualizing the discovered spatial knowledge. (Abstract shortened by UMI.)
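The classifier module builds decision trees over spatial and non-spatial predicates. As a minimal sketch of one step of such a tree builder, the code below chooses a split predicate using the standard information-gain criterion of ID3-style learners. This generic criterion is swapped in for illustration; it is not the thesis's progressive refinement classifier, and the predicate encoding (`rows` as sets of predicate strings), the function names, and the sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, predicate):
    """Entropy reduction obtained by splitting on one Boolean predicate.

    rows[i] is the set of predicates that hold for object i (a hypothetical
    encoding in which spatial and non-spatial predicates are treated alike).
    """
    n = len(labels)
    holds = [lab for row, lab in zip(rows, labels) if predicate in row]
    fails = [lab for row, lab in zip(rows, labels) if predicate not in row]
    remainder = sum(len(part) / n * entropy(part) for part in (holds, fails) if part)
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Predicate with the highest information gain (ties go to the first name)."""
    predicates = sorted(set().union(*rows))
    if not predicates:
        return None
    return max(predicates, key=lambda p: information_gain(rows, labels, p))


# Hypothetical sample: four objects with spatial and non-spatial predicates.
rows = [
    {"close_to(X, water)", "is_a(X, large_town)"},
    {"close_to(X, water)"},
    {"is_a(X, large_town)"},
    set(),
]
labels = ["high", "high", "low", "low"]
print(best_split(rows, labels))
```

Splitting on `close_to(X, water)` separates the two classes perfectly (a gain of 1 bit), while `is_a(X, large_town)` leaves each branch evenly mixed (a gain of 0). A full tree builder would apply `best_split` recursively to each branch until the branches are pure or no predicate remains.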

[1] Luc Anselin, et al. Interactive Techniques and Exploratory Spatial Data Analysis, 1996.

[2] R. McMaster, et al. Map Generalization: Making Rules for Knowledge Representation, 1991.

[3] Aidong Zhang, et al. WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases, 1998, VLDB.

[4] D. Bell, et al. Evidence Theory and Its Applications, 1991.

[5] Jiawei Han, et al. Knowledge Mining in Databases: An Integration of Machine Learning Methodologies with Database Techno, 1995.

[6] Rangasami L. Kashyap, et al. An Object-Oriented Knowledge Representation for Spatial Information, 1988, IEEE Trans. Software Eng.

[7] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.

[8] Jiawei Han, et al. Discovery of Spatial Association Rules in Geographic Information Databases, 1995, SSD.

[9] William W. Cohen. Learning Trees and Rules with Set-Valued Features, 1996, AAAI/IAAI, Vol. 1.

[10] Carlo Zaniolo, et al. Metaqueries for Data Mining, 1996, Advances in Knowledge Discovery and Data Mining.

[11] Jiawei Han, et al. DBMiner: A System for Mining Knowledge in Large Relational Databases, 1996, KDD.

[12] Donald D. Chamberlin, et al. SEQUEL: A structured English query language, 1974, SIGFIDET '74.

[13] Jiawei Han, et al. Discovery of Multiple-Level Association Rules from Large Databases, 1995, VLDB.

[14] Jiawei Han, et al. Exploration of the power of attribute-oriented induction in data mining, 1995, KDD 1995.

[15] C. Tomlin. Geographic information systems and cartographic modeling, 1990.

[16] Jon Louis Bentley, et al. Multidimensional binary search trees used for associative searching, 1975, CACM.

[17] Walid G. Aref, et al. Extending a DBMS with Spatial Operations, 1991, SSD.

[18] Raymond T. Ng, et al. Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining, 1996, IEEE Trans. Knowl. Data Eng.

[19] Jiawei Han, et al. Meta-Rule-Guided Mining of Association Rules in Relational Databases, 1995, KDOOD/TDOOD.

[20] Hisashi Nakamura, et al. Fast Spatio-Temporal Data Mining of Large Geophysical Datasets, 1995, KDD.

[21] Tomasz Imielinski, et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.

[22] Surajit Chaudhuri, et al. An overview of data warehousing and OLAP technology, 1997, SGMD.

[23] Jiong Yang, et al. STING: A Statistical Information Grid Approach to Spatial Data Mining, 1997, VLDB.

[24] Daniel A. Griffith, et al. Spatial Statistics: Past, Present, and Future, 1990.

[25] Oliver Günther, et al. Multidimensional access methods, 1998, CSUR.

[26] Hans-Peter Kriegel, et al. Density-Connected Sets and their Application for Trend Detection in Spatial Databases, 1997, KDD.

[27] Daniel A. Griffith. Statistical Techniques in Geographical Analysis, 1985.

[28] Jiawei Han, et al. Towards on-line analytical mining in large databases, 1998, SGMD.

[29] Philip S. Yu, et al. Data Mining: An Overview from a Database Perspective, 1996, IEEE Trans. Knowl. Data Eng.

[30] Jiawei Han, et al. Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes, 2000, IEEE Trans. Knowl. Data Eng.

[31] Hanan Samet, et al. The Design and Analysis of Spatial Data Structures, 1989.

[32] Douglas H. Fisher, et al. Improving Inference through Conceptual Clustering, 1987, AAAI.

[33] Peter J. Rousseeuw, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[34] Jiawei Han, et al. Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases, 1994, KDD Workshop.

[35] Rakesh Agrawal, et al. Fast Algorithms for Mining Association Rules, 1994, VLDB 1994.

[36] YiQing Yu. Finding strong, common and discriminating characteristics of clusters from thematic maps, 1996.

[37] Jiawei Han, et al. Selective Materialization: An Efficient Method for Spatial Data Cube Construction, 1998, PAKDD.

[38] Tian Zhang, et al. BIRCH: an efficient data clustering method for very large databases, 1996, SIGMOD '96.

[39] Usama Fayyad, et al. Automated analysis of a large-scale sky survey: the SKICAT system, 1993.

[40] Hans-Peter Kriegel, et al. Algorithms for Characterization and Trend Detection in Spatial Databases, 1998, KDD.

[41] Jiawei Han, et al. Data Mining Methods for the Analysis of Large Geographic Databases, 1996.

[42] Laks V. S. Lakshmanan, et al. Exploratory mining and pruning optimizations of constrained associations rules, 1998, SIGMOD '98.

[43] Pietro Perona, et al. Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth, 1994, KDD Workshop.

[44] Soumitra Dutta, et al. Topological Constraints: A Representational Framework For Approximate Spatial And Temporal Reasoning, 1991, SSD.

[45] Andrew K. C. Wong, et al. Statistical Technique for Extracting Classificatory Knowledge from Databases, 1991, Knowledge Discovery in Databases.

[46] Jiawei Han, et al. Discovery of multiple-level rules from large databases, 1996.

[47] Ramakrishnan Srikant, et al. Mining generalized association rules, 1995, Future Gener. Comput. Syst.

[48] Tom M. Mitchell, et al. Generalization as Search, 2002.

[49] Doron Rotem, et al. Sampling from spatial databases, 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[50] Allen Silver, et al. Beta, 1975, The SAGE Encyclopedia of Research Design.

[51] Erich Schikuta, et al. The BANG-Clustering System: Grid-Based Data Analysis, 1997, IDA.

[52] Usama M. Fayyad, et al. Automating the Analysis and Cataloging of Sky Surveys, 1996, Advances in Knowledge Discovery and Data Mining.

[53] Derick Wood, et al. An Optimal Worst Case Algorithm for Reporting Intersections of Rectangles, 1980, IEEE Transactions on Computers.

[54] Christopher Dean, et al. Quakefinder: A Scalable Data Mining System for Detecting Earthquakes from Space, 1996, KDD.

[55] Hans-Peter Kriegel, et al. A distribution-based clustering algorithm for mining in large spatial databases, 1998, Proceedings 14th International Conference on Data Engineering.

[56] Gregory Piatetsky-Shapiro, et al. The interestingness of deviations, 1994.

[57] Jiawei Han, et al. Data-Driven Discovery of Quantitative Rules in Relational Databases, 1993, IEEE Trans. Knowl. Data Eng.

[58] Raymond T. Ng, et al. Extraction of Spatial Proximity Patterns by Concept Generalization, 1996, KDD.

[59] Philip K. Chan, et al. Systems for Knowledge Discovery in Databases, 1993, IEEE Trans. Knowl. Data Eng.

[60] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[61] Bernard W. Silverman, et al. Methods for Analysing Spatial Processes of Several Types of Points, 1982.

[62] Jiawei Han, et al. Generalization and decision tree induction: efficient classification in data mining, 1997, Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications.

[63] M. Berman. Testing for spatial association between a point process and another stochastic process, 1986.

[64] Michael Ian Shamos, et al. Computational geometry: an introduction, 1985.

[65] Padhraic Smyth, et al. Image database exploration: progress and challenges, 1993.

[66] Hans-Peter Kriegel, et al. Spatial Data Mining: A Database Approach, 1997, SSD.

[67] Chris Clifton, et al. Query flocks: a generalization of association-rule mining, 1998, SIGMOD '98.

[68] Jiawei Han. Knowledge Discovery in Object-Oriented and Active Databases, 1993.

[69] Gregory Piatetsky-Shapiro, et al. The KDD process for extracting useful knowledge from volumes of data, 1996, CACM.

[70] Max J. Egenhofer, et al. Reasoning about Binary Topological Relations, 1991, SSD.

[71] Sudipto Guha, et al. CURE: an efficient clustering algorithm for large databases, 1998, SIGMOD '98.

[72] Stephen R. Gardner. Building the data warehouse, 1998, CACM.

[73] Venky Harinarayan, et al. Implementing Data Cubes Efficiently, 1996.

[74] Heikki Mannila, et al. Finding interesting rules from large sets of discovered association rules, 1994, CIKM '94.

[75] Usama M. Fayyad, et al. Knowledge Discovery in Databases: An Overview, 1997, ILP.

[76] Jiawei Han, et al. Distance-associated join indices for spatial range search, 1992, Eighth International Conference on Data Engineering.

[77] R. Ng, et al. Efficient and Effective Clustering Methods for Spatial Data Mining, 1994.

[78] Martin L. Kersten, et al. Architectural Support for Data Mining, 1994, KDD Workshop.

[79] W. R. Buckland, et al. Outliers in Statistical Data, 1979.

[80] Max J. Egenhofer, et al. Spatial SQL: A Query and Presentation Language, 1994, IEEE Trans. Knowl. Data Eng.

[81] Rajeev Motwani, et al. Beyond market baskets: generalizing association rules to correlations, 1997, SIGMOD '97.

[82] Christos Faloutsos, et al. FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets, 1995, SIGMOD '95.

[83] Jiawei Han, et al. Resource and knowledge discovery from the internet and multimedia repositories, 1999.

[84] J. Ross Quinlan, et al. Decision trees and decision-making, 1990, IEEE Trans. Syst. Man Cybern.

[85] Alan T. Murray, et al. Mining Spatial Data via Clustering, 1998.

[86] Robert F. Cromp, et al. Data mining of multidimensional remotely sensed images, 1993, CIKM '93.

[87] Jiawei Han, et al. Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes, 1997, KDD.

[88] Derek Thompson, et al. Fundamentals of spatial information systems, 1992, A.P.I.C. series.

[89] Hans-Peter Kriegel, et al. Visualization Techniques for Mining Large Databases: A Comparison, 1996, IEEE Trans. Knowl. Data Eng.

[90] Doron Rotem. Spatial join indices, 1991, Proceedings. Seventh International Conference on Data Engineering.

[91] Raymond T. Ng, et al. Algorithms for Mining Distance-Based Outliers in Large Datasets, 1998, VLDB.

[92] Jörg Rech, et al. Knowledge Discovery in Databases, 2001, Künstliche Intell.

[93] David B. Lomet, et al. Bulletin of the Technical Committee on Data Engineering Special Issue on Data Reduction Techniques Announcements and Notices Letter from the Editor-in-chief 1 Technical Committee Election Changing Editorial Staff Letter from the Special Issue Editor the New Jersey Data Reduction Report, 2022.

[94] Jiawei Han, et al. Spatial Data Mining: Progress and Challenges, 1996, Workshop on Research Issues on Data Mining and Knowledge Discovery.

[95] Usama M. Fayyad, et al. The Attribute Selection Problem in Decision Tree Generation, 1992, AAAI.

[96] Benjamin Kuipers, et al. Modeling Spatial Knowledge, 1978, IJCAI.

[97] Jiawei Han, et al. GeoMiner: a system prototype for spatial data mining, 1997, SIGMOD '97.

[98] Beng Chin Ooi, et al. Spatial Join Strategies in Distributed Spatial DBMS, 1995, SSD.

[99] Ronald L. Rivest, et al. Constructing Optimal Binary Decision Trees is NP-Complete, 1976, Inf. Process. Lett.

[100] Dag Tjøstheim, et al. A measure of association for spatial variables, 1978.

[101] Hans-Peter Kriegel, et al. Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification, 1995, SSD.

[102] Beng Chin Ooi, et al. Discovery of General Knowledge in Large Spatial Databases, 1993.

[103] Hans-Peter Kriegel, et al. Multi-step processing of spatial joins, 1994, SIGMOD '94.

[104] Richard R. Muntz, et al. Scalable Exploratory Data Mining of Distributed Geoscientific Data, 1996, KDD.

[105] Soumitra Dutta, et al. Qualitative Spatial Reasoning: A Semi-quantitative Approach Using Fuzzy Logic, 1989, SSD.

[106] Hans-Peter Kriegel, et al. Efficient processing of spatial joins using R-trees, 1993, SIGMOD Conference.

[107] Larry A. Rendell, et al. The Feature Selection Problem: Traditional Methods and a New Algorithm, 1992, AAAI.