CLARANS: A Method for Clustering Objects for Spatial Data Mining

Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper makes three main contributions. First, it proposes a new clustering method called CLARANS, whose aim is to identify spatial structures that may be present in the data. Experimental results indicate that, when compared with existing clustering methods, CLARANS is very efficient and effective. Second, the paper investigates how CLARANS can efficiently handle not only point objects but also polygon objects. One of the methods considered, called the IR-approximation, is very efficient in clustering convex and nonconvex polygon objects. Third, building on CLARANS, the paper develops two spatial data mining algorithms that aim to discover relationships between spatial and nonspatial attributes. Both algorithms can discover knowledge that is difficult to find with existing spatial data mining algorithms.
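
The abstract does not describe the search procedure itself. As CLARANS is commonly described, it is a randomized local search over sets of k medoids: from a current set, it examines randomly chosen neighbors (sets that differ in exactly one medoid), moves to any neighbor with lower cost, and declares a local minimum after a fixed number of consecutive failed attempts; the whole search is restarted several times and the best result is kept. The following is a minimal sketch under those assumptions, for point objects with Euclidean distance only. The function names, default parameter values, and the brute-force cost computation are illustrative choices, not the paper's implementation, and polygon objects and the IR-approximation are not covered.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, numlocal=2, maxneighbor=50, seed=0):
    """Randomized k-medoid search (sketch). Requires len(points) > k.

    numlocal:    number of random restarts.
    maxneighbor: consecutive non-improving neighbors tolerated before
                 the current medoid set is accepted as a local minimum.
    Returns (sorted medoid indices, total cost).
    """
    rng = random.Random(seed)
    n = len(points)
    best_medoids, best_cost = None, math.inf

    for _ in range(numlocal):
        # Start each local search from a random set of k medoid indices.
        current = rng.sample(range(n), k)
        current_cost = total_cost(points, current)
        failures = 0

        while failures < maxneighbor:
            # A neighbor swaps one medoid for one non-medoid.
            slot = rng.randrange(k)
            candidate = rng.choice([j for j in range(n) if j not in current])
            neighbor = current[:slot] + [candidate] + current[slot + 1:]
            neighbor_cost = total_cost(points, neighbor)

            if neighbor_cost < current_cost:
                # Move to the better neighbor and reset the failure count.
                current, current_cost = neighbor, neighbor_cost
                failures = 0
            else:
                failures += 1

        if current_cost < best_cost:
            best_medoids, best_cost = sorted(current), current_cost

    return best_medoids, best_cost


if __name__ == "__main__":
    # Two small, well-separated groups of points.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(clarans(pts, k=2, numlocal=2, maxneighbor=200, seed=1))
```

Recomputing the full cost for every neighbor keeps the sketch short but is wasteful; a practical version would update the cost incrementally for the swapped medoid. Larger values of the two parameters trade running time for a more thorough search.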
