Polygonal spatial clustering

Clustering, the process of grouping similar objects, is a fundamental data mining task that supports knowledge discovery in large datasets. With the growing number of sensor networks, geospatial satellites, global positioning devices, and human networks, tremendous amounts of spatio-temporal data measuring the state of the Earth are collected every day. This volume of data has increased the need for efficient spatial data mining techniques. Furthermore, most anthropogenic objects in space are represented as polygons, for example counties, census tracts, and watersheds. It is therefore important to develop data mining techniques designed specifically for polygonal data. In this research we focus on clustering geospatial polygons with fixed space and time coordinates. Polygonal datasets are more complex than point datasets because polygons have topological and directional properties that points lack, so most state-of-the-art point-based clustering techniques are not readily applicable. We address four important sub-problems in polygonal clustering. (1) We develop a dissimilarity function that integrates the non-spatial attributes of the polygons with their spatial structure and context. (2) We extend DBSCAN, the state-of-the-art density-based clustering algorithm for point datasets, to polygonal datasets, and extend it further to handle polygonal obstacles. (3) We design a suite of algorithms that incorporate user-defined constraints into the clustering process. (4) We develop a spatio-temporal polygonal clustering algorithm that treats both space and time as first-class citizens, together with an algorithm for analyzing movement patterns in the resulting spatio-temporal polygonal clusters.
To evaluate our algorithms, we applied them to real-life datasets from several diverse domains to solve practical problems such as congressional redistricting, spatial epidemiology, crime mapping, and drought analysis. The results show that our algorithms are effective in finding spatially compact and conceptually coherent clusters.
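
As a rough illustration of how sub-problems (1) and (2) fit together, the following minimal sketch runs a DBSCAN-style procedure over polygons using a dissimilarity that mixes attribute distance with spatial distance. It is not the authors' method: the weighted-sum form of `dissimilarity`, the weight `w`, the vertex-average `centroid`, the dictionary layout of a polygon, and the toy `square` helper are all assumptions made for this example only.

```python
import math

def centroid(vertices):
    # Vertex average: a simple stand-in (assumption) for a true area centroid.
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def dissimilarity(p, q, w=0.5):
    # Assumed form: weighted sum of attribute distance and centroid distance.
    attr = math.dist(p["attrs"], q["attrs"])
    spatial = math.dist(centroid(p["vertices"]), centroid(q["vertices"]))
    return w * attr + (1 - w) * spatial

def dbscan(items, eps, min_pts, dist):
    # Standard DBSCAN over an arbitrary dissimilarity; -1 marks noise.
    labels = [None] * len(items)  # None means "not yet visited"

    def neighbors(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border item
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core item: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core item, keep expanding
    return labels

def square(x, y, attr):
    # Hypothetical toy polygon: unit square with one non-spatial attribute.
    return {"vertices": [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)],
            "attrs": [attr]}

polys = [square(0, 0, 1.0), square(1, 0, 1.1), square(2, 0, 0.9),
         square(10, 10, 5.0), square(11, 10, 5.2), square(50, 50, 9.0)]
labels = dbscan(polys, eps=1.0, min_pts=2, dist=dissimilarity)
print(labels)
```

In this toy input the three adjacent squares with similar attributes form one cluster, the two squares near (10, 10) form a second, and the isolated square is labeled noise. A faithful treatment of polygons would replace the centroid distance with a measure that reflects shape, topology, and direction, which is the gap the dissimilarity function in sub-problem (1) is meant to fill.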
