HiSpatialCluster: A novel high‐performance software tool for clustering massive spatial points

In the era of big data, spatial clustering is an important means of geo‐data analysis. When clustering big geo‐data such as social media check‐in data, geotagged photos, and taxi trajectory points, traditional spatial clustering algorithms face new challenges. On the one hand, existing spatial clustering tools cannot support the clustering of massive point sets; on the other hand, there is no satisfactory solution for self‐adaptive spatial clustering. To cluster millions or even billions of points adaptively, a new spatial clustering tool, HiSpatialCluster, is proposed. It adopts the CFSFDP (clustering by fast search and finding density peaks) idea to find cluster centers and the DBSCAN (density‐based spatial clustering of applications with noise) idea of density‐connect filtering for classification. The tool's source code and other resources have been released on GitHub, and an experimental evaluation was performed by clustering massive taxi trajectory points and Flickr geotagged photos in Beijing, China. The spatial clustering results were also compared with those of K‐means and DBSCAN. As a spatial clustering tool, HiSpatialCluster is expected to play a fundamental role in big geo‐data research. First, it clusters massive point datasets with uneven spatial density distributions adaptively. Second, its density‐connect filter method generates homogeneous analysis units from geotagged data. Third, it is accelerated by both parallel CPU and GPU computing, so that millions or even billions of points can be clustered efficiently.
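The combination of CFSFDP center finding and density-based filtering can be illustrated with a small, quadratic-time Python sketch. It is not HiSpatialCluster's code. The function name `density_peaks`, its parameters `dc`, `n_centers` and `min_density`, the cutoff-kernel density, and the use of a plain density threshold as a stand-in for the tool's density-connect filter are all assumptions made for illustration. As in CFSFDP, each point gets a local density ρ and a distance δ to its nearest higher-density point. The points with the largest ρ·δ become centers, and every other point inherits the label of its nearest higher-density neighbor.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def density_peaks(points: Sequence[Point], dc: float, n_centers: int,
                  min_density: int) -> List[int]:
    """Label each point with a cluster id; -1 marks points filtered as noise.

    Hypothetical illustration of CFSFDP-style center finding plus a simple
    density filter; not HiSpatialCluster's implementation.
    """
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")
    n = len(points)
    if n == 0:
        return []
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]

    # Rank by decreasing density; ties broken by index so the order is strict.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point ranked above i;
    # nearest[i] records that point for the assignment step.
    delta = [0.0] * n
    nearest = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: largest distance to any point
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest[i] = dist[i][j], j

    # Centers: largest gamma = rho * delta. The densest point is always kept
    # as a center because it has no higher-density neighbour to inherit from.
    ranked = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))
    centers = [order[0]] + [i for i in ranked if i != order[0]][:n_centers - 1]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Densest first, each non-center inherits the label of its nearest
    # higher-density neighbour, which has already been set.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest[i]]

    # Assumed stand-in for the density-connect filter: drop low-density points.
    return [lab if rho[i] >= min_density else -1 for i, lab in enumerate(labels)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(density_peaks(pts, dc=2.0, n_centers=2, min_density=1))
    # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

On the toy data the two dense squares become clusters 0 and 1, and the isolated point at (50, 50) has zero density, so it is filtered out. Because every pairwise distance is computed, this sketch only suits small inputs. The abstract's claim of clustering millions or billions of points relies on the tool's parallel CPU and GPU acceleration, which is not reproduced here.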
