Density-based clustering for data containing two types of points

When only one type of point is distributed in a region, a cluster of points can be regarded as an anomaly. When two different types of points coexist in a region, they overlap in different places with varying densities. In such cases, the meaning of a cluster of one type of point may change if points of the other type have a different density within the same cluster. Consider the origins and destinations (OD) of taxicab trips: in the morning, clustering of both may indicate a transportation hub, whereas clustered origins with sparse destinations (a hot spot where taxis are in short supply) could suggest a densely populated residential area. Previous clustering methods cannot identify such patterns, so a clustering method for two types of points is worth studying. This paper first defines the concept of a two-component cluster as a group containing two types of points, at least one of which exhibits clustering. We then propose a density-based method for identifying two-component clusters. The method has four steps. The first estimates the clustering scale of the point data. The second transforms the point data into a 2D density domain, in which the x and y axes represent the local density of each of the two point types around every point. The third determines the thresholds for extracting the clusters, and the fourth generates two-component clusters using a density-connectivity mechanism. The method is applied to taxicab trip data from Beijing. Three types of two-component clusters are identified: high-density origins with high-density destinations, high-density origins with low-density destinations, and low-density origins with high-density destinations. The clustering results are verified against the spatial relationship between cluster locations and their land-use types over different periods of the day.
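The last three steps can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the clustering scale from the first step is already available as a fixed `radius`, that the thresholds `thr_o` and `thr_d` are supplied by the caller rather than derived from the data, and that density connectivity is simplified to linking points of the same density class that lie within `radius` of each other. All function and parameter names are hypothetical.

```python
import math
from collections import deque


def local_densities(points, types, radius):
    """Map each point to the 2D density domain: (origins nearby, destinations nearby).

    Counts include the point itself. `types` holds "O" (origin) or "D" (destination).
    """
    dens = []
    for x, y in points:
        n_o = n_d = 0
        for (u, v), t in zip(points, types):
            if math.hypot(x - u, y - v) <= radius:
                if t == "O":
                    n_o += 1
                else:
                    n_d += 1
        dens.append((n_o, n_d))
    return dens


def classify(n_o, n_d, thr_o, thr_d):
    """Assign a density class; None marks points where neither type is dense."""
    high_o, high_d = n_o >= thr_o, n_d >= thr_d
    if high_o and high_d:
        return "HH"  # high-density origins and destinations
    if high_o:
        return "HL"  # high-density origins, low-density destinations
    if high_d:
        return "LH"  # low-density origins, high-density destinations
    return None


def two_component_clusters(points, types, radius, thr_o, thr_d):
    """Group same-class points that are chained together within `radius` (assumed rule)."""
    dens = local_densities(points, types, radius)
    kinds = [classify(o, d, thr_o, thr_d) for o, d in dens]
    cluster_id = [-1] * len(points)  # -1 means "not in any cluster"
    next_id = 0
    for seed in range(len(points)):
        if kinds[seed] is None or cluster_id[seed] != -1:
            continue
        cluster_id[seed] = next_id
        queue = deque([seed])
        while queue:  # breadth-first expansion from the seed
            i = queue.popleft()
            for j, (u, v) in enumerate(points):
                close = math.hypot(points[i][0] - u, points[i][1] - v) <= radius
                if cluster_id[j] == -1 and kinds[j] == kinds[i] and close:
                    cluster_id[j] = next_id
                    queue.append(j)
        next_id += 1
    return kinds, cluster_id
```

Calling `two_component_clusters(points, types, radius, thr_o, thr_d)` returns one density class and one cluster id per input point. The brute-force neighbour search is quadratic in the number of points, so a spatial index would be needed for data at the scale of a city's taxicab trips.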
