The GridOPTICS clustering algorithm

The OPTICS algorithm is a hierarchical density-based clustering method. It creates reachability plots to identify all clusters in the point set. However, it has a limitation: it is very slow for large data sets. We introduce the GridOPTICS algorithm, which builds a grid structure to reduce the number of data points and then applies the OPTICS clustering algorithm to the grid structure. To obtain the clusters, the algorithm uses the reachability plot of the grid structure and then determines which cluster each original input point belongs to. The experimental results show that the new algorithm is faster than OPTICS; the speed-up can be one or two orders of magnitude or more, depending mainly on the τ parameter of the GridOPTICS algorithm. At the end of the article, we give some advice on the kinds of point sets to which the GridOPTICS algorithm can be applied.

Keywords: clustering, large data set, OPTICS, grid
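The grid reduction and the final mapping from grid cells back to input points can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: τ is treated here as the side length of a grid cell, all function names are hypothetical, and the clustering step is a simple stand-in (connected components of touching cells) used in place of OPTICS only to keep the example self-contained. In an actual GridOPTICS run, OPTICS would be applied to the occupied cells instead.

```python
from collections import defaultdict
from math import floor


def build_grid(points, tau):
    """Map each occupied grid cell to the indices of the points inside it.

    Assumption: tau is the cell side length (not defined in the abstract).
    """
    grid = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(floor(c / tau) for c in p)
        grid[cell].append(i)
    return dict(grid)


def label_adjacent_cells(grid):
    """Stand-in for the OPTICS step (NOT OPTICS itself).

    Labels connected components of cells whose indices differ by at most 1
    in every dimension.
    """
    labels = {}
    next_label = 0
    for start in grid:
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            cell = stack.pop()
            for other in grid:
                touching = all(abs(a - b) <= 1 for a, b in zip(cell, other))
                if other not in labels and touching:
                    labels[other] = next_label
                    stack.append(other)
        next_label += 1
    return labels


def assign_points(grid, cell_labels, n_points):
    """Give every input point the label of the cell that contains it."""
    point_labels = [-1] * n_points  # -1 marks points left unlabeled (noise)
    for cell, indices in grid.items():
        for i in indices:
            point_labels[i] = cell_labels.get(cell, -1)
    return point_labels


points = [(0.1, 0.2), (0.3, 0.1), (0.9, 1.1),
          (5.2, 5.1), (5.8, 5.9), (6.1, 5.5)]
grid = build_grid(points, tau=1.0)
cell_labels = label_adjacent_cells(grid)
point_labels = assign_points(grid, cell_labels, len(points))
print(len(grid), point_labels)  # 4 [0, 0, 0, 1, 1, 1]
```

In this toy run six points collapse into four occupied cells, so the clustering step works on fewer objects than the input contains. A larger τ gives fewer cells, which is consistent with the abstract's remark that the speed-up depends mainly on τ.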
