Linear density-based clustering with a discrete density model

Density-based clustering techniques are used in a wide range of data mining applications. One of their most attractive features is that they require no prior knowledge of the number of clusters in a dataset or of the clusters' shape. In this paper we propose a new algorithm named Linear DBSCAN (Lin-DBSCAN), a simple approach to clustering inspired by the density model introduced with the well-known DBSCAN algorithm. Designed to minimize the computational cost of density-based clustering on geospatial data, Lin-DBSCAN has linear time complexity, which makes it suitable for real-time applications on low-resource devices. Lin-DBSCAN uses a discrete version of the DBSCAN density model that takes advantage of a grid-based scan-and-merge approach. The name of the algorithm reflects the main features outlined above. The algorithm was tested on well-known data sets. Experimental results demonstrate the efficiency and validity of this approach compared with DBSCAN in the context of spatial data clustering, enabling the use of a density-based clustering technique on large datasets at low computational cost.
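
The grid-based scan-and-merge idea can be illustrated with a minimal sketch. This is not the Lin-DBSCAN implementation, whose details are not given in the abstract; it is a generic illustration under stated assumptions: points are two-dimensional, the grid uses square cells of side `cell_size`, a cell counts as dense when it holds at least `min_pts` points, and dense cells that touch (including diagonally) belong to the same cluster. The function name and all parameter names are hypothetical.

```python
from collections import defaultdict, deque
from math import floor


def grid_density_clusters(points, cell_size, min_pts):
    """Label 2D points by merging adjacent dense grid cells; -1 marks noise.

    Illustrative sketch only: the density rule and the 8-cell adjacency
    are assumptions, not taken from the Lin-DBSCAN paper.
    """
    # Scan: bin every point into a square cell in a single pass.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / cell_size), floor(y / cell_size))].append(idx)

    # Discrete density test: a cell is dense if it holds >= min_pts points.
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    # Merge: flood-fill over dense cells that touch each other.
    for start in cells:
        if start not in dense or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels
```

In this sketch each point is binned once and each occupied cell inspects a constant number of neighbours, so the expected running time grows linearly with the number of points when hash-table operations are treated as constant time. Points that fall in non-dense cells are simply labelled as noise, which is a simplification compared with DBSCAN's treatment of border points.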
