Cluster Analysis on High-Dimensional Data: A Comparison of Density-based Clustering Algorithms

The effectiveness and efficiency of existing cluster analysis methods are limited, especially when the data are high-dimensional or when the clusters within the data are not well separated and differ in density, size, and shape. Density-based clustering algorithms have been shown to discover clusters with these characteristics. Previous research on density-based clustering algorithms has focused on analyzing the parameters essential for creating meaningful spatial clusters. The aim of this paper is to provide a comparative study of three well-known density-based clustering algorithms: DBSCAN, DENCLUE, and LTKC. Their merits were evaluated by their ability to cluster several high-dimensional artificial datasets. We conclude that each of these density-based clustering algorithms has its own merits for high-dimensional data. However, further research is needed on applying these techniques to other high-dimensional data, to permit a comprehensive evaluation of their respective strengths and limitations as cluster analysis methods.
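To make the density-based idea concrete, the following is a minimal sketch of DBSCAN, one of the three compared algorithms, written for illustration only. It assumes the commonly described formulation with a neighborhood radius `eps` and a density threshold `min_pts`, Euclidean distance, and brute-force neighborhood queries (O(n^2) distance computations). The names `dbscan`, `region_query`, and `NOISE` are hypothetical and this is not the implementation used in the study.

```python
import math
from collections import deque
from typing import List, Optional, Sequence

NOISE = -1  # label for points not density-reachable from any core point


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if its eps-neighborhood (itself included) holds at
    least min_pts points. Clusters grow outward from core points; non-core
    points reached during expansion become border points of that cluster.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
        cluster_id += 1
    return labels
```

For example, with `eps=0.5` and `min_pts=3`, two tight groups of four 3-D points each become clusters 0 and 1, while an isolated point is labeled -1 (noise). Because the distance function works in any dimension, the same sketch runs unchanged on higher-dimensional points, although brute-force Euclidean neighborhoods become both slow and less discriminative as dimensionality grows.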
