Cluster Validity Index for Big Data Based on Density Discriminant Analysis

An important factor in clustering unlabeled data is the cluster validity index, which indicates the appropriate number of clusters. This paper proposes applying the unsupervised density discriminant analysis algorithm to cluster validation in the context of Big Data. In particular, an experiment was conducted in which a centroid-based clustering algorithm was run on a big dataset and the unsupervised density discriminant analysis algorithm was then applied to find the most appropriate number of clusters. Performance was evaluated in terms of processing time. The results show that the time required to perform the clustering task depends on the number of features and the number of clusters.
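
The workflow described above (run a centroid-based clustering for several candidate cluster counts, score each result with a validity index, and record the processing time) can be illustrated with a minimal sketch. Several things here are assumptions rather than the paper's method: the density discriminant index is not specified in this text, so the well-known silhouette coefficient is used as a stand-in; the data are small synthetic 2-D blobs rather than a big dataset; k-means uses a deterministic farthest-first initialization; and the names `kmeans`, `silhouette`, `make_blobs`, and `evaluate` are hypothetical helpers.

```python
import math
import random
import time


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-first initialization (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient (stand-in for the density discriminant index)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def make_blobs(centers, n_per_blob, spread=0.5, seed=1):
    """Synthetic Gaussian blobs, one per center (illustrative data only)."""
    rng = random.Random(seed)
    return [
        tuple(rng.gauss(c, spread) for c in center)
        for center in centers
        for _ in range(n_per_blob)
    ]


def evaluate(points, k_values):
    """Return {k: (validity score, elapsed seconds)} for each candidate k."""
    results = {}
    for k in k_values:
        start = time.perf_counter()
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        results[k] = (score, time.perf_counter() - start)
    return results


points = make_blobs([(0, 0), (10, 0), (0, 10)], n_per_blob=60)
results = evaluate(points, range(2, 7))
best_k = max(results, key=lambda k: results[k][0])
for k, (score, seconds) in sorted(results.items()):
    print(f"k={k}  score={score:.3f}  time={seconds:.4f}s")
print("selected k:", best_k)
```

Timing the loop for different numbers of features (the tuple length in `make_blobs`) and different values of `k` gives a rough, small-scale view of the dependence on features and clusters that the abstract reports; the absolute times from this sketch say nothing about a distributed Big Data setting.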
