A Parallel Elastic Net Clustering Algorithm

The elastic net clustering algorithm (ENCA) typically provides an effective way to classify non-linearly separable data. However, its computation time increases significantly for large datasets. To address this issue, this paper presents a parallel version of ENCA built on the Apache Spark framework, called the parallel elastic net clustering algorithm (PENCA). To evaluate its performance, PENCA is compared with ENCA and two well-known clustering algorithms, k-means and the genetic k-means algorithm (GKA). The results show that PENCA not only outperforms k-means and GKA in terms of accuracy rate but also efficiently reduces the response time of ENCA-based clustering algorithms on large-scale datasets.
