Parallelization of Partitioning Around Medoids (PAM) in K-Medoids Clustering on GPU

K-medoids clustering is a partitional clustering method. It offers better results when dealing with outliers and arbitrary distance metrics, and in situations where a mean or median does not exist within the data. However, k-medoids suffers from high computational complexity. Partitioning Around Medoids (PAM) was developed to improve k-medoids clustering; it consists of a build step and a swap step and uses the entire dataset to find the best potential medoids. As a result, PAM produces better medoids than other algorithms. This research proposes a parallelization of PAM in k-medoids clustering on the GPU to reduce the computation time of the swap step. The parallelization scheme uses shared memory, a reduction algorithm, and an optimized thread block configuration that maximizes occupancy. In the experiments, the proposed parallelized PAM k-medoids is faster than the CPU and Matlab implementations and is efficient for large datasets.
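To make the build and swap steps concrete, the following is a minimal sequential sketch of PAM in plain Python. It is not the GPU implementation described above; the function names `pam` and `total_cost`, the greedy form of the build step, and the best-improvement swap rule are assumptions made for illustration. The inner loop of the swap step scores every (medoid, non-medoid) exchange independently of the others, which is why that step is a natural candidate for parallel evaluation followed by a minimum reduction.

```python
from itertools import product


def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(points, k, metric):
    """Sequential PAM sketch: returns (sorted medoid indices, total cost)."""
    n = len(points)
    # PAM works on the entire dataset, so precompute all pairwise distances.
    dist = [[metric(points[i], points[j]) for j in range(n)] for i in range(n)]

    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    # SWAP: score every (medoid, non-medoid) exchange, apply the best one,
    # and repeat until no exchange lowers the cost.
    cost = total_cost(dist, medoids)
    while True:
        best_cost, best_swap = cost, None
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < best_cost:
                best_cost, best_swap = trial_cost, (pos, cand)
        if best_swap is None:
            break
        medoids[best_swap[0]] = best_swap[1]
        cost = best_cost
    return sorted(medoids), cost


if __name__ == "__main__":
    # Two well-separated 1-D groups with absolute difference as the metric.
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(pam(data, 2, lambda a, b: abs(a - b)))
```

Because `metric` is passed in as a function, the sketch works with any dissimilarity measure, which reflects the point that k-medoids does not need a mean to exist. Each swap pass costs on the order of k(n-k) candidate evaluations, each touching all n points, so the swap step dominates the run time for large datasets.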