GPIC: GPU Power Iteration Cluster

This work presents GPIC, a new Graphics Processing Unit (GPU) accelerated algorithm for Power Iteration Clustering (PIC). The algorithm is based on the original PIC proposal and is adapted to take advantage of the GPU architecture while maintaining the original algorithm's properties. The proposed method was compared against a serial implementation and a parallel Spark implementation, and achieved a considerable speed-up on the test problems.
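
For readers unfamiliar with PIC, the following is a minimal CPU-only sketch of the steps in the original PIC proposal (Lin and Cohen, ICML 2010): build an affinity matrix, row-normalize it, and repeatedly multiply a vector by it until the change between iterations stabilizes. It is not the GPIC implementation. The function names, the Gaussian affinity with `sigma`, the stopping threshold `eps`, and the toy data are all assumptions made for illustration, and the final largest-gap split is a simple stand-in for the k-means step that PIC normally applies to the embedding.

```python
import math


def power_iteration_embedding(points, sigma=1.0, max_iter=100, eps=1e-5):
    """Return a one-dimensional PIC-style embedding of `points` (illustrative)."""
    n = len(points)
    # Gaussian affinity matrix with a zero diagonal (assumed similarity function).
    affinity = [
        [
            0.0 if i == j else math.exp(
                -sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                / (2.0 * sigma ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]
    degrees = [sum(row) for row in affinity]
    # Row-normalized affinity matrix W = D^-1 A.
    w = [[affinity[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    # Start from the normalized degree vector.
    total = sum(degrees)
    v = [d / total for d in degrees]
    prev_delta = float("inf")
    for _ in range(max_iter):
        # Matrix-vector product followed by L1 normalization.
        u = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in u)
        u = [x / norm for x in u]
        delta = max(abs(a - b) for a, b in zip(u, v))
        v = u
        # Stop early, once the change per iteration has itself stopped changing.
        if abs(prev_delta - delta) < eps:
            break
        prev_delta = delta
    return v


def split_at_largest_gap(values):
    """Two-way split of a 1-D embedding at its largest gap (stand-in for k-means)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(values)
    for k in range(cut + 1, len(order)):
        labels[order[k]] = 1
    return labels


# Toy data: a group of three points and a distant group of five points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
]
embedding = power_iteration_embedding(points)
labels = split_at_largest_gap(embedding)
print(labels)
```

The matrix-vector product inside the loop dominates the cost and each output element can be computed independently, which is the kind of step that lends itself to data-parallel execution.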
