Interactive Kernel Dimension Alternative Clustering on GPUs

Machine learning has seen tremendous growth in recent years thanks to two key advances in technology: massive data generation and highly parallel accelerator architectures. The rate at which data is generated is exploding across multiple domains, including medical research, environmental science, web search, and e-commerce. Many of these advances have benefited from emerging web-based applications and from improvements in data storage and sensing technologies. Innovations in parallel accelerator hardware, such as GPUs, have made it possible to process massive amounts of data in a timely fashion. Given these advances in data acquisition technology and hardware, machine learning researchers are equipped to generate and quickly sift through much larger and more complex datasets. In this work, we focus on accelerating Kernel Dimension Alternative Clustering algorithms using GPUs. We conduct a thorough performance analysis using both synthetic and real-world datasets, varying both the structure of the data and the size of the datasets. Our GPU implementation reduces execution time from minutes to seconds, which enables us to develop a web-based application in which users can interactively view alternative clustering solutions.
