Parallel Implementation of a Density-Based Stream Clustering Algorithm Over a GPU Scheduling System

Graphics Processing Units (GPUs) are used together with the CPU to accelerate a wide range of general-purpose applications and scientific computations. The highly parallel architecture of the GPU consists of hundreds of cores optimized for parallel performance. Applications that benefit from the GPU architecture have to be implemented according to the GPU parallel programming model, so an algorithm that follows a sequential workflow has to be redesigned to achieve good performance on the GPU device. DenStream is a recent stream clustering algorithm that consists of two main parts. The online part summarizes data from the data stream and builds micro-clusters, while the offline part generates the final clustering using density-based clustering. In this work, we present an efficient GPU-based implementation of DenStream called G-DenStream. G-DenStream is faster than DenStream, especially as the dimensionality of the streaming dataset increases, while preserving the quality of the resulting clustering. The implementations in this work parallelize both the online and offline parts, and we test their performance and utilization on the GPU.
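To make the online part more concrete, the following is a minimal sequential sketch in Python of how a micro-cluster summary might be maintained. It is not the G-DenStream implementation. The names `MicroCluster`, `nearest`, `online_step`, `lam`, and `eps` are hypothetical, the fading factor 2^(-lam * dt) and the radius formula follow one common formulation of DenStream, and outlier handling and pruning are omitted. The loop in `nearest` is the kind of step that lends itself to a GPU, since each distance is independent of the others.

```python
import copy
import math


class MicroCluster:
    """Decayed summary of nearby points (hypothetical helper, not the paper's code)."""

    def __init__(self, dim):
        self.ls = [0.0] * dim   # weighted linear sum per dimension
        self.ss = [0.0] * dim   # weighted squared sum per dimension
        self.weight = 0.0       # decayed number of points

    def decay(self, lam, dt):
        # Assumed fading function: 2^(-lam * dt)
        factor = 2.0 ** (-lam * dt)
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.weight *= factor

    def insert(self, point):
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss = [s + x * x for s, x in zip(self.ss, point)]
        self.weight += 1.0

    def center(self):
        return [s / self.weight for s in self.ls]

    def radius(self):
        # Sum of per-dimension variances, clamped against rounding error
        var = sum(q / self.weight - (s / self.weight) ** 2
                  for s, q in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


def nearest(point, clusters):
    """Return (index, distance) of the closest micro-cluster center."""
    best_idx, best_dist = -1, math.inf
    for idx, mc in enumerate(clusters):
        # Iterations are independent: one GPU thread could handle one cluster
        dist = math.dist(point, mc.center())
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


def online_step(point, clusters, eps):
    """Merge point into the nearest micro-cluster if its radius stays <= eps."""
    if clusters:
        idx, _ = nearest(point, clusters)
        candidate = copy.deepcopy(clusters[idx])
        candidate.insert(point)
        if candidate.radius() <= eps:
            clusters[idx] = candidate
            return idx
    # Otherwise start a new micro-cluster seeded with this point
    fresh = MicroCluster(len(point))
    fresh.insert(point)
    clusters.append(fresh)
    return len(clusters) - 1
```

In this sketch, feeding the points `[0, 0]` and `[2, 0]` with `eps = 1.5` yields a single micro-cluster centered at `[1, 0]`, while a distant point such as `[10, 10]` opens a second one. The cost of `nearest` grows with both the number of micro-clusters and the dimensionality, which is consistent with the abstract's observation that the benefit of the GPU increases with dimensionality.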
