Paper Information - Graphics Hardware based Efficient and Scalable Fuzzy C-Means Clustering

Graphics Hardware based Efficient and Scalable Fuzzy C-Means Clustering

The exceptional growth of graphics hardware in programmability and data-processing speed over the past few years has fuelled extensive research into using it for general-purpose computation beyond image-processing and gaming applications. We explore the use of graphics processors (GPUs) to speed up the computations involved in fuzzy c-means (FCM). FCM is an important iterative clustering algorithm that usually performs better than k-means, but for large data sets it requires a substantial amount of time, which limits its applicability. FCM involves linear computations and repeated summations. Moreover, there is little reuse of the same data across FCM iterations (i.e., the centres of the clusters change in each iteration). These characteristics make it a good candidate for mapping onto the parallel processors of the GPU to gain speed. We look at efficient methods for processing input data and handling intermediate results within the GPU, with reusable shader programs and minimal use of GPU resources. Two previous implementations of FCM on the GPU are also analysed. Our implementation shows speed gains of over two orders of magnitude in computational time compared with a recent-generation CPU under certain experimental conditions. This computational time includes both the processing time in the GPU and the data transfer time from the CPU to the GPU.
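To make the "linear computations and repeated summations" concrete, here is a minimal CPU-side sketch of the standard textbook FCM update (membership update followed by centre update). It is not the paper's GPU implementation. The function names `fcm_step` and `fcm`, the fuzzifier default `m=2.0`, the Euclidean distance, and the stopping tolerance are all assumptions chosen for illustration.

```python
import math


def fcm_step(points, centres, m=2.0, eps=1e-12):
    """One FCM iteration (illustrative sketch, not the paper's code).

    Returns the membership matrix for the given centres and the
    centres recomputed from those memberships.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        # Distance from this point to every centre; eps avoids division by zero.
        dists = [max(math.dist(x, c), eps) for c in centres]
        # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); each row sums to 1.
        row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
               for d_i in dists]
        memberships.append(row)

    dim = len(points[0])
    new_centres = []
    for i in range(len(centres)):
        # Centre i is the mean of all points weighted by u_i^m.
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        new_centres.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return memberships, new_centres


def fcm(points, centres, m=2.0, iterations=50, tol=1e-6):
    """Repeat fcm_step until the centres move less than tol."""
    memberships = []
    for _ in range(iterations):
        memberships, new_centres = fcm_step(points, centres, m)
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return memberships, centres


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    u, c = fcm(data, [(1.0, 1.0), (9.0, 9.0)])
    print(c)
```

Each point's membership row and each centre's weighted sum can be computed independently of the others, which is the kind of data parallelism the abstract points to. Because the centres change every iteration, the distances must be recomputed each time.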

[1] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[2] James M. Keller, et al. Incorporation of Non-euclidean Distance Metrics into Fuzzy Clustering on Graphics Processing Units, 2007, Analysis and Design of Intelligent Systems using Soft Computing Techniques.

[3] Manoranjan Dash, et al. Efficient K-Means Clustering Using Accelerated Graphics Processors, 2008, DaWaK.

[4] Sudipto Guha, et al. CURE: an efficient clustering algorithm for large databases, 1998, SIGMOD '98.

[5] Chris Harris, et al. Iterative Solutions using Programmable Graphics Processing Units, 2005, FUZZ-IEEE.

[6] Pat Hanrahan, et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication, 2004, Graphics Hardware.

[7] James C. Bezdek, et al. Pattern Recognition with Fuzzy Objective Function Algorithms, 1981, Advanced Applications in Pattern Recognition.

[8] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[9] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[10] Naga K. Govindaraju, et al. A Survey of General-Purpose Computation on Graphics Hardware, 2007.