Framework for Mapping Data Mining Applications on GPUs

Data mining algorithms are computationally expensive by nature, and at today's dataset sizes they are becoming even slower and harder to use. Previous work has focused on parallelizing data mining algorithms on different architectures, and more recently applications have started to take advantage of the massive computational power and high bandwidth offered by GPUs. However, there has been almost no prior work offering a general methodology for parallelizing all types of data mining applications on hybrid architectures. This paper presents a framework for fast and efficient parallelization of data mining algorithms on GPU systems. The framework implements I/O transfer models that handle the huge number of data entries processed by this type of algorithm, all of which have numerous dependencies. The framework also allows users to specify the data requirements of each task, so that the data scheduler can efficiently map each task to a GPU node and to a block within each of these processors, improving the overall performance of the algorithm by around 20%.
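The idea of mapping tasks from user-declared data requirements can be illustrated with a minimal sketch. It is not the paper's scheduler: the `Task`, `GpuNode`, and `schedule` names, the chunk identifiers, and the greedy policy (fewest new transfers first, then lowest load) are all assumptions made for illustration, and the mapping to blocks within a GPU is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    needs: frozenset  # ids of the data chunks this task reads (hypothetical)


@dataclass
class GpuNode:
    name: str
    resident: set = field(default_factory=set)  # chunks already copied to this GPU
    load: int = 0                               # number of tasks assigned so far


def schedule(tasks, nodes):
    """Greedy mapping: choose the node that needs the fewest new chunk
    transfers for the task, breaking ties by the lower current load."""
    placement = {}
    for task in tasks:
        best = min(nodes, key=lambda n: (len(task.needs - n.resident), n.load))
        best.resident |= task.needs  # chunks are now resident on the chosen node
        best.load += 1
        placement[task.name] = best.name
    return placement


# Illustrative input: four tasks with declared data requirements, two GPU nodes.
tasks = [
    Task("t0", frozenset({"a", "b"})),
    Task("t1", frozenset({"c", "d"})),
    Task("t2", frozenset({"a", "b", "e"})),
    Task("t3", frozenset({"c"})),
]
nodes = [GpuNode("gpu0"), GpuNode("gpu1")]
placement = schedule(tasks, nodes)
print(placement)
```

In this toy input, tasks that share chunks end up on the same node, which is the kind of data locality a requirement-aware scheduler can exploit to reduce host-to-device transfers.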
