Optimizing Data Mining Workloads using Hardware Accelerators

Data mining is the process of finding useful and actionable patterns in large data sets. Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search, and security. In recent years, the volume of data being collected and analyzed has grown tremendously. Data mining algorithms have not scaled to these data volumes, which leads to significant performance degradation. Moreover, enhancements in processor and system designs do not necessarily benefit data mining workloads. In our previous work, we demonstrated that the computational characteristics and data access requirements of data mining workloads are quite different from those of other common workloads. Therefore, techniques are needed that specifically target the performance limitations of data mining workloads. In this paper, we present a brief overview of the major challenges in data mining system design. We first highlight important characteristics of these workloads. We then describe some initial designs and results for accelerating data mining algorithms using programmable hardware. Our results show that accelerating these workloads yields large performance gains over traditional systems.
