An FPGA Implementation of Decision Tree Classification

Data mining techniques are a rapidly emerging class of applications with widespread use in several fields. One important problem in data mining is classification, the task of assigning objects to one of several predefined categories. Among the solutions developed, decision tree classification (DTC) is a popular method that yields high accuracy while handling large datasets. However, DTC is computationally intensive, and as data sizes increase, its running time can stretch to several hours. In this paper, we propose a hardware implementation of decision tree classification. We identify the compute-intensive kernel of the algorithm (Gini score computation) and develop a highly efficient architecture for it, which is further optimized by reordering the computations and by using a bitmapped data structure. Our implementation on a Xilinx Virtex-II Pro FPGA platform (with 16 Gini units) provides a speedup of up to 5.58× over an equivalent software implementation.
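
For readers unfamiliar with the kernel named above, the following is a minimal software sketch of Gini score computation for a binary split on one numeric attribute. It uses the standard textbook definition of the Gini index and is not the hardware architecture described in this paper; the function names (`gini`, `gini_split`, `best_split`) and the midpoint threshold rule are assumptions made for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini index of a set of class labels: 1 - sum_i p_i^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_split(left, right):
    """Size-weighted Gini index of a two-way partition."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def best_split(values, labels):
    """Try every boundary between distinct sorted attribute values and
    return (lowest weighted Gini, threshold). The threshold is the midpoint
    of the two neighbouring values (an illustrative choice, not taken from
    the paper). Returns (inf, None) if no split is possible."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal attribute values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = gini_split(left, right)
        if score < best[0]:
            best = (score, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best
```

This naive version recomputes the class counts at every candidate split point, so it does quadratic work per attribute. That repeated evaluation over many candidate splits is what makes the Gini score the dominant cost in software.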
