Towards a Cost-Effective Parallel Data Mining Approach

Massive rule induction has recently emerged as a powerful data mining technique. The problem is known to be exponential in the number of attributes and, given its ever-increasing use, can benefit greatly from parallelization. In this paper, we study cost-effective approaches to parallelizing rule generation algorithms. In particular, we consider the propositional rule generation algorithm of the Discovery Board system and present the design and implementation of a parallel algorithm for the same task. We then present early performance results for our parallelization scheme on both hardware and software distributed shared memory multiprocessors.