Parallelization of data mining algorithms for multicore processors

This article describes an approach to parallelizing data mining algorithms for execution on multicore processors of various architectures. The proposed method represents an algorithm as a sequence of pure functions with unified interfaces. For parallel execution, additional functions are introduced to share data and models between parallel threads. These functions also make it possible to obtain different parallel algorithm structures and to implement different execution strategies for different environmental conditions. The method is illustrated using the Naïve Bayes algorithm.
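
The idea in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the article's implementation: the step signature `(data, model) -> model`, the names `split_data` and `merge_models`, and the dictionary-of-counters model are all assumptions made for the example. It shows only the counting phase of Naïve Bayes, with each step written as a pure function and two extra functions that distribute data to threads and combine the partial models.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Assumed unified interface: every step maps (data, model) -> new model
# and never mutates its inputs. Each record is (features, label).

def empty_model():
    return {"class_counts": Counter(), "feature_counts": Counter()}

def count_classes(data, model):
    # Count how often each class label occurs.
    counts = Counter(label for _, label in data)
    return {**model, "class_counts": model["class_counts"] + counts}

def count_features(data, model):
    # Count (attribute index, attribute value, class label) co-occurrences.
    counts = Counter(
        (i, value, label)
        for features, label in data
        for i, value in enumerate(features)
    )
    return {**model, "feature_counts": model["feature_counts"] + counts}

STEPS = [count_classes, count_features]

def run_sequential(data, steps=STEPS):
    # The algorithm is just the steps applied one after another.
    return reduce(lambda model, step: step(data, model), steps, empty_model())

# Hypothetical helper functions added only for parallel execution.

def split_data(data, parts):
    # Give each thread its own slice of the records.
    return [data[i::parts] for i in range(parts)]

def merge_models(a, b):
    # Partial count models combine by simple addition.
    return {
        "class_counts": a["class_counts"] + b["class_counts"],
        "feature_counts": a["feature_counts"] + b["feature_counts"],
    }

def run_parallel(data, workers=2, steps=STEPS):
    chunks = split_data(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda chunk: run_sequential(chunk, steps), chunks))
    return reduce(merge_models, partial, empty_model())
```

Because the steps are pure and the partial models merge by addition, `run_parallel(data, workers=n)` should produce the same counts as `run_sequential(data)` for any `n`. Swapping `split_data` or `merge_models` for other functions is one way to picture the different parallel structures the abstract mentions. Note that in standard CPython the global interpreter lock limits the speedup of CPU-bound threads, so the sketch illustrates the structure rather than the performance.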
