Hash Partitioned Apriori in a Parallel and Distributed Data Mining Environment with a Dynamic Data Allocation Approach

A parallel system depends mainly on parallel algorithms that are cost optimal. This paper considers one such algorithm, hash partitioned Apriori (HPA). HPA partitions the candidate itemsets among processors using a hash function, much like the hash join in relational databases. Because HPA makes effective use of the combined memory of all the processors, it works well for large-scale data mining in a parallel and distributed environment. Dynamic data allocation is discussed as an optimization technique for executing this application in such an environment. Writing parallel data mining algorithms for a distributed environment is a non-trivial task. The main purpose of the proposed method is to meet three challenges associated with parallel and distributed data mining: (i) minimizing I/O, (ii) increasing processing speed, and (iii) reducing communication cost.
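
The hash-based partitioning of candidate itemsets described above can be illustrated with a minimal single-process sketch in Python. It is not the paper's implementation: the processors are simulated as entries in a list, and the function names (`owner`, `partition_candidates`, `count_support`, `frequent`), the CRC32 hash, and the sample data are all assumptions made for illustration. Each candidate is stored on exactly one simulated processor, and each k-subset of a transaction is routed to the processor that owns it.

```python
import zlib
from itertools import combinations


def owner(itemset, num_procs):
    """Map an itemset to a processor id using a stable hash (CRC32 is an assumed choice)."""
    key = ",".join(sorted(itemset)).encode("utf-8")
    return zlib.crc32(key) % num_procs


def partition_candidates(candidates, num_procs):
    """Give each simulated processor a count table holding only the candidates it owns."""
    tables = [dict() for _ in range(num_procs)]
    for cand in candidates:
        cand = tuple(sorted(cand))
        tables[owner(cand, num_procs)][cand] = 0
    return tables


def count_support(tables, local_partitions, k):
    """Scan each processor's local transactions and route every k-subset to its owner."""
    num_procs = len(tables)
    for transactions in local_partitions:  # one transaction list per simulated processor
        for transaction in transactions:
            for itemset in combinations(sorted(set(transaction)), k):
                table = tables[owner(itemset, num_procs)]  # stands in for a network send
                if itemset in table:  # count only itemsets that are candidates
                    table[itemset] += 1
    return tables


def frequent(tables, min_support):
    """Collect the itemsets whose count reaches the minimum support."""
    return {s: c for table in tables for s, c in table.items() if c >= min_support}


if __name__ == "__main__":
    candidates = [("a", "b"), ("a", "c"), ("b", "c")]
    partitions = [
        [["a", "b", "c"], ["a", "b"]],  # transactions local to processor 0
        [["b", "c"], ["a", "b", "d"]],  # transactions local to processor 1
    ]
    tables = count_support(partition_candidates(candidates, 2), partitions, 2)
    print(frequent(tables, min_support=2))
```

Because the candidate tables are disjoint, adding processors adds usable memory for candidates rather than replicating them, which is the property the abstract attributes to HPA. In a real distributed setting the table lookup inside `count_support` would be a message to another node, which is where the communication cost arises.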
