Paper information - Hash Partitioned apriori in Parallel and Distributed Data Mining Environment with Dynamic Data Allocation Approach

Parallel systems are composed mainly of cost-optimal parallel algorithms. This paper considers one such algorithm, hash partitioned apriori (HPA). HPA partitions the candidate itemsets among processors using a hash function, as the hash join does in relational databases. Because HPA effectively uses the combined memory of all the processors, it works well for large-scale data mining in a parallel and distributed environment. The paper discusses dynamic data allocation as an optimization technique for executing this application in such an environment. Writing parallel data mining algorithms for a distributed environment is a non-trivial task. The main purpose of the proposed method is to address three challenges of parallel and distributed data mining: (i) minimizing I/O, (ii) increasing processing speed, and (iii) reducing communication cost.
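The hash partitioning idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`owner`, `partition_candidates`, `count_pass`, `frequent_itemsets`), the use of CRC32 as the hash function, and the simulation of processors as list entries are all assumptions made for illustration, and the dynamic data allocation step is not modeled. Each candidate itemset is assigned to exactly one processor by the hash, so the support counters are spread across the memory of all processors rather than replicated.

```python
from itertools import combinations
import zlib


def owner(itemset, num_procs):
    """Pick the processor that holds the counter for a sorted itemset tuple.

    CRC32 is an assumed stand-in for the hash function; any deterministic
    hash shared by all processors would serve the same purpose.
    """
    key = ",".join(itemset).encode("utf-8")
    return zlib.crc32(key) % num_procs


def partition_candidates(candidates, num_procs):
    """Spread candidate itemsets over processors, one counter table each."""
    tables = [dict() for _ in range(num_procs)]
    for cand in candidates:
        cand = tuple(sorted(cand))
        tables[owner(cand, num_procs)][cand] = 0
    return tables


def count_pass(local_partitions, tables, k):
    """Simulate one support-counting pass for k-itemsets.

    local_partitions[p] is the list of transactions stored at processor p.
    For every k-subset of a transaction, the hash selects the destination
    processor; in a real system this would be a network message.
    """
    num_procs = len(tables)
    for transactions in local_partitions:
        for transaction in transactions:
            for subset in combinations(sorted(set(transaction)), k):
                table = tables[owner(subset, num_procs)]
                if subset in table:  # only true candidates are counted
                    table[subset] += 1
    return tables


def frequent_itemsets(tables, min_support):
    """Collect the itemsets whose global count reaches min_support."""
    return {
        itemset: count
        for table in tables
        for itemset, count in table.items()
        if count >= min_support
    }


if __name__ == "__main__":
    # Hypothetical data: two processors, each with its own transactions.
    data = [
        [["a", "b", "c"], ["a", "b"]],
        [["a", "c"], ["b", "c"], ["a", "b"]],
    ]
    candidates = [("a", "b"), ("a", "c"), ("b", "c")]
    tables = partition_candidates(candidates, num_procs=2)
    count_pass(data, tables, k=2)
    print(frequent_itemsets(tables, min_support=3))
```

In this sketch every processor scans only its own transactions, and the hash alone decides where a count is sent, which is what ties the communication cost to the number of generated subsets rather than to the size of the candidate set.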

Sujni Paul | V. Saravanan
