Efficient parallel data mining for association rules

In this paper, we develop an algorithm, called PDM, to conduct parallel data mining for association rules. A transaction is a collection of items, and a large itemset is a set of items such that the number of transactions containing it exceeds a pre-specified threshold. PDM is designed so that the global set of large itemsets can be identified efficiently while the amount of inter-node data exchange required is minimized. Specifically, given a partition of the database, each processing node efficiently collects (counts) information on each itemset from its local database via a hashing method. The information discovered by each node is then shared with the other nodes via communication schemes. Next, PDM employs a technique called clue-and-poll to resolve the uncertainty caused by the partial knowledge collected at each node. It judiciously selects a small fraction of the itemsets for the exchange of count information among nodes, thus reducing the communication cost. The global set of large itemsets can then be determined from the aggregate counts of the itemsets. Experiments show that PDM not only attains very good parallelization efficiency but also provides robust performance across various input patterns.
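The local counting and clue-and-poll steps described above can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the PDM algorithm from the paper. The abstract does not specify the hashing method or how clues are selected, so here a Python `Counter` (a hash table) stands in for local counting, and an itemset is treated as a clue (a hypothetical rule) if its local count on some node exceeds `min_count / n` for `n` nodes. By the pigeonhole principle, any itemset whose global count exceeds `min_count` must pass that bound on at least one node, so polling only the clues still yields the exact global set. The names `local_counts`, `clue_and_poll`, and `min_count` are illustrative, and no network communication is modeled.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-itemset that occurs in one node's local transactions.

    Counter is a hash table keyed by itemset; it is only a stand-in for the
    paper's hashing method, whose details are not reproduced here.
    """
    counts = Counter()
    for txn in partition:
        # Sort so the same itemset always maps to the same tuple key.
        for itemset in combinations(sorted(txn), k):
            counts[itemset] += 1
    return counts


def clue_and_poll(partitions, k, min_count):
    """Find k-itemsets whose global count exceeds min_count (simplified sketch).

    Returns (large, polled): the globally large itemsets with their aggregate
    counts, and how many itemsets had their counts gathered from all nodes.
    """
    n = len(partitions)
    # Phase 1: each simulated node counts its own partition independently.
    per_node = [local_counts(p, k) for p in partitions]

    # Phase 2 (clue, assumed rule): if n local counts sum to more than
    # min_count, at least one of them exceeds min_count / n, so only itemsets
    # passing this bound on some node can be globally large.
    clues = set()
    for counts in per_node:
        clues.update(s for s, c in counts.items() if c > min_count / n)

    # Phase 3 (poll): gather counts for the clue itemsets only and sum them.
    # Counter returns 0 for an itemset a node never saw.
    large = {}
    for itemset in clues:
        total = sum(counts[itemset] for counts in per_node)
        if total > min_count:
            large[itemset] = total
    return large, len(clues)


if __name__ == "__main__":
    partitions = [
        [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"c", "d"}],  # node 0
        [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"b", "d"}],  # node 1
    ]
    large, polled = clue_and_poll(partitions, k=2, min_count=2)
    print(sorted(large.items()))  # [(('a', 'b'), 4), (('a', 'c'), 3), (('b', 'c'), 3)]
    print(polled)                 # 3
```

With this sample data, only 3 of the 5 distinct 2-itemsets are polled: `('c', 'd')` and `('b', 'd')` each occur once in the whole database, never pass the local bound, and so their counts are never exchanged. In simplified form, this mirrors the goal stated above of exchanging count information for only a fraction of the itemsets.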

Philip S. Yu | Ming-Syan Chen | Jong Soo Park