Paper Information - Sampling Based N-Hash Algorithm for Searching Frequent Itemset

Sampling Based N-Hash Algorithm for Searching Frequent Itemset

Finding frequent itemsets is a critical step in generating association rules in data mining. The classic hash-based technique for finding frequent itemsets, put forward by J. S. Park, has two shortcomings: it is difficult to choose an appropriate hash function, and it is prone to hash collisions. To solve these two problems, Chen Y.M. proposed the N-Hash algorithm, which does not require choosing a hash function and avoids hash collisions. In this paper, sampling is employed to improve the efficiency of the N-Hash algorithm.
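The abstract does not describe how N-Hash works internally, so the following is only a minimal sketch of the general idea of sampling in frequent itemset search, not the paper's algorithm. It assumes transactions are lists of item labels, restricts itself to 2-itemsets, and uses a generic sample-then-verify scheme; the names `count_pairs`, `sampled_frequent_pairs`, `sample_fraction`, and `slack` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def count_pairs(transactions):
    """Count how many transactions contain each 2-itemset (exact counts)."""
    counts = Counter()
    for t in transactions:
        # Sort so that each pair has one canonical representation.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return counts


def sampled_frequent_pairs(transactions, min_support,
                           sample_fraction=0.5, slack=0.8, seed=0):
    """Generic sample-then-verify sketch (an assumption, not N-Hash itself).

    1. Draw a random sample of the transactions.
    2. Keep pairs whose support in the sample reaches a lowered threshold
       (slack * min_support) to reduce the chance of missing frequent pairs.
    3. Verify the surviving candidates against the full database.
    """
    rng = random.Random(seed)
    n = len(transactions)
    k = max(1, int(n * sample_fraction))
    sample = rng.sample(transactions, k)

    lowered = slack * min_support
    candidates = {p for p, c in count_pairs(sample).items() if c / k >= lowered}

    # One pass over the full database, counting only the candidates.
    full_counts = Counter()
    for t in transactions:
        items = set(t)
        for a, b in candidates:
            if a in items and b in items:
                full_counts[(a, b)] += 1

    return {p: c / n for p, c in full_counts.items() if c / n >= min_support}


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["a", "b", "c"], ["c", "d"]]
    print(sampled_frequent_pairs(db, min_support=0.5, sample_fraction=1.0))
```

Because every candidate is re-counted on the full database, the returned pairs are always truly frequent; the sample can only cause a frequent pair to be missed, which is the risk the lowered threshold is meant to reduce.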

Yong-ming Chen | Mei-ling Zhu

[1] Jiawei Han, et al. Data Mining: Concepts and Techniques, 2000.

[2] Philip S. Yu, et al. An effective hash-based algorithm for mining association rules, 1995, SIGMOD '95.

[3] Rakesh Agrawal, et al. Fast Algorithms for Mining Association Rules, 1994, VLDB 1994.

[4] He Xiao. Two Dimension Hash Algorithm of Large Itemsets of Apriori Algorithm, 2003.

[5] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1998, VLDB 1998.

[6] Ji Hyea Han, et al. Data Mining: Concepts and Techniques 2nd Edition Solution Manual, 2005.

[7] Tomasz Imielinski, et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.