Direct Candidates Generation: A Novel Algorithm for Discovering Complete Share-Frequent Itemsets

The itemset share value is one way of evaluating the magnitude of an itemset. From a business perspective, itemset share values better reflect the significance of itemsets when mining association rules in a database. The Share-counted FSM (ShFSM) algorithm is among the most efficient algorithms for discovering all share-frequent itemsets. However, ShFSM spends computation time on the join and prune steps of candidate generation in every pass, and it generates many useless candidates. This study therefore proposes the Direct Candidates Generation (DCG) algorithm, which generates candidates directly in each pass without the join and prune steps. DCG also generates fewer candidates than ShFSM. Experimental results show that the proposed method performs significantly better than ShFSM.
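The share measure and the idea of generating candidates directly from transactions can be illustrated with a small sketch. It assumes the usual share-measure definitions: the local measure value lmv(X) is the sum of the measure values of X's items over the transactions that contain X, the share is SH(X) = lmv(X) / Tmv, where Tmv is the total measure value of the database, and X is share-frequent when SH(X) >= minShare. Pruning uses the total measure of the transactions containing X as an upper bound on lmv(X), in the spirit of ShFSM. Enumerating each transaction's k-subsets stands in for "direct" candidate generation. This is a hypothetical illustration, not the authors' DCG procedure, and the names `DB`, `itemset_share`, `mine_share_frequent`, and `min_share` are invented for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its measure
# value (for example, the quantity purchased).
DB = [
    {"A": 1, "B": 2, "C": 1},
    {"A": 3, "C": 2},
    {"B": 1, "C": 4, "D": 1},
    {"A": 2, "B": 1, "C": 1, "D": 2},
]


def itemset_share(itemset, db):
    """SH(X) = lmv(X) / Tmv, computed straight from the definition."""
    tmv_total = sum(sum(t.values()) for t in db)
    lmv = sum(sum(t[i] for i in itemset) for t in db if set(itemset).issubset(t))
    return lmv / tmv_total


def mine_share_frequent(db, min_share):
    """Return {itemset: share} for every itemset X with SH(X) >= min_share.

    Illustrative sketch only, not the published DCG algorithm: k-itemset
    candidates are enumerated directly from each transaction, so no join or
    prune step over the previous pass's candidates is needed.
    """
    tmv_total = sum(sum(t.values()) for t in db)  # Tmv of the whole database
    bound = min_share * tmv_total                 # lmv a share-frequent X needs
    result = {}
    live_items = {i for t in db for i in t}       # items that may still extend
    k = 1
    while live_items:
        lmv = defaultdict(int)  # local measure value of each k-itemset
        tmv = defaultdict(int)  # measure of the transactions containing it
        for t in db:
            items = sorted(i for i in t if i in live_items)
            t_total = sum(t.values())
            for x in combinations(items, k):
                lmv[x] += sum(t[i] for i in x)
                tmv[x] += t_total
        # tmv(X) >= lmv(X) and never grows for supersets of X, so itemsets
        # below the bound cannot have share-frequent supersets.
        survivors = [x for x in tmv if tmv[x] >= bound]
        for x in survivors:
            if lmv[x] >= bound:
                result[x] = lmv[x] / tmv_total
        live_items = {i for x in survivors for i in x}
        k += 1
    return result


if __name__ == "__main__":
    for itemset, sh in sorted(mine_share_frequent(DB, 0.25).items()):
        print("".join(itemset), round(sh, 3))
```

With `min_share = 0.25` on this toy data, {A, B} is share-frequent (share 6/21) even though {B} is not (share 4/21). Share is therefore not downward closed, which is why the sketch prunes with the transaction-measure bound rather than with the shares themselves.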
