Clustering Frequent Itemsets Based on Generators

Reducing the number of frequent itemsets effectively is an active topic in data mining research, and clustering frequent itemsets is one solution. Because generators are a lossless concise representation of all frequent itemsets, clustering generators is equivalent to clustering all frequent itemsets. This paper proposes a new algorithm for clustering frequent itemsets based on generators. First, the rationale for clustering generators is discussed in terms of the minimum description length principle. Second, pruning strategies and a mining algorithm for generators are proposed. Finally, the clustering algorithm is presented, based on a new similarity criterion for frequent itemsets. Experimental results show that the proposed method both reduces the number of discovered itemsets and runs efficiently.
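To make the notion of a generator concrete, the following minimal sketch enumerates frequent generators by brute force. It assumes the standard definition (an itemset is a generator if every proper subset has strictly larger support) and is not the pruning-based mining algorithm proposed in the paper. The function names `support` and `frequent_generators` and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_generators(transactions, min_support):
    """Brute-force enumeration of frequent generators (illustration only).

    Assumed definition: an itemset is a generator if no proper subset has
    the same support. Because support never increases when items are added,
    it is enough to compare against subsets with exactly one item removed.
    The empty itemset is used in the comparison but is not reported.
    """
    items = sorted(set().union(*transactions))
    generators = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s < min_support:
                continue  # not frequent
            # Generator test: dropping any single item must raise the support.
            if all(support(candidate - {i}, transactions) > s for i in candidate):
                generators[candidate] = s
    return generators


if __name__ == "__main__":
    # Toy database (hypothetical): four transactions over items a, b, c.
    db = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
    for itemset, s in sorted(frequent_generators(db, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), s)
```

On this toy database with a minimum support of 2, all seven non-empty itemsets are frequent, but only five are generators: {a, b} has the same support as {a}, and {a, b, c} has the same support as {a, c}, so both are dropped. This is the kind of reduction that makes generators a smaller set to cluster.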
