Fast mining erasable itemsets using NC_sets

Mining erasable itemsets, first introduced in 2009, is an emerging data mining task. In this paper, we present a new data representation called the NC_set, which keeps track of the complete information needed for mining erasable itemsets. Based on the NC_set, we propose a new algorithm called MERIT for mining erasable itemsets efficiently. MERIT owes its efficiency to three techniques. First, the NC_set is a compact structure that prunes irrelevant data automatically. Second, computing the gain of an itemset is transformed into combining NC_sets, which can be completed in linear time by an ingenious strategy. Third, in some cases MERIT can find erasable itemsets directly, without generating candidate itemsets. To evaluate MERIT, we conducted extensive experiments on many synthetic product databases. Our performance study shows that MERIT is efficient and is, on average, about two orders of magnitude faster than META, the first algorithm for mining erasable itemsets.
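The abstract does not restate the problem definition, so the following is a minimal sketch of it under stated assumptions, not the MERIT algorithm and not the NC_set structure. It assumes the formulation commonly used in the erasable-itemset literature: each product consists of a set of items and yields a profit, the gain of an itemset is the total profit of the products that use at least one of its items, and an itemset is erasable when its gain does not exceed a threshold fraction of the total profit. The toy database and the names `PRODUCTS`, `gain`, and `erasable_itemsets` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy product database: (items used by the product, profit).
PRODUCTS = [
    ({"a", "b"}, 100),
    ({"b", "c"}, 200),
    ({"c", "d"}, 300),
    ({"d"}, 400),
]


def gain(itemset, products):
    """Total profit of the products that use at least one item of `itemset`."""
    wanted = set(itemset)
    return sum(profit for items, profit in products if items & wanted)


def erasable_itemsets(products, threshold):
    """Brute-force enumeration of erasable itemsets (assumed definition).

    This checks every non-empty item combination, so it is exponential
    in the number of items. It only illustrates the task; it is not MERIT.
    """
    total_profit = sum(profit for _, profit in products)
    limit = threshold * total_profit
    all_items = sorted(set().union(*(items for items, _ in products)))
    result = []
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            if gain(combo, products) <= limit:
                result.append(frozenset(combo))
    return result


if __name__ == "__main__":
    for itemset in erasable_itemsets(PRODUCTS, threshold=0.35):
        print(sorted(itemset), gain(itemset, PRODUCTS))
```

Each call to `gain` in this naive version rescans the whole database for every candidate. That per-itemset cost is what the abstract says MERIT avoids by combining NC_sets.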
