Mining Top‐Rank‐k Erasable Itemsets by PID_lists

Mining erasable itemsets is an emerging data mining task. In this paper, we present a new data representation called a PID_list, which keeps track of the identification numbers (id_nums) of the products that include an itemset. On the basis of the PID_list, we propose a new algorithm called VM for mining top‐rank‐k erasable itemsets efficiently. The VM algorithm avoids both the time‐consuming process of calculating the gain of each candidate itemset and repeated scans of the database, which greatly accelerates the mining task. To evaluate the VM algorithm, we conducted experiments on six synthetic product databases. Our performance study shows that the VM algorithm is efficient and much faster than the MIKE algorithm, which is the first algorithm for the problem of mining top‐rank‐k erasable itemsets.
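
As a rough illustration of the idea behind a PID_list, the Python sketch below builds one product-id set per item from a toy product database and derives the gain of an itemset from those sets alone, without rescanning the database. It is not the VM algorithm itself. The toy data, the function names, and the definition of gain used here (the total profit of the products that contain at least one item of the itemset, as is usual in the erasable-itemset literature) are assumptions made for illustration and are not taken from this abstract.

```python
# Hypothetical toy product database: product id -> (component items, profit).
products = {
    1: ({"a", "b"}, 100),
    2: ({"b", "c"}, 50),
    3: ({"c"}, 30),
    4: ({"a", "d"}, 20),
}

def build_pid_lists(products):
    """Scan the database once and map each item to the ids of products using it."""
    pid_lists = {}
    for pid, (items, _profit) in products.items():
        for item in items:
            pid_lists.setdefault(item, set()).add(pid)
    return pid_lists

def pid_list_of(itemset, pid_lists):
    """Assumed combination rule: union the id sets of the single items."""
    result = set()
    for item in itemset:
        result |= pid_lists[item]
    return result

def gain(pids, products):
    """Assumed gain: total profit of the products whose ids are in pids."""
    return sum(products[pid][1] for pid in pids)

pid_lists = build_pid_lists(products)
gain_d = gain(pid_lists["d"], products)                          # product 4 only
gain_ac = gain(pid_list_of(("a", "c"), pid_lists), products)     # products 1-4
```

In this sketch the database is read only once, in `build_pid_lists`; the gain of any larger itemset then follows from set unions over the stored ids, which is the kind of saving the abstract attributes to the PID_list representation.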
