Mining high occupancy itemsets

Abstract

Frequent itemset mining has been studied extensively in data mining over the last two decades because of its numerous applications. However, the classic support-based mining framework used by most previous studies is not suitable for some real-world applications, such as travel landscape recommendation, where occupancy, besides support, also plays a key role in evaluating the interestingness of an itemset. In this paper, we propose a new kind of task based on occupancy, namely high occupancy mining, by introducing occupancy into the support-based mining framework. An efficient algorithm, HEP (abbreviation for High Efficient algorithm for mining high occupancy itemsets), is developed to discover all high occupancy itemsets. HEP uses a structure, named occupancy-list, to store the occupancy information of an itemset, and employs an iterative level-wise approach to mine high occupancy itemsets with a pruning strategy based on an upper bound of occupancy. Extensive experiments on both synthetic and real datasets show that HEP is efficient for mining high occupancy itemsets and is at least one order of magnitude faster than the baseline algorithm.
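
The abstract names three ingredients (an occupancy measure, the occupancy-list, and a level-wise search with upper-bound pruning) without defining them. The Python sketch below illustrates one way they could fit together, under assumptions that are ours and not taken from the paper: the occupancy of an itemset X in a transaction T is |X|/|T|, the occupancy of X in the database is the sum of that ratio over all transactions containing X, and the occupancy-list of X is a tid-sorted list of (tid, |T|) pairs. Because |X| <= |T| for every supporting transaction, occupancy never exceeds support, and support never grows for supersets, so sup(X) < min_occ rules out X and all of its supersets. This is a deliberately loose bound; HEP's actual bound and data layout may differ. The names build_occupancy_lists, join_lists, occupancy and mine_high_occupancy are hypothetical.

```python
from fractions import Fraction


def build_occupancy_lists(transactions):
    """Map each item to a hypothetical occupancy-list: tid-sorted (tid, |T|) pairs."""
    lists = {}
    for tid, transaction in enumerate(transactions):
        items = set(transaction)  # ignore duplicate items inside a transaction
        for item in items:
            lists.setdefault(item, []).append((tid, len(items)))
    return lists  # tids are appended in increasing order, so each list is sorted


def join_lists(a, b):
    """Intersect two tid-sorted occupancy-lists (merge join on tid)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out


def occupancy(itemset, occ_list):
    """Assumed definition: sum of |X| / |T| over transactions T containing X."""
    return sum(Fraction(len(itemset), size) for _, size in occ_list)


def mine_high_occupancy(transactions, min_occ):
    """Level-wise search; assumes min_occ > 0 and mutually comparable items."""
    # Upper bound: occupancy(X) <= sup(X) = len(list), and support never grows for
    # supersets, so any itemset with sup < min_occ is pruned with all its supersets.
    level = {(item,): lst for item, lst in build_occupancy_lists(transactions).items()
             if len(lst) >= min_occ}
    result = {}
    while level:
        for itemset, lst in level.items():
            occ = occupancy(itemset, lst)
            if occ >= min_occ:
                result[itemset] = occ
        # Apriori-style candidate generation: join k-itemsets sharing a (k-1)-prefix.
        keys = sorted(level)
        next_level = {}
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = keys[x], keys[y]
                if a[:-1] != b[:-1]:
                    break  # sorted order keeps equal prefixes contiguous
                lst = join_lists(level[a], level[b])
                if len(lst) >= min_occ:
                    next_level[a + (b[-1],)] = lst
        level = next_level
    return result
```

On the toy database [["a", "b", "c"], ["a", "b"], ["a", "b", "c", "d"], ["b", "d"]] with min_occ = 2, the sketch returns only {("a", "b"): Fraction(13, 6)}. With min_occ = 1, {c} falls short (7/12) while {a, b, c} qualifies (7/4), which shows that occupancy is not anti-monotone and why pruning has to rely on an upper bound rather than on occupancy itself.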
