Mining closed+ high utility itemsets without candidate generation

High utility itemset (HUI) mining refers to discovering sets of items that not only co-occur but also carry high utilities (e.g., high profits). HUI mining has received extensive attention in recent years due to its wide applications in domains such as commerce and biomedicine. However, a huge number of HUIs may be presented to users, which degrades the efficiency of the mining process. A promising solution to this problem is to mine closed+ high utility itemsets (CHUIs), a compact and lossless representation of HUIs. Nevertheless, existing algorithms produce a large number of candidates, which degrades mining performance in terms of both time and space. In this paper, a novel algorithm named CHUI-Miner (Closed+ High Utility Itemset mining without candidates) is proposed for mining CHUIs; it computes the utility of itemsets directly, without producing candidates. To the best of our knowledge, this is the first work addressing the issue of mining CHUIs without candidate generation. Experimental results show that CHUI-Miner is several orders of magnitude faster than the state-of-the-art algorithms.
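
To make the terms above concrete, the following is a minimal brute-force sketch in Python, not the CHUI-Miner algorithm itself. It assumes the usual definitions from the HUI literature: the utility of an item in a transaction is its purchased quantity times its unit profit, the utility of an itemset is summed over the transactions that contain it, and an itemset is closed when no proper superset has the same support. The toy transactions, the unit profits, the threshold of 12, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "b": 1, "c": 1},
    {"b": 3, "c": 2},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def closed_high_utility_itemsets(transactions, unit_profit, min_util):
    """Enumerate every itemset (exponential; for illustration only) and keep
    those that are closed and whose utility is at least min_util."""
    items = sorted(unit_profit)
    all_sets = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]
    sup = {s: support(s, transactions) for s in all_sets}
    result = {}
    for s in all_sets:
        if sup[s] == 0:
            continue  # itemsets that never occur are not considered
        # Closed: no proper superset occurs in exactly the same number of transactions.
        is_closed = not any(s < t and sup[t] == sup[s] for t in all_sets)
        u = utility(s, transactions, unit_profit)
        if is_closed and u >= min_util:
            result[s] = u
    return result


chuis = closed_high_utility_itemsets(transactions, unit_profit, min_util=12)
for itemset, u in sorted(chuis.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), u)
```

This sketch enumerates every possible itemset, which is exactly the kind of exhaustive work that practical miners avoid; it only illustrates what a closed high utility itemset is on a small example.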
