Mining top-k regular-frequent itemsets using database partitioning and support estimation

Temporal regularity of itemset appearance is an important criterion for measuring the interestingness of itemsets in several applications. A frequent itemset is said to be regular-frequent in a database if it appears at regular intervals. Mining the complete set of regular-frequent itemsets therefore requires the user to specify both a support threshold and a regularity threshold. In practice, however, it is often difficult for users to provide an appropriate support threshold. In addition, a support threshold tends to produce a large number of regular-frequent itemsets, so it may be better to ask users for the number of desired results instead. We thus propose an efficient algorithm for mining top-k regular-frequent itemsets without setting a support threshold. The proposed algorithm is based on database partitioning and support estimation techniques, and uses a best-first search strategy that requires only one database scan. We compare our algorithm with state-of-the-art algorithms for mining top-k regular-frequent itemsets. Our experimental studies on both synthetic and real data show that our proposal achieves high performance for both small and large values of k.
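
To make the problem statement concrete, the following is a minimal brute-force sketch of what a top-k regular-frequent query returns. It is not the proposed algorithm: it uses no partitioning, support estimation, or best-first search. It assumes the definition of regularity that is common in the periodic-frequent pattern literature (the largest gap between consecutive occurrences of an itemset, including the gaps at the start and end of the database), which the abstract itself does not spell out. The function names and the toy database are hypothetical.

```python
from itertools import combinations


def support_and_regularity(itemset, transactions):
    """Return (support, regularity) of `itemset` over an ordered list of transactions.

    Assumption: regularity is the largest gap between consecutive occurrences,
    counting the gap from the start of the database to the first occurrence and
    from the last occurrence to the end of the database.
    """
    items = set(itemset)
    # 1-based ids of the transactions that contain every item of the itemset.
    tids = [i for i, t in enumerate(transactions, start=1) if items <= set(t)]
    boundaries = [0] + tids + [len(transactions)]
    regularity = max(b - a for a, b in zip(boundaries, boundaries[1:]))
    return len(tids), regularity


def top_k_regular_frequent(transactions, k, max_regularity):
    """Brute force: enumerate every itemset, keep the regular ones, rank by support."""
    universe = sorted({item for t in transactions for item in t})
    candidates = []
    for size in range(1, len(universe) + 1):
        for itemset in combinations(universe, size):
            support, regularity = support_and_regularity(itemset, transactions)
            if support > 0 and regularity <= max_regularity:
                candidates.append((itemset, support, regularity))
    # Highest support first; ties are broken by itemset for a deterministic order.
    candidates.sort(key=lambda c: (-c[1], c[0]))
    return candidates[:k]


# Hypothetical toy database of six ordered transactions.
transactions = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"b"},
    {"a", "b"},
    {"a", "c"},
]
result = top_k_regular_frequent(transactions, k=3, max_regularity=2)
print(result)
```

Only the regularity threshold and k are supplied by the user; no support threshold is needed. The exhaustive enumeration here grows exponentially with the number of items, which is the cost that a one-scan, best-first approach is meant to avoid.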
