Incorporating occupancy into frequent pattern mining for high quality pattern recommendation

Mining interesting patterns from transaction databases has attracted considerable research interest for more than a decade. Most of these studies use frequency, the number of times a pattern appears in a transaction database, as the key measure of pattern interestingness. In this paper, we introduce a new measure of pattern interestingness, occupancy. Occupancy is motivated by real-world pattern recommendation applications that require an interesting pattern X to occupy a large portion of the transactions in which it appears. That is, for any supporting transaction t of pattern X, the number of items in X should be close to the total number of items in t. In these applications, patterns with higher occupancy may lead to higher recall, while patterns with higher frequency lead to higher precision. Given the definition of occupancy, we call a pattern dominant if its occupancy is above a user-specified threshold. Our task is then to identify the qualified patterns, that is, the patterns that are both frequent and dominant. We also formulate the problem of mining top-k qualified patterns: finding the qualified patterns with the top-k values of a given function (e.g., a weighted sum of occupancy and support). The challenge in these tasks is that neither the monotone nor the anti-monotone property holds for occupancy; in other words, the value of occupancy does not increase or decrease monotonically as more items are added to a given itemset. We therefore propose an algorithm called DOFIA (DOminant and Frequent Itemset mining Algorithm), which exploits upper-bound properties of occupancy to reduce the search effort. The tradeoff between bound tightness and computational complexity is also systematically addressed. Finally, we show the effectiveness of DOFIA in a real-world application, print-area recommendation for Web pages, and demonstrate its efficiency on several large synthetic data sets.
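
To make the definitions above concrete, the following Python sketch computes support and occupancy on a toy database and enumerates qualified patterns by brute force. Several choices here are assumptions and are not taken from the abstract: occupancy is aggregated as the arithmetic mean of |X| / |t| over the supporting transactions (the abstract does not say which aggregate is used), support is expressed as a fraction of the database, and the function names and the `weight` parameter are hypothetical. The exhaustive enumeration is only an illustration; it is not DOFIA and uses none of its upper bounds.

```python
from itertools import combinations

def supporting(transactions, pattern):
    """Return the transactions that contain every item of the pattern."""
    p = frozenset(pattern)
    return [t for t in transactions if p <= t]

def support(transactions, pattern):
    """Fraction of transactions that contain the pattern."""
    return len(supporting(transactions, pattern)) / len(transactions)

def occupancy(transactions, pattern):
    """Mean of |X| / |t| over supporting transactions t (assumed aggregate)."""
    p = frozenset(pattern)
    sup = supporting(transactions, p)
    if not sup:
        return 0.0
    return sum(len(p) / len(t) for t in sup) / len(sup)

def qualified_patterns(transactions, min_sup, min_occ):
    """Brute-force search for patterns that are both frequent and dominant."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            s = support(transactions, cand)
            o = occupancy(transactions, cand)
            if s >= min_sup and o >= min_occ:
                found[frozenset(cand)] = (s, o)
    return found

def top_k(transactions, k, min_sup, min_occ, weight=0.5):
    """Rank qualified patterns by weight * occupancy + (1 - weight) * support."""
    scored = [
        (pattern, weight * o + (1 - weight) * s)
        for pattern, (s, o) in qualified_patterns(transactions, min_sup, min_occ).items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy database: one short transaction and one long transaction.
transactions = [frozenset("ab"), frozenset("abcdef")]

# Occupancy rises from {a} to {a, b} and then falls for {a, b, c},
# so it is neither monotone nor anti-monotone in this example.
for pattern in ("a", "ab", "abc"):
    print(pattern, round(occupancy(transactions, pattern), 3))
```

On this toy database, adding b to {a} raises occupancy while adding c to {a, b} lowers it, which is why a simple Apriori-style pruning rule on occupancy alone is not available and bounds of the kind the abstract mentions are needed.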
