Mining Top-K Co-Occurrence Items

Frequent itemset mining has emerged as a fundamental problem in data mining and plays an important role in many data mining tasks, such as association analysis and classification. In the framework of frequent itemset mining, the results are itemsets that are frequent in the whole database. However, in some applications, such as recommendation systems and social networks, people are more interested in finding the items that co-occur most frequently with some user-specified itemset (the query itemset) in a database. In this paper, we address the problem by proposing a new mining task named top-k co-occurrence item mining, where k is the desired number of items to be found. Four baseline algorithms are presented first. Then, we introduce a special data structure named Pi-Tree (Prefix itemset Tree) to maintain the information of itemsets. Based on Pi-Tree, we propose two algorithms, namely PT (Pi-Tree-based algorithm) and PT-TA (Pi-Tree-based algorithm with TA pruning), for mining top-k co-occurrence items; both incorporate several novel strategies for pruning the search space to achieve high efficiency. The performance of PT and PT-TA was evaluated against the four proposed baseline algorithms on both synthetic and real databases. Extensive experiments show that PT not only outperforms the other algorithms substantially in terms of execution time but also has excellent scalability.
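To make the task definition concrete, the following is a minimal sketch of a naive scan-and-count approach: it keeps only the transactions that contain the query itemset and counts every other item in them. It is not one of the four baseline algorithms, nor PT or PT-TA, none of which are specified in this abstract. The function name `top_k_cooccurrence`, the tie-breaking rule, and the toy database are assumptions made for illustration only.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_k_cooccurrence(
    transactions: Iterable[Iterable[Hashable]],
    query: Iterable[Hashable],
    k: int,
) -> List[Tuple[Hashable, int]]:
    """Return the k items that co-occur most often with `query`.

    Hypothetical helper: a single full scan of the database, with no
    index structure and no pruning.
    """
    query_set = frozenset(query)
    counts: Counter = Counter()
    for transaction in transactions:
        items = set(transaction)
        # Only transactions containing the whole query itemset count.
        if query_set <= items:
            # Count each item outside the query once per transaction.
            counts.update(items - query_set)
    # Assumed tie-break: higher count first, then item name, so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], str(pair[0])))
    return ranked[:k]


# Toy database (made up for this example).
database = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c"},
    {"a", "b", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "e"},
]

print(top_k_cooccurrence(database, {"a", "b"}, k=2))
```

Every query in this sketch costs a full pass over the database, which is the kind of overhead that an index over itemsets combined with search-space pruning is meant to avoid.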
