Mining frequent itemsets using the N-list and subsume concepts

Frequent itemset mining is a fundamental part of many data mining problems that aim to find interesting patterns in data. The recently presented PrePost algorithm mines frequent itemsets using N-lists and in most cases outperforms other state-of-the-art algorithms. This paper proposes an improved version of PrePost, the N-list and Subsume-based algorithm for mining Frequent Itemsets (NSFI), which uses a hash table to speed up the creation of the N-lists associated with 1-itemsets, together with an improved N-list intersection algorithm. Furthermore, two new theorems are proposed for determining the "subsume index" of frequent 1-itemsets based on the N-list concept. Using the subsume index, NSFI can identify groups of frequent itemsets without determining the N-lists associated with them. The experimental results show that NSFI outperforms PrePost in terms of both runtime and memory usage, and outperforms dEclat in terms of runtime.
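
The subsume idea can be illustrated with a small sketch. This is not the NSFI algorithm itself: it assumes the usual simplified reading of the subsume index (item B is subsumed by item A when every transaction containing A also contains B), it uses plain transaction-id sets rather than N-lists, and it ignores the item ordering a real miner would impose. The function names and the toy data are hypothetical. It shows only why a subsume index lets a miner emit a group of frequent itemsets without computing their supports.

```python
from itertools import combinations


def tidsets(transactions):
    """Map each item to the set of transaction ids that contain it."""
    result = {}
    for tid, items in enumerate(transactions):
        for item in items:
            result.setdefault(item, set()).add(tid)
    return result


def subsume_index(transactions, min_support):
    """Return (index, tids) for the frequent 1-itemsets.

    index[a] lists every other frequent item b whose tidset contains
    the tidset of a, i.e. b occurs in every transaction where a occurs.
    Assumption: tidsets stand in for the N-lists used in the paper.
    """
    tids = {item: t for item, t in tidsets(transactions).items()
            if len(t) >= min_support}
    index = {a: sorted(b for b in tids if b != a and tids[a] <= tids[b])
             for a in tids}
    return index, tids


def expand(item, subsumed, support):
    """Combine item with every non-empty subset of its subsumed items.

    Each resulting itemset has the same support as item alone, so no
    intersection is needed to compute it.
    """
    found = {}
    for size in range(1, len(subsumed) + 1):
        for combo in combinations(subsumed, size):
            found[frozenset((item,) + combo)] = support
    return found


# Hypothetical toy database of five transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "c"},
    {"b", "d"},
]
index, tids = subsume_index(transactions, min_support=2)
group_a = expand("a", index["a"], len(tids["a"]))
```

In this toy database, item b appears in every transaction that contains a, so the itemset {a, b} inherits the support of a directly.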
