A frequent itemset mining algorithm based on the Principle of Inclusion-Exclusion and transaction mapping

This paper proposes a novel frequent itemset mining algorithm called PIETM (Principle of Inclusion-Exclusion and Transaction Mapping). PIETM has three major features. First, like Apriori, PIETM discovers frequent itemsets in a bottom-up manner, but it scans the database only twice. Second, PIETM does not scan the database to count the support of itemsets; instead, it applies the Principle of Inclusion-Exclusion to calculate the support of candidate itemsets. Third, PIETM uses transaction intervals to map and store the transaction ids of each item, which simplifies support counting. We also present experimental results comparing PIETM with existing algorithms. The results show that PIETM has a lower execution time than the other methods when the dataset contains numerous items. In summary, this paper makes three major contributions. First, it presents a new method that calculates the support of itemsets using a well-known property from set theory. Second, it demonstrates the correctness of itemset counting in PIETM. Third, it shows that the method is suitable for a range of high-performance applications because it combines the simplicity of Apriori with the efficiency of FP-growth.
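To make the counting idea concrete, the following is a minimal sketch, not the PIETM implementation itself. It assumes the dual form of the Principle of Inclusion-Exclusion, in which the size of an intersection is an alternating sum of union sizes, and it assumes that each item's transaction ids are stored as closed integer intervals obtained by compressing runs of consecutive ids. The function names (`to_intervals`, `union_size`, `support_by_pie`) and the toy database are hypothetical and chosen only for illustration; the abstract does not specify how PIETM builds its intervals or organizes the computation.

```python
from itertools import combinations


def to_intervals(tids):
    """Compress transaction ids into closed intervals of consecutive ids.

    Assumption: a simplified stand-in for a transaction mapping step.
    """
    intervals = []
    for t in sorted(tids):
        if intervals and t == intervals[-1][1] + 1:
            intervals[-1][1] = t          # extend the current run
        else:
            intervals.append([t, t])      # start a new run
    return [tuple(iv) for iv in intervals]


def union_size(interval_lists):
    """Count the ids covered by at least one of the given interval lists."""
    merged = sorted(iv for lst in interval_lists for iv in lst)
    total, cur_lo, cur_hi = 0, None, None
    for lo, hi in merged:
        if cur_hi is None or lo > cur_hi + 1:
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)      # overlapping or adjacent: merge
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


def support_by_pie(itemset, item_intervals):
    """Support of an itemset as an alternating sum of union sizes.

    |A1 n ... n Ak| = sum over non-empty subsets S of
                      (-1)^(|S|+1) * |union of Ai for i in S|
    """
    items = list(itemset)
    total = 0
    for k in range(1, len(items) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(items, k):
            total += sign * union_size([item_intervals[i] for i in subset])
    return total


# Hypothetical toy database: transaction id -> items.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
    6: {"a"},
}

# One pass to collect the ids per item, then compress them to intervals.
tid_sets = {}
for tid, items in transactions.items():
    for item in items:
        tid_sets.setdefault(item, set()).add(tid)
item_intervals = {item: to_intervals(tids) for item, tids in tid_sets.items()}

print(support_by_pie({"a", "b"}, item_intervals))       # 3
print(support_by_pie({"a", "b", "c"}, item_intervals))  # 2
```

In this sketch the database is read only while building `item_intervals`; every later support value comes from interval arithmetic. The alternating sum visits every non-empty subset of the itemset, so its cost grows exponentially with itemset length; a bottom-up miner could presumably reuse union sizes computed at earlier levels, but that optimization is not shown here.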
