Batch incremental processing for FP-tree construction using FP-Growth algorithm

With today's global economy and the World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are central to knowledge discovery and data mining tasks such as the mining of association rules and correlations. The FP-tree is a versatile data structure for frequent pattern mining: it is a compact representation of a transaction database that holds the frequency information of all relevant frequent patterns (FP) in the database. All existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of the database at a time and adding it to the FP-tree of the initial (original) database. In this paper, we propose a novel method that takes advantage of the FP-tree representation of the incremental transaction database for incremental mining. We propose a batch incremental processing algorithm, BIT_FPGrowth, that restructures and merges two small FP-trees built over consecutive durations to obtain an FP-tree for the FP-Growth algorithm. Unlike other, sequential incremental algorithms that read transactions from the database, BIT_FPGrowth uses the FP-tree as a preprocessed data repository from which it obtains transactions (i.e., item-sets). BIT_FPGrowth takes less time to construct the FP-tree. Our experimental results show that, as the size of the database increases, the runtime of BIT_FPGrowth grows much more slowly than that of the other algorithms and is the lowest among them.
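The idea of using an FP-tree as a repository of item-sets and merging two trees can be illustrated with a minimal Python sketch. This is not the BIT_FPGrowth algorithm itself, whose restructuring steps are not given in the abstract; it is a naive illustration under stated assumptions: no minimum-support pruning, ties in frequency broken by item name, and hypothetical names (`Node`, `build_tree`, `extract_itemsets`, `merge_trees`, `tree_to_dict`) chosen only for this example.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, items, count=1):
    """Insert an ordered item-set into the tree, adding `count` along its path."""
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += count


def build_tree(transactions):
    """Build an FP-tree with items in descending global frequency order.

    Assumption for this sketch: every item is kept (no support threshold).
    """
    freq = Counter(i for t in transactions for i in set(t))
    root = Node()
    for t in transactions:
        insert(root, sorted(set(t), key=lambda i: (-freq[i], i)))
    return root


def extract_itemsets(root):
    """Recover (item-set, count) pairs from a tree without reading the database.

    A node's count minus the sum of its children's counts is the number of
    transactions that end exactly at that node.
    """
    result = []

    def walk(node, prefix):
        for child in node.children.values():
            path = prefix + [child.item]
            ending = child.count - sum(c.count for c in child.children.values())
            if ending > 0:
                result.append((path, ending))
            walk(child, path)

    walk(root, [])
    return result


def merge_trees(tree_a, tree_b):
    """Merge two FP-trees by re-inserting their item-sets in the combined order."""
    weighted = extract_itemsets(tree_a) + extract_itemsets(tree_b)
    freq = Counter()
    for items, count in weighted:
        for i in items:
            freq[i] += count
    root = Node()
    for items, count in weighted:
        insert(root, sorted(items, key=lambda i: (-freq[i], i)), count)
    return root


def tree_to_dict(node):
    """Convert a tree to nested dicts so two trees can be compared."""
    return {c.item: (c.count, tree_to_dict(c)) for c in node.children.values()}


# Two consecutive batches (made-up data for illustration).
batch1 = [["a", "b"], ["b", "c", "d"], ["a", "b", "d"]]
batch2 = [["b", "c"], ["a", "c"], ["c", "d"]]
merged = merge_trees(build_tree(batch1), build_tree(batch2))
```

In this sketch the merged tree has the same shape and counts as a tree built from scratch over both batches, because each item-set is re-sorted by the combined frequencies before insertion. A real batch algorithm would aim to avoid re-inserting every path.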
