A Guided FP-growth algorithm for fast mining of frequent itemsets from big data

In this paper we present GFP-growth (Guided FP-growth), a novel algorithm for computing the frequency counts of a given list of itemsets in large datasets. Unlike FP-growth, which mines all frequent itemsets, GFP-growth focuses only on the specified itemsets of interest, and therefore requires less time and memory. We prove that GFP-growth yields the exact frequency counts for the required itemsets. We show that a number of different problems admit solutions that exploit this efficient multi-targeted mining to improve performance. In particular, we study in detail the problem of generating minority-class rules from imbalanced data, a scenario that arises in many real-life domains such as medical applications, failure prediction, network and cyber security, and maintenance. We develop the Minority-Report Algorithm, which uses GFP-growth to improve performance. We prove some theoretical properties of the Minority-Report Algorithm and demonstrate its superior performance using simulations and real data.
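To make the task concrete, the following is a minimal sketch of the problem that targeted counting solves: given transactions and a list of itemsets of interest, return the exact count of each. It is a naive linear-scan baseline, not the GFP-growth algorithm itself, which works on an FP-tree. The function names, the toy data, and the convention of encoding the minority class as an ordinary item (here "rare") are assumptions made for illustration only.

```python
def count_target_itemsets(transactions, targets):
    """Return the exact support count of each target itemset.

    Naive baseline: every transaction is checked against every target.
    This only defines the expected output of targeted counting; it is
    not the FP-tree-based method described in the paper.
    """
    target_sets = [frozenset(t) for t in targets]
    counts = {t: 0 for t in target_sets}
    for transaction in transactions:
        items = set(transaction)
        for t in target_sets:
            if t <= items:  # target is a subset of the transaction
                counts[t] += 1
    return counts


def rule_confidence(transactions, antecedent, class_item):
    """Confidence of the rule antecedent -> class_item.

    Assumes (hypothetically) that the class label is stored as an item
    in each transaction, so confidence = count(X + class) / count(X).
    """
    x = frozenset(antecedent)
    xc = x | {class_item}
    counts = count_target_itemsets(transactions, [x, xc])
    if counts[x] == 0:
        return 0.0
    return counts[xc] / counts[x]


# Toy imbalanced data: "rare" marks the minority class (illustrative only).
transactions = [
    {"a", "b", "rare"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c", "rare"},
    {"a", "b", "c"},
]

counts = count_target_itemsets(transactions, [{"a", "b"}, {"c"}])
confidence = rule_confidence(transactions, {"a", "b"}, "rare")
```

In this toy data the itemset {a, b} appears in three transactions, one of which carries the minority label, so the rule {a, b} -> rare has confidence 1/3. Only the two itemsets needed for the rule are counted, which is the kind of multi-targeted query the abstract describes.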
