Analyzing Efficient Algorithms of Frequent Pattern Mining

Frequent pattern mining plays an important role in data analysis across fields such as medical treatment, biology, finance, and networks. Since the Apriori algorithm was proposed, active research has driven rapid development of the field, and numerous mining algorithms have been proposed, such as FP-growth, FP-growth*, LCM, AFOPT, and MAFIA. In this paper, we analyze and compare a variety of frequent pattern mining approaches and discuss the advantages and disadvantages of each algorithm. For the comparison, we evaluate the mining performance of each algorithm on real datasets. In addition, we run scalability experiments to characterize the algorithms more precisely. The experimental results show that LCM achieves the fastest runtime, while FP-growth* and AFOPT show the most efficient memory usage. Based on the characteristics analyzed in this paper, practitioners can select and apply the most appropriate algorithm for a given real-world database.
