PADS: a simple yet effective pattern-aware dynamic search method for fast maximal frequent pattern mining

While frequent pattern mining is fundamental for many data mining tasks, mining maximal frequent patterns efficiently is important in both theory and applications of frequent pattern mining. The fundamental challenge is how to search a large space of item combinations. Most of the existing methods search an enumeration tree of item combinations in a depth-first manner. In this paper, we develop a new technique for more efficient max-pattern mining. Our method is pattern-aware: it uses the patterns already found to schedule its future search so that many search subspaces can be pruned. We present efficient techniques to implement the new approach. As indicated by a systematic empirical study using the benchmark data sets, our new approach outperforms the currently fastest max-pattern mining algorithms FPMax* and LCM2 clearly. The source code and the executable code (on both Windows and Linux platforms) are publicly available at http://www.cs.sfu.ca/~jpei/Software/PADS.zip.

[1]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[2]  Guizhen Yang,et al.  The complexity of mining maximal frequent itemsets and maximal frequent patterns , 2004, KDD.

[3]  Rajeev Motwani,et al.  Scalable Techniques for Mining Causal Structures , 1998, Data Mining and Knowledge Discovery.

[4]  Hongjun Lu,et al.  H-mine: hyper-structure mining of frequent patterns in large databases , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[5]  Soon Myoung Chung,et al.  A scalable algorithm for mining maximal frequent sequences using a sample , 2008, Knowledge and Information Systems.

[6]  Charu C. Aggarwal,et al.  Towards long pattern generation in dense databases , 2001, SKDD.

[7]  Peter Scheuermann,et al.  Distributed Web Log Mining Using Maximal Large Itemsets , 2001, Knowledge and Information Systems.

[8]  Jinyan Li,et al.  Efficient mining of emerging patterns: discovering trends and differences , 1999, KDD '99.

[9]  Norberto F. Ezquerra,et al.  Constraining and summarizing association rules in medical data , 2006, Knowledge and Information Systems.

[10]  Mohammed J. Zaki,et al.  Efficiently mining maximal frequent itemsets , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[11]  Hiroki Arimura,et al.  LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets , 2004, FIMI.

[12]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Association Rule Mining , 2007 .

[13]  Gösta Grahne,et al.  Fast algorithms for frequent itemset mining using FP-trees , 2005, IEEE Transactions on Knowledge and Data Engineering.

[14]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[15]  Ron Rymon,et al.  Search through Systematic Set Enumeration , 1992, KR.

[16]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[17]  Kotagiri Ramamohanarao,et al.  DeEPs: A New Instance-Based Lazy Discovery and Classification System , 2004, Machine Learning.

[18]  Hiroki Arimura,et al.  LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining , 2005 .

[19]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[20]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[21]  Jinyan Li,et al.  Mining border descriptions of emerging patterns from dataset pairs , 2005, Knowledge and Information Systems.

[22]  Ramesh C Agarwal,et al.  Depth first generation of long patterns , 2000, KDD '00.

[23]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[24]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[25]  Jian Pei,et al.  Mining Condensed Frequent-Pattern Bases , 2003, Knowledge and Information Systems.

[26]  RamakrishnanRaghu,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999 .

[27]  Francesco Bonchi,et al.  On condensed representations of constrained frequent patterns , 2005, Knowledge and Information Systems.

[28]  Rajeev Motwani,et al.  Beyond market baskets: generalizing association rules to correlations , 1997, SIGMOD '97.

[29]  Heikki Mannila,et al.  Discovery of Frequent Episodes in Event Sequences , 1997, Data Mining and Knowledge Discovery.

[30]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm for transactional databases , 2001, Proceedings 17th International Conference on Data Engineering.

[31]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[32]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[33]  Ronen Feldman,et al.  TEG—a hybrid approach to information extraction , 2005, Knowledge and Information Systems.

[34]  Jiawei Han,et al.  Efficient mining of partial periodic patterns in time series database , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[35]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[36]  Jinyan Li,et al.  A new concise representation of frequent itemsets using generators and a positive border , 2008, Knowledge and Information Systems.