Finding Frequent Patterns Using Length-Decreasing Support Constraints

Finding prevalent patterns in large amount of data has been one of the major problems in the area of data mining. Particularly, the problem of finding frequent itemset or sequential patterns in very large databases has been studied extensively over the years, and a variety of algorithms have been developed for each problem. The key feature in most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of these two problems. In general, patterns that contain only a few items will tend to be interesting if they have a high support, whereas long patterns can still be interesting even if their support is relatively small. Ideally, we want to find all the frequent patterns whose support decreases as a function of their length without having to find many uninteresting infrequent short patterns. Developing such algorithms is particularly challenging because the downward closure property of the constant support constraint cannot be used to prune short infrequent patterns.In this paper we present two algorithms, LPMiner and SLPMiner. Given a length-decreasing support constraint, LPMiner finds all the frequent itemset patterns from an itemset database, and SLPMiner finds all the frequent sequential patterns from a sequential database. Each of these two algorithms combines a well-studied efficient algorithm for constant-support-based pattern discovery with three effective database pruning methods that dramatically reduce the runtime. Our experimental evaluations show that both LPMiner and SLPMiner, by effectively exploiting the length-decreasing support constraint, are up to two orders of magnitude faster, and their runtime increases gradually as the average length of the input patterns increases.

[1]  Zvi M. Kedem,et al.  Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set , 2002, IEEE Trans. Knowl. Data Eng..

[2]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[3]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[4]  Vipin Kumar,et al.  Scalable parallel data mining for association rules , 1997, SIGMOD '97.

[5]  Carla E. Brodley,et al.  KDD-Cup 2000 organizers' report: peeling the onion , 2000, SKDD.

[6]  PeiJian,et al.  Mining Frequent Patterns without Candidate Generation , 2000 .

[7]  Umeshwar Dayal,et al.  PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth , 2001, ICDE 2001.

[8]  HuYa-Han,et al.  Mining association rules with multiple minimum supports , 2006 .

[9]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[10]  Jian Pei,et al.  Can we push more constraints into frequent pattern mining? , 2000, KDD '00.

[11]  Charu C. Aggarwal,et al.  A Tree Projection Algorithm for Generation of Frequent Item Sets , 2001, J. Parallel Distributed Comput..

[12]  Rajeev Motwani,et al.  Beyond market baskets: generalizing association rules to correlations , 1997, SIGMOD '97.

[13]  Mohammed J. Zaki,et al.  Fast vertical mining using diffsets , 2003, KDD '03.

[14]  Ron Kohavi,et al.  Real world performance of association rule algorithms , 2001, KDD '01.

[15]  ZhengZijian,et al.  KDD-Cup 2000 organizers' report , 2000 .

[16]  Zvi M. Kedem,et al.  Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set , 1998, EDBT.

[17]  Jiawei Han,et al.  Mining top-k frequent closed patterns without minimum support , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[18]  Ke Wang,et al.  Mining Frequent Itemsets Using Support Constraints , 2000, VLDB.

[19]  Valerie Guralnik,et al.  Parallel Tree Projection Algorithm for Sequence Mining , 2001, Euro-Par.

[20]  Philip S. Yu,et al.  Efficient parallel data mining for association rules , 1995, CIKM '95.

[21]  Edith Cohen,et al.  Finding interesting associations without support pruning , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[22]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[23]  Mohammed J. Zaki Generating non-redundant association rules , 2000, KDD '00.

[24]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[25]  George Karypis,et al.  PAFI: A Pattern Finding Toolkit , 2003 .

[26]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[27]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[28]  Ramesh C Agarwal,et al.  Depth first generation of long patterns , 2000, KDD '00.

[29]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[30]  D. Gusfield Efficient methods for multiple sequence alignment with guaranteed error bounds , 1993 .

[31]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[32]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[33]  Andreas Mueller,et al.  Fast sequential and parallel algorithms for association rule mining: a comparison , 1995 .

[34]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[35]  Wynne Hsu,et al.  Mining association rules with multiple minimum supports , 1999, KDD '99.

[36]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[37]  D. Mount Bioinformatics: Sequence and Genome Analysis , 2001 .

[38]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.