Computing the minimum-support for mining frequent patterns

Frequent pattern mining assumes that users can specify a minimum-support threshold for mining their databases. In practice, setting the minimum-support is a difficult task for users, which can hinder the widespread application of these algorithms. In this paper we propose a computational strategy for identifying frequent itemsets, consisting of polynomial approximation and fuzzy estimation. More specifically, these two algorithms automatically generate actual minimum-supports (appropriate to the database to be mined) according to users’ mining requirements. We experimentally examine the algorithms on different datasets and demonstrate that our fuzzy estimation algorithm closely approximates actual minimum-supports from commonly used requirements.
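To illustrate why the choice of minimum-support matters, the following minimal sketch counts itemset supports by brute force on a toy database and shows how the number of frequent itemsets changes with the threshold. It is not the polynomial approximation or fuzzy estimation algorithm proposed in the paper; the function names, the `max_size` parameter, and the sample transactions are all assumptions made for illustration.

```python
from itertools import combinations


def support_counts(transactions, max_size=2):
    """Count, for each itemset of up to max_size items, how many transactions contain it."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] = counts.get(itemset, 0) + 1
    return counts


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: relative support} for itemsets meeting min_support."""
    n = len(transactions)
    counts = support_counts(transactions, max_size)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical toy database of four transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# The same database yields very different result sizes as the threshold moves.
for threshold in (0.25, 0.5, 0.75):
    found = frequent_itemsets(transactions, threshold)
    print(threshold, len(found))
```

Even on this tiny example the result set shrinks from six itemsets to two as the threshold rises from 0.25 to 0.75, which is the sensitivity that makes a user-chosen minimum-support hard to set well.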
