Computing the minimum-support for mining frequent patterns

Frequent pattern mining assumes that users can specify the minimum-support for mining their databases. However, setting the minimum-support is widely recognized as a difficult task for users, and this difficulty can hinder the widespread application of frequent pattern mining algorithms. In this paper we propose a computational strategy for identifying frequent itemsets that consists of two algorithms: polynomial approximation and fuzzy estimation. These algorithms automatically generate the actual minimum-support appropriate to the database being mined, according to the user's mining requirements. We examine the algorithms experimentally on different datasets and demonstrate that the fuzzy estimation algorithm closely approximates the actual minimum-supports corresponding to commonly used requirements.
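The abstract does not describe how the polynomial-approximation or fuzzy-estimation algorithms work. What follows is therefore only a minimal Python sketch of the surrounding idea: a user states a requirement on a normalized scale, and the program converts it into an actual minimum-support for the specific database before mining frequent itemsets. As an explicit assumption, the conversion here is a simple linear rescaling of a requirement in [0, 1] onto the range of observed single-item supports. It is a hypothetical stand-in, not the paper's estimator, and the names `item_supports`, `actual_min_support`, and `frequent_itemsets` are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Tolerance for float comparisons between a support ratio and a threshold.
EPS = 1e-12


def item_supports(transactions: List[Set[str]]) -> Dict[str, float]:
    """Relative support of each item: fraction of transactions containing it."""
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions")
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items()}


def actual_min_support(transactions: List[Set[str]], requirement: float) -> float:
    """HYPOTHETICAL mapping (not the paper's method): linearly rescale a user
    requirement in [0, 1] onto [min, max] of the observed item supports."""
    if not 0.0 <= requirement <= 1.0:
        raise ValueError("requirement must lie in [0, 1]")
    supports = item_supports(transactions).values()
    lo, hi = min(supports), max(supports)
    return lo + requirement * (hi - lo)


def frequent_itemsets(
    transactions: List[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Level-wise search. Every frequent itemset consists only of frequent items,
    and if no k-itemset is frequent, no larger itemset can be (anti-monotonicity)."""
    n = len(transactions)
    frequent_items = sorted(
        i for i, s in item_supports(transactions).items() if s >= min_support - EPS
    )
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, len(frequent_items) + 1):
        found = False
        for combo in combinations(frequent_items, k):
            candidate = frozenset(combo)
            # Count transactions that contain every item of the candidate.
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support - EPS:
                result[candidate] = support
                found = True
        if not found:
            break  # no frequent k-itemsets, so no frequent (k+1)-itemsets
    return result


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}, {"b", "c"}]
    threshold = actual_min_support(db, 0.5)
    print(threshold, frequent_itemsets(db, threshold))
```

Because the threshold is derived from the supports actually present in the data, the same requirement yields different absolute thresholds on different databases. A real estimator, such as the polynomial-approximation or fuzzy-estimation algorithms the paper proposes, would replace the linear mapping in `actual_min_support`.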
