Paper information - Mining Condensed Frequent-Pattern Bases

Mining Condensed Frequent-Pattern Bases

Frequent-pattern mining has been studied extensively and has many useful applications. However, frequent-pattern mining often generates too many patterns to be truly efficient or effective. In many applications, it is sufficient to generate and examine frequent patterns with a sufficiently good approximation of the support frequency instead of in full precision. Such a compact but “close-enough” frequent-pattern base is called a condensed frequent-pattern base.

In this paper, we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent-pattern bases. Several algorithms for computing such pattern bases are proposed. Their effectiveness at pattern compression and methods for efficiently computing them are investigated. A systematic performance study is conducted on different kinds of databases, and demonstrates the effectiveness and efficiency of our approach in handling frequent-pattern mining in large databases.
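To make the idea of a "close-enough" pattern base concrete, the following is a minimal sketch of one possible design, not necessarily the one adopted in the paper. It assumes an error bound `k`, splits the support range into levels of width `k + 1`, and stores only the maximal patterns of each level. All names (`frequent_itemsets`, `condensed_base`, `approximate_support`) and the toy data are hypothetical, and the brute-force enumeration is a stand-in for a real mining algorithm.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_sup):
    """Brute-force enumeration of frequent itemsets (toy data only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            s = support(pattern, transactions)
            if s >= min_sup:
                result[pattern] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, so none larger
    return result


def condensed_base(frequent, min_sup, k):
    """Assumed design: keep the maximal patterns of each support level.

    Level i covers supports min_sup + i*(k+1) .. min_sup + i*(k+1) + k.
    """
    levels = {}
    for pattern, sup in frequent.items():
        levels.setdefault((sup - min_sup) // (k + 1), []).append(pattern)
    return {
        level: [p for p in patterns if not any(p < q for q in patterns)]
        for level, patterns in levels.items()
    }


def approximate_support(pattern, base, min_sup, k):
    """Return a (low, high) support interval, or None if not frequent."""
    covering = [
        level
        for level, maximal in base.items()
        if any(pattern <= m for m in maximal)
    ]
    if not covering:
        return None
    low = min_sup + max(covering) * (k + 1)
    return (low, low + k)


# Hypothetical toy database.
transactions = [
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"},
    {"a", "c"}, {"a"}, {"b", "d"},
]
min_sup, k = 2, 1
frequent = frequent_itemsets(transactions, min_sup)
base = condensed_base(frequent, min_sup, k)
```

On this toy database the base stores 3 patterns in place of 7 frequent itemsets, and every frequent pattern's true support falls inside the interval returned by `approximate_support`. The interval is exact to within `k` because a subset of a stored pattern has at least that pattern's support, so the highest level that covers a pattern is the level of its true support.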

Jian Pei | Jiawei Han | Guozhu Dong | Wei Zou

[1] Laks V. S. Lakshmanan, et al. Mining frequent itemsets with convertible constraints, 2001, Proceedings 17th International Conference on Data Engineering.

[2] Christos Faloutsos, et al. NetCube: A Scalable Tool for Fast Data Mining and Compression, 2001, VLDB.

[3] Leonid Khachiyan, et al. Cubegrades: Generalizing Association Rules, 2002, Data Mining and Knowledge Discovery.

[4] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1998, VLDB 1998.

[5] Jian Pei, et al. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets, 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[6] Qiming Chen, et al. PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, 2001, Proceedings 17th International Conference on Data Engineering.

[7] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[8] Mohammed J. Zaki, et al. Efficiently mining maximal frequent itemsets, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[9] Jean-François Boulicaut, et al. Approximation of Frequency Queris by Means of Free-Sets, 2000, PKDD.

[10] Rajeev Motwani, et al. Beyond market baskets: generalizing association rules to correlations, 1997, SIGMOD '97.

[11] Rajeev Motwani, et al. Scalable Techniques for Mining Causal Structures, 1998, Data Mining and Knowledge Discovery.

[12] Heikki Mannila, et al. Discovery of Frequent Episodes in Event Sequences, 1997, Data Mining and Knowledge Discovery.

[13] Jiawei Han, et al. Data Mining: Concepts and Techniques, 2000.

[14] Mohammed J. Zaki. Generating non-redundant association rules, 2000, KDD '00.

[15] Umeshwar Dayal, et al. PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth, 2001, ICDE 2001.

[16] Laks V. S. Lakshmanan, et al. Exploratory mining and pruning optimizations of constrained associations rules, 1998, SIGMOD '98.

[17] Martin Ester, et al. Frequent term-based text clustering, 2002, KDD.

[18] Qing Li, et al. Knowledge Discovery and Data Mining - PAKDD 2001, 5th Pacific-Asia Conference, Hong Kong, China, April 16-18, 2001, Proceedings, 2001, PAKDD.

[19] Laks V. S. Lakshmanan, et al. Optimization of constrained frequent set queries with 2-variable constraints, 1999, SIGMOD '99.

[20] Raghu Ramakrishnan, et al. Bottom-up computation of sparse and Iceberg CUBE, 1999, SIGMOD '99.

[21] Jinyan Li, et al. Efficient mining of emerging patterns: discovering trends and differences, 1999, KDD '99.

[22] Jennifer Widom, et al. Clustering association rules, 1997, Proceedings 13th International Conference on Data Engineering.

[23] Johannes Gehrke, et al. MAFIA: a maximal frequent itemset algorithm for transactional databases, 2001, Proceedings 17th International Conference on Data Engineering.

[24] Wynne Hsu, et al. Integrating Classification and Association Rule Mining, 1998, KDD.

[25] Roberto J. Bayardo, et al. Efficiently mining long patterns from databases, 1998, SIGMOD '98.

[26] Jian Pei, et al. Mining Multi-Dimensional Constrained Gradients in Data Cubes, 2001, VLDB.

[27] Ramakrishnan Srikant, et al. Mining sequential patterns, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[28] Charu C. Aggarwal, et al. A Tree Projection Algorithm for Generation of Frequent Item Sets, 2001, J. Parallel Distributed Comput.

[29] Jian Pei, et al. CMAR: accurate and efficient classification based on multiple class-association rules, 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[30] Jiawei Han, et al. Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes, 1997, KDD.

[31] Heikki Mannila, et al. Multiple Uses of Frequent Sets and Condensed Representations (Extended Abstract), 1996, KDD.

[32] Jiawei Han, et al. Efficient mining of partial periodic patterns in time series database, 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[33] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[34] Nicolas Pasquier, et al. Discovering Frequent Closed Itemsets for Association Rules, 1999, ICDT.

[35] Mohammed J. Zaki, et al. CHARM: An Efficient Algorithm for Closed Itemset Mining, 2002, SDM.