Item Sets that Compress

One of the major problems in frequent item set mining is the explosion in the number of results, which makes it difficult to find the most interesting frequent item sets. The cause of this explosion is that large sets of frequent item sets describe essentially the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of frequent item sets is the set that compresses the database best. We introduce four heuristic algorithms for this task, and experiments show that these algorithms give a dramatic reduction in the number of frequent item sets. Moreover, we show how our approach can be used to determine the best value for the min-sup threshold.
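To make the MDL idea concrete, here is a minimal Python sketch built on deliberately simplified assumptions. It is not one of the paper's four heuristic algorithms, and it does not reproduce the paper's exact encoding. The toy database `DB` and the helpers `frequent_itemsets`, `code_table`, `cover`, and `total_length` are hypothetical. The sketch makes three assumptions. First, each transaction is covered greedily, trying longer item sets first. Second, each item set's code length is the Shannon-optimal length for how often the cover uses it. Third, the code-table cost is approximated crudely as each used item set's code plus its items encoded with singleton codes.

The sketch first illustrates the explosion: at a min-sup of 3, all 15 non-empty subsets of {a, b, c, d} are frequent, and each one covers exactly the same four transactions. It then compares the total encoded size of a singletons-only code table with that of a table that also contains {a, b, c, d}.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy database: four transactions share {a, b, c, d}.
DB = [frozenset("abcd"), frozenset("abcd"), frozenset("abcde"),
      frozenset("abcd"), frozenset("e")]
ITEMS = sorted(set().union(*DB))


def frequent_itemsets(db, min_sup):
    """Brute force: every item set whose support is at least min_sup."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            support = sum(1 for t in db if s <= t)
            if support >= min_sup:
                result[s] = support
    return result


def code_table(extra):
    """All singletons plus extra item sets, longest first (assumed cover order)."""
    singletons = {frozenset([i]) for i in ITEMS}
    return sorted(singletons | set(extra), key=lambda s: (-len(s), sorted(s)))


def cover(transaction, table):
    """Greedily cover a transaction with disjoint item sets from the table."""
    remaining, used = set(transaction), []
    for s in table:
        if s <= remaining:
            used.append(s)
            remaining -= s
    return used


def total_length(db, table):
    """Bits for the encoded database plus a crude code-table cost.

    Simplified, assumed encoding; not the paper's exact scheme.
    """
    # Standard code lengths for single items, from their frequencies in db.
    counts = Counter(i for t in db for i in t)
    n = sum(counts.values())
    std = {i: -log2(c / n) for i, c in counts.items()}
    # How often the cover uses each item set determines its Shannon code length.
    usage = Counter(s for t in db for s in cover(t, table))
    total = sum(usage.values())
    code = {s: -log2(u / total) for s, u in usage.items()}
    data_bits = sum(usage[s] * code[s] for s in usage)
    # Table cost: each used item set's code plus its items in the standard code.
    table_bits = sum(code[s] + sum(std[i] for i in s) for s in usage)
    return data_bits + table_bits


freq = frequent_itemsets(DB, min_sup=3)
print(len(freq), set(freq.values()))  # 15 {4}
print(round(total_length(DB, code_table([])), 2))  # 64.76
print(round(total_length(DB, code_table([frozenset("abcd")])), 2))  # 19.53
```

Under these assumptions, adding {a, b, c, d} to the code table shrinks the total from about 64.8 bits to about 19.5 bits. A compression-based criterion therefore favors keeping that single item set over reporting all 15 redundant frequent ones. A real method would also have to decide which candidates to add, and in what order. This sketch does not address that question.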
