Multiple Uses of Frequent Sets and Condensed Representations (Extended Abstract)

In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used as a condensed representation for answering various types of queries. Given a table r with 0/1 values and a threshold σ, a frequent set of r is a set X of columns of r such that at least a fraction σ of the rows of r have a 1 in all the columns of X. Finding frequent sets is a first step in finding association rules, and there exist several efficient algorithms for finding the frequent sets. We show that frequent sets have wider applications than just finding association rules. We show that using the inclusion-exclusion principle one can obtain approximate confidences of arbitrary Boolean rules. We derive bounds for the errors in the confidences, and show that information collected during the computation of frequent sets can also be used to provide individual error bounds for each clause. Experiments show that this method enables one to obtain different forms of rules from data extremely fast. Furthermore, we define a general notion of condensed representations, and show that frequent sets, samples, and the data cube can be viewed as instantiations of this concept.
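To make the definitions above concrete, here is a minimal Python sketch, not the paper's implementation. It assumes each row of the 0/1 table is given as the set of columns holding a 1. The function names `frequent_sets` and `approx_disjunction_frequency` are hypothetical, and the levelwise search is a generic illustration rather than any specific published algorithm. The second function approximates the frequency of a disjunction of columns by inclusion-exclusion, using only the terms stored in the frequent-set collection. Each dropped term has frequency below σ, so the absolute error is less than the number of missing terms times σ. This is a crude bound for illustration, not the bound derived in the paper.

```python
from itertools import combinations


def frequent_sets(rows, sigma):
    """Return {frozenset(columns): frequency} for every column set X such
    that at least a fraction sigma of the rows have a 1 in all columns of X.

    rows: list of sets; each set holds the columns that are 1 in that row.
    """
    n = len(rows)
    freq = {}
    # Level 1 candidates: every single column that occurs in the data.
    candidates = {frozenset([c]) for row in rows for c in row}
    while candidates:
        level = {}
        for cand in candidates:
            f = sum(1 for row in rows if cand <= row) / n
            if f >= sigma:
                level[cand] = f
        freq.update(level)
        # Next level: sets one column larger, all of whose subsets at the
        # current size are frequent (frequency is monotone under inclusion).
        size = len(next(iter(level))) if level else 0
        next_candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size + 1 and all(
                frozenset(s) in level for s in combinations(union, size)
            ):
                next_candidates.add(union)
        candidates = next_candidates
    return freq


def approx_disjunction_frequency(columns, freq):
    """Approximate fr(c1 OR c2 OR ...) by inclusion-exclusion, using only
    the terms present in freq. Returns (estimate, number_of_missing_terms).
    """
    estimate = 0.0
    missing = 0
    for k in range(1, len(columns) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(columns, k):
            key = frozenset(subset)
            if key in freq:
                estimate += sign * freq[key]
            else:
                missing += 1  # infrequent term: its frequency is below sigma
    return estimate, missing


if __name__ == "__main__":
    # Illustrative toy table (made-up data), 8 rows over columns A, B, C.
    rows = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"},
            {"C"}, set(), {"A", "B"}, {"A", "C"}]
    fs = frequent_sets(rows, sigma=0.25)
    print(approx_disjunction_frequency(["A", "B", "C"], fs))
```

On this toy table the set {A, B, C} is not frequent at σ = 0.25, so its term is dropped and the estimate differs from the true frequency of the disjunction by that one term. The same stored frequencies can be combined into an approximate confidence by dividing two such estimates, for example fr(A ∧ (B ∨ C)) = fr(AB) + fr(AC) − fr(ABC) over fr(A).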
