Mining Non-redundant Information-Theoretic Dependencies between Itemsets

We present an information-theoretic framework for mining dependencies between itemsets in binary data. The problem of closure-based redundancy in this context is theoretically investigated, and we present both lossless and lossy pruning techniques. An efficient and scalable algorithm is proposed, which exploits the inclusion-exclusion principle for fast entropy computation. This algorithm is empirically evaluated through experiments on synthetic and real-world data.

[1]  Mohammed J. Zaki,et al.  Efficiently mining maximal frequent itemsets , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[2]  Heikki Mannila,et al.  Finding low-entropy sets and trees from binary data , 2007, KDD '07.

[3]  Szymon Jaroszewicz,et al.  Pruning Redundant Association Rules Using Maximum Entropy Principle , 2002, PAKDD.

[4]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[5]  Toon Calders,et al.  Quick Inclusion-Exclusion , 2005, KDID.

[6]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[7]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[8]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[9]  Mehmet M. Dalkilic,et al.  Information dependencies , 2000, PODS '00.

[10]  Hannu Toivonen,et al.  TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies , 1999, Comput. J..

[11]  Heikki Mannila,et al.  Approximate Inference of Functional Dependencies from Relations , 1995, Theor. Comput. Sci..

[12]  Mohammed J. Zaki Generating non-redundant association rules , 2000, KDD '00.

[13]  José L. Balcázar Minimum-Size Bases of Association Rules , 2008, ECML/PKDD.

[14]  Toon Calders,et al.  Non-derivable itemset mining , 2007, Data Mining and Knowledge Discovery.

[15]  Srinivasan Parthasarathy,et al.  New Algorithms for Fast Discovery of Association Rules , 1997, KDD.

[16]  Francesco Bonchi,et al.  Knowledge Discovery in Inductive Databases, 4th International Workshop, KDID 2005, Porto, Portugal, October 3, 2005, Revised Selected and Invited Papers , 2006, KDID.

[17]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[18]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.