Mining generalized association rules

Abstract: We introduce the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, we find associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets is-a outerwear is-a clothes, we may infer a rule that “people who buy outerwear tend to buy shoes”. This rule may hold even if the rules “people who buy jackets tend to buy shoes” and “people who buy clothes tend to buy shoes” do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction, and then run any of the algorithms for mining association rules on these “extended transactions”. However, this “Basic” algorithm is not very fast; we present two algorithms, Cumulate and EstMerge, which run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). Finally, we present a new interest measure for rules that uses the information in the taxonomy. Given a user-specified “minimum-interest-level”, this measure prunes a large number of redundant rules; 40–60% of all the rules were pruned on two real-life datasets.
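The “extended transactions” step of the Basic approach can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the taxonomy is assumed to give each item at most one parent, the item "ski pants" and the sample transactions are hypothetical, and the names `PARENT`, `ancestors`, `extend`, and `pair_support` are made up for this example. Only pairs of items are counted, standing in for a full association-rule miner.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy as child -> parent (is-a) edges.
# "jackets is-a outerwear is-a clothes" comes from the abstract;
# "ski pants" is an invented sibling used only for illustration.
PARENT = {
    "jackets": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
}


def ancestors(item, parent=PARENT):
    """Return all ancestors of `item` by walking is-a edges upward."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend(transaction, parent=PARENT):
    """Return the transaction plus every ancestor of each of its items."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def pair_support(transactions, parent=PARENT):
    """Count how many extended transactions contain each pair of items."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(extend(t, parent)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data.
transactions = [
    {"jackets", "shoes"},
    {"ski pants", "shoes"},
    {"jackets"},
    {"shoes"},
]
support = pair_support(transactions)
```

In this toy data, the pair (outerwear, shoes) is supported by two transactions while (jackets, shoes) is supported by only one, which mirrors how a rule at a higher taxonomy level can hold when the more specific rule does not. The naive count also includes pairs made of an item and its own ancestor, such as (jackets, outerwear), which co-occur by construction and carry no information.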
