Mining Associations with the Collective Strength Approach

The large itemset model has been proposed in the literature for finding associations in a large database of sales transactions. A different method for evaluating and finding itemsets referred to as strongly collective itemsets is proposed. We propose a criterion stressing the importance of the actual correlation of the items with one another rather than their absolute level of presence. Previous techniques for finding correlated itemsets are not necessarily applicable to very large databases. We provide an algorithm which provides very good computational efficiency, while maintaining statistical robustness. The fact that this algorithm relies on relative measures rather than absolute measures such as support also implies that the method can be applied to find association rules in data sets in which items may appear in a sizeable percentage of the transactions (dense data sets), data sets in which the items have varying density, or even negative association rules.

[1]  Padhraic Smyth,et al.  From Data Mining to Knowledge Discovery: An Overview , 1996, Advances in Knowledge Discovery and Data Mining.

[2]  Sunita Sarawagi,et al.  Mining Surprising Patterns Using Temporal Description Length , 1998, VLDB.

[3]  Philip S. Yu,et al.  Data Mining: An Overview from a Database Perspective , 1996, IEEE Trans. Knowl. Data Eng..

[4]  Jennifer Widom,et al.  Clustering association rules , 1997, Proceedings 13th International Conference on Data Engineering.

[5]  Philip S. Yu,et al.  Using a Hash-Based Method with Transaction Trimming for Mining Association Rules , 1997, IEEE Trans. Knowl. Data Eng..

[6]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[7]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[8]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[9]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[10]  William Frawley,et al.  Knowledge Discovery in Databases , 1991 .

[11]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[12]  Heikki Mannila,et al.  Efficient Algorithms for Discovering Association Rules , 1994, KDD Workshop.

[13]  MotwaniRajeev,et al.  Beyond market baskets , 1997 .

[14]  Philip S. Yu,et al.  A new framework for itemset generation , 1998, PODS '98.

[15]  Heikki Mannila,et al.  Finding interesting rules from large sets of discovered association rules , 1994, CIKM '94.

[16]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[17]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[18]  Rajeev Motwani,et al.  Beyond market baskets: generalizing association rules to correlations , 1997, SIGMOD '97.

[19]  D. S. Moore,et al.  Measures of lack of fit from tests of chi-squared type , 1984 .