Parametric Algorithms for Mining Share Frequent Itemsets

Itemset share, the fraction of some numerical total contributed by items when they occur in itemsets, has been proposed as a measure of the importance of itemsets in association rule mining. The IAB and CAC algorithms can find share frequent itemsets that have infrequent subsets. Both algorithms perform well, but they do not always find every share frequent itemset. In this paper, we describe the incorporation of a threshold factor into these algorithms. The threshold factor can be used to increase the number of frequent itemsets found, at the cost of examining more infrequent itemsets. The modified algorithms are tested on a large commercial database, and their behavior is examined using principles of classifier evaluation from machine learning.
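As an illustration of the share measure described above, the following minimal Python sketch computes itemset share on a toy transaction table. It assumes the usual formulation in which the share of an itemset is the sum of the measure values (here, purchase quantities) of its items over the transactions that contain the whole itemset, divided by the total over all items in all transactions. The function names, the `min_share` parameter, and the data are hypothetical and are not taken from the paper; the sketch does not implement IAB, CAC, or the threshold factor.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: each transaction maps an item to its
# measure value (for example, the quantity purchased).
Transaction = Dict[str, int]


def itemset_share(transactions: List[Transaction], itemset: Iterable[str]) -> float:
    """Fraction of the total measure value contributed by `itemset`.

    Only transactions containing every item of the itemset contribute,
    and only the values of the itemset's own items are counted.
    """
    items = set(itemset)
    total = sum(sum(t.values()) for t in transactions)
    if total == 0:
        return 0.0
    local = sum(
        sum(t[i] for i in items)
        for t in transactions
        if items.issubset(t.keys())
    )
    return local / total


def is_share_frequent(
    transactions: List[Transaction], itemset: Iterable[str], min_share: float
) -> bool:
    """True if the itemset's share meets the (assumed) minimum share."""
    return itemset_share(transactions, itemset) >= min_share


# Toy data, invented for illustration; total quantity is 10.
toy = [
    {"bread": 1, "milk": 2},
    {"bread": 3},
    {"milk": 1, "eggs": 3},
]

for candidate in (["milk"], ["eggs"], ["milk", "eggs"]):
    print(candidate, itemset_share(toy, candidate), is_share_frequent(toy, candidate, 0.35))
```

On this toy data, with an assumed minimum share of 0.35, the itemset {milk, eggs} is share frequent while both {milk} and {eggs} are not. This shows why share frequent itemsets can have infrequent subsets, and why algorithms that prune candidates based on infrequent subsets can miss some share frequent itemsets.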
