Replacing Support in Association Rule Mining

Association rules are an intuitive descriptive paradigm that has been used extensively in recent years across different application domains to identify regularities and correlations in a set of observed objects. Recently, however, the statistical measures of association rules (support and confidence) have been criticized because in some cases they fail at their primary goal, which is to select the most relevant and significant association rules. In this paper we propose a new model that replaces the support measure. Like support, the new model is a tool for identifying reliable rules and is also used to reduce the traversal of the itemset search space. The proposed model adopts new criteria to establish the reliability of the information extracted from the database. These criteria are based on Bayes’ Theorem and on an estimate of the probability density function of each itemset. According to our criteria, the information obtained from the database about an itemset is reliable if and only if the confidence interval of the estimated probability is narrow compared with its most likely value. We show how this test can be computed in an approximate but satisfactory way, with computational time comparable to that of the test on support.
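
To make the reliability criterion concrete, the following is a minimal sketch of one way such a test could look. It is not the model proposed in the paper, whose details are not given here. It assumes a uniform prior on the itemset probability, so that by Bayes’ Theorem the posterior after observing `count` occurrences in `n_transactions` transactions is a Beta density whose most likely value is `count / n_transactions`, and it approximates the interval with a normal approximation around that value. The function name `is_reliable`, the threshold `max_relative_width`, and the factor `z` are hypothetical choices made for illustration.

```python
import math


def is_reliable(count, n_transactions, max_relative_width=0.5, z=1.96):
    """Hypothetical reliability test for a single itemset.

    Assumption (not taken from the paper): uniform prior on the itemset
    probability, giving a Beta(count + 1, n - count + 1) posterior whose
    mode is count / n_transactions.
    """
    if n_transactions <= 0 or not 0 <= count <= n_transactions:
        raise ValueError("need 0 <= count <= n_transactions and n_transactions > 0")
    if count == 0:
        # The most likely value is 0, so the relative width is undefined;
        # this sketch treats an unobserved itemset as unreliable.
        return False

    # Most likely value of the itemset probability (posterior mode).
    p_mode = count / n_transactions

    # Normal (Laplace) approximation of the posterior spread around the mode.
    std = math.sqrt(p_mode * (1.0 - p_mode) / n_transactions)

    # Width of the approximate interval, compared with the most likely value.
    width = 2.0 * z * std
    return width / p_mode <= max_relative_width


# An itemset seen 5 times in 10,000 transactions versus one seen 500 times.
rare = is_reliable(5, 10_000)
common = is_reliable(500, 10_000)
```

Under these assumptions the test costs a constant number of arithmetic operations per itemset, which is in the same range as a comparison against a minimum support threshold. The relative width also simplifies to `2 * z * sqrt((1 - p_mode) / count)`, so in this sketch the decision is driven mainly by the absolute number of occurrences.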
