Generating a set of association and decision rules with statistically representative support and anti-support

Abstract. Rule evaluation is a central problem in association and decision rule mining. Many attractiveness measures that assess the utility of discovered rules have been proposed in the literature, and two-dimensional evaluation planes such as support – confidence or support – anti-support are frequently employed. An inherent drawback of attractiveness measures is the need to provide a cutoff threshold that defines the minimal or maximal acceptable value of a given measure. Such a threshold is usually unintuitive, lacks a clear meaning, and is difficult to set objectively. In this paper, we focus on the support – anti-support evaluation plane. We propose a methodology for simultaneously assessing the statistical representativeness of both measures by performing multinomial tests on association or decision rules. The proposed approach combines an assessment of the statistical soundness of the rules with an implicit elicitation of thresholds on both measures from the user. The latter is accomplished by employing the notion of relative error in both support and anti-support. We evaluate the proposed method on a number of data sets and provide general conclusions.
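To make the two axes of the support – anti-support plane concrete, the following minimal sketch counts both measures for a single rule. It assumes the usual definitions from the rule-mining literature (support: objects satisfying both premise and conclusion; anti-support: objects satisfying the premise but not the conclusion); the abstract itself does not spell these out. The function name, the set-based data representation, and the toy data are hypothetical, and the multinomial test proposed in the paper is not implemented here.

```python
from typing import Iterable, Set, Tuple


def support_and_anti_support(
    rows: Iterable[Set[str]],
    premise: Set[str],
    conclusion: Set[str],
) -> Tuple[int, int]:
    """Count support and anti-support of the rule premise -> conclusion.

    Assumed definitions (not taken from the abstract):
      support      = number of rows containing premise and conclusion
      anti-support = number of rows containing premise but not conclusion
    """
    support = 0
    anti_support = 0
    for row in rows:
        if premise <= row:          # the row satisfies the premise
            if conclusion <= row:   # ... and also the conclusion
                support += 1
            else:                   # ... but violates the conclusion
                anti_support += 1
    return support, anti_support


# Hypothetical toy data: each row is the set of items present in one object.
rows = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b"}, {"a", "c"}]
sup, anti = support_and_anti_support(rows, {"a"}, {"b"})

# Relative versions of both measures, i.e. fractions of the whole data set.
n = len(rows)
print(sup, anti, sup / n, anti / n)
```

Rows that do not satisfy the premise contribute to neither count, so the two counts together with the remaining rows split the data set into three disjoint groups.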
