Contribution of statistical learning to validation of association rules

Many measures aim at evaluating the interest of association rules. This article studies in detail the confidence intervals associated with the evaluation of these measures. The following difficulties arise. Since samples are finite, we restrict our attention to non-asymptotic bounds. The number of tested rules can be large, so it is not statistically sound to treat the rules separately: risks accumulate, and one could thus "validate" absurd rules. We do not work only on rules without exceptions; rules with confidence lower than 1 can be important. The solution we propose is based on the VC-dimension, a classical tool of learning theory.
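The accumulation of risks can be illustrated with a minimal sketch. It is not the VC-dimension bound of the article: it assumes the simpler Hoeffding inequality combined with a union bound over the tested rules, applied to an empirical frequency (such as the support of a rule) estimated from n independent transactions. The function names `hoeffding_half_width` and `family_risk_upper_bound` are hypothetical and chosen for illustration only.

```python
import math


def hoeffding_half_width(n, delta, n_rules=1):
    """Half-width eps of a non-asymptotic confidence interval.

    Assumption (not from the article): Hoeffding's inequality plus a
    union bound. With probability at least 1 - delta, all n_rules
    empirical frequencies computed from n i.i.d. transactions lie
    within eps of their true values.
    """
    if n <= 0 or n_rules <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0, n_rules > 0 and 0 < delta < 1")
    return math.sqrt(math.log(2.0 * n_rules / delta) / (2.0 * n))


def family_risk_upper_bound(delta, n_rules):
    """Union bound on the probability that at least one interval fails
    when each of n_rules rules is tested separately at level delta."""
    return min(1.0, n_rules * delta)


if __name__ == "__main__":
    n, delta = 1000, 0.05
    for m in (1, 100, 10000):
        print(
            m,
            round(hoeffding_half_width(n, delta, n_rules=m), 4),
            family_risk_upper_bound(delta, m),
        )
```

Under these assumptions, testing each rule separately at level delta leaves a family-wise risk bound that grows linearly with the number of rules and quickly becomes vacuous, whereas correcting for the number of rules widens every interval only logarithmically.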
