Average Number of Frequent and Closed Patterns in Random Databases

Abstract: Frequent and closed patterns are at the core of numerous knowledge discovery processes. Mining them is known to be difficult because the search space is huge and grows exponentially with the number of attributes. Unfortunately, most studies on pattern mining do not address the difficulty of the task itself and instead propose their own algorithm. In this paper, we use probabilistic techniques to establish new results on the average number of frequent patterns, and we extend these results to the number of closed patterns. As a first step, the probabilistic model is simple and far from real-life data, since the attributes and the objects are considered independent. Nevertheless, this model explains the frequency threshold phenomena observed in practice. We also prove that, for a fixed threshold, the number of frequent patterns is asymptotically exponential in the number of attributes and polynomial in the number of objects, whereas for a frequency threshold proportional to the number of objects, the number of frequent and closed patterns is asymptotically polynomial in the number of attributes and does not depend on the number of objects.

Keywords: data mining, average-case analysis, frequent and closed patterns
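To make the independence model concrete, the following is a minimal sketch and not the derivation used in the paper. It assumes a Bernoulli model in which each of the n x m entries of the database is 1 independently with the same probability p; the parameter p, the absolute threshold k, and the function names are illustrative assumptions. Under that assumption, a pattern of j attributes is contained in a given object with probability p^j, so its support follows a binomial distribution with parameters n and p^j, and linearity of expectation gives the expected number of non-empty frequent patterns as the sum over j = 1..m of C(m, j) * P(Bin(n, p^j) >= k).

```python
from math import comb


def binom_tail(n, q, k):
    """Return P(Bin(n, q) >= k) by direct summation (small n only)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


def expected_frequent_patterns(m, n, p, k):
    """Expected number of non-empty patterns with support >= k.

    Assumption (not taken from the paper): every entry of the
    n-objects x m-attributes database is 1 independently with
    probability p. A pattern of size j then matches one object with
    probability p**j, and its support is Binomial(n, p**j).
    """
    return sum(comb(m, j) * binom_tail(n, p**j, k) for j in range(1, m + 1))


if __name__ == "__main__":
    # Fixed absolute threshold: the count grows quickly with m.
    for m in (5, 10, 15):
        print(m, expected_frequent_patterns(m, n=50, p=0.5, k=2))
```

Calling the function with a threshold proportional to n (for example k = n // 4) while increasing n gives a way to explore numerically the two regimes contrasted in the abstract. The direct summation is only meant for small inputs.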
