One in a million: picking the right patterns

Constrained pattern mining extracts patterns based on their individual merit. This usually yields far more patterns than a human expert or a machine learning technique could make use of. Different patterns, or combinations of patterns, often cover a similar subset of the examples; such patterns are redundant and carry no new information. To remove the redundant information contained in such pattern sets, we propose two general heuristic algorithms, Bouncer and Picker, for selecting a small subset of patterns. We identify several selection techniques for use within these algorithms and evaluate them on several data sets. The results show that both algorithms substantially reduce the number of patterns while apparently retaining much of the original information. Additionally, the experiments show that reducing the pattern set improves the quality of classification results. Taken together, these results indicate that the proposed algorithms are well suited to removing redundancy from mined pattern sets.
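
To make the notion of redundancy concrete, the following is a minimal sketch of one possible greedy selection pass. It is not the Bouncer or Picker algorithm, whose details are not given here. It assumes each pattern is represented by the set of examples it covers, and it keeps a pattern only if adding it splits the examples into more distinguishable groups than the patterns already kept. The names `partition_size`, `greedy_select`, and `coverage` are hypothetical and chosen for illustration only.

```python
from typing import Dict, Hashable, List, Set


def partition_size(selected: List[str],
                   coverage: Dict[str, Set[Hashable]],
                   examples: List[Hashable]) -> int:
    """Count the groups formed when examples are grouped by which
    of the selected patterns cover them."""
    signatures = {
        tuple(ex in coverage[p] for p in selected) for ex in examples
    }
    return len(signatures)


def greedy_select(patterns: List[str],
                  coverage: Dict[str, Set[Hashable]],
                  examples: List[Hashable]) -> List[str]:
    """Illustrative single pass (an assumption, not the paper's method):
    keep a pattern only if it refines the current grouping of examples."""
    selected: List[str] = []
    blocks = partition_size(selected, coverage, examples)
    for p in patterns:
        new_blocks = partition_size(selected + [p], coverage, examples)
        if new_blocks > blocks:  # the pattern distinguishes more examples
            selected.append(p)
            blocks = new_blocks
    return selected


# Toy data: "b" duplicates "a", and "c" covers exactly the complement of "a",
# so neither lets us tell apart any examples that "a" did not already separate.
examples = [1, 2, 3, 4]
coverage = {
    "a": {1, 2},
    "b": {1, 2},
    "c": {3, 4},
    "d": {1, 3},
}
kept = greedy_select(["a", "b", "c", "d"], coverage, examples)
print(kept)
```

In this toy setting only the patterns that add distinguishing power survive the pass. The outcome depends on the order in which patterns are visited, which is one reason the choice of selection technique matters in practice.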
