Fast Discovery of Relevant Subgroup Patterns

Subgroup discovery is a prominent data mining method for discovering local patterns. Because the result is often a set of very similar, overlapping subgroup patterns, efficient methods for extracting a set of relevant subgroups are required. This paper presents a novel algorithm, based on a vertical data structure, that not only discovers interesting subgroups quickly but also integrates efficient filtering of patterns that are considered irrelevant due to their overlap. Additionally, we show how the algorithm can easily be applied in a distributed setting. Finally, we evaluate the presented approach on representative data sets.
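The abstract does not spell out the data structure or the relevance test, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that "vertical" means storing, for each selector (attribute-value pair), a bitset of the rows it covers, and it assumes one common notion of irrelevance: subgroup A is irrelevant with respect to subgroup B if B covers every positive example that A covers and B covers no negative example that A does not cover. The names `build_vertical` and `is_irrelevant` and the toy data are hypothetical.

```python
def build_vertical(rows, target):
    """Build a vertical layout: one bitset (a Python int) per selector.

    Bit i of a bitset is set if row i is covered. Also returns the
    bitset of rows whose target attribute is true (the positives).
    """
    selectors = {}
    positives = 0
    for i, row in enumerate(rows):
        if row[target]:
            positives |= 1 << i
        for attr, value in row.items():
            if attr != target:
                key = (attr, value)
                selectors[key] = selectors.get(key, 0) | (1 << i)
    return selectors, positives


def is_irrelevant(cover_a, cover_b, positives):
    """Assumed relevance test: A is irrelevant w.r.t. B if
    TP(A) is a subset of TP(B) and FP(B) is a subset of FP(A)."""
    tp_a, tp_b = cover_a & positives, cover_b & positives
    fp_a, fp_b = cover_a & ~positives, cover_b & ~positives
    return (tp_a & ~tp_b) == 0 and (fp_b & ~fp_a) == 0


# Toy data set (hypothetical): two binary attributes and a boolean target "t".
rows = [
    {"a": 1, "b": 1, "t": True},
    {"a": 1, "b": 0, "t": True},
    {"a": 0, "b": 1, "t": False},
    {"a": 1, "b": 1, "t": False},
]
selectors, positives = build_vertical(rows, "t")

# The cover of a conjunction is a single bitwise AND of selector bitsets.
cover_a1 = selectors[("a", 1)]
cover_a1_b1 = cover_a1 & selectors[("b", 1)]

# Under the assumed definition, "a=1 AND b=1" is dominated by "a=1".
conjunction_is_irrelevant = is_irrelevant(cover_a1_b1, cover_a1, positives)
```

With this layout, both refining a subgroup and comparing two subgroups for overlap reduce to a few bitwise operations, which is the kind of operation a vertical representation is meant to make cheap.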
