Exact and Approximate Minimal Pattern Mining

Condensed representations have been studied extensively for 15 years. In particular, the maximal patterns of the equivalence classes have received much attention, with very general proposals. In contrast, the minimal patterns have remained in the shadows, in particular because they are too numerous and difficult to extract. In this paper, we present a generic framework for exact and approximate minimal pattern mining by introducing the concept of minimizable set system. Based on set systems, this framework addresses various languages, such as itemsets or strings, and, at the same time, various metrics, such as frequency. For instance, the free, \(\delta \)-free and essential patterns are naturally handled by our approach, as are the minimal strings. Then, for any minimizable set system, we introduce a fast minimality checking method that is easy to incorporate into a depth-first search algorithm for mining the \(\delta \)-minimal patterns. We demonstrate that the resulting algorithm is polynomial-delay and polynomial-space. Experiments on traditional benchmarks complete our study by showing that our approach is competitive with the best proposals.
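To make the notion of \(\delta \)-minimal patterns concrete in the itemset language, the sketch below enumerates frequent \(\delta \)-free itemsets depth-first. It assumes the usual characterization that an itemset \(X\) is \(\delta \)-free when \(freq(X \setminus \{x\}) - freq(X) > \delta \) for every \(x \in X\); with \(\delta = 0\) this gives the free itemsets. This is a naive baseline, not the minimizable-set-system check proposed in the paper: it recomputes every support by scanning the transactions, and the function names (`support`, `is_delta_free`, `mine_delta_free`) are hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(itemset: Itemset, transactions: List[Itemset], delta: int) -> bool:
    """Naive delta-freeness test (assumed characterization): removing any
    single item x must increase the support by more than `delta`."""
    f = support(itemset, transactions)
    return all(support(itemset - {x}, transactions) - f > delta for x in itemset)


def mine_delta_free(transactions: List[Itemset], min_sup: int, delta: int) -> List[Itemset]:
    """Depth-first enumeration of frequent delta-free itemsets.

    Frequency and delta-freeness are both anti-monotone, so a candidate
    that fails either test is not extended: none of its supersets can pass.
    """
    items = sorted(set().union(*transactions))
    results: List[Itemset] = []

    def dfs(prefix: Itemset, start: int) -> None:
        results.append(prefix)
        # Extend only with items after the last one used, so each itemset
        # is generated at most once.
        for i in range(start, len(items)):
            cand = prefix | {items[i]}
            if support(cand, transactions) >= min_sup and is_delta_free(cand, transactions, delta):
                dfs(cand, i + 1)

    # The empty itemset is trivially delta-free; keep it if frequent enough.
    if support(frozenset(), transactions) >= min_sup:
        dfs(frozenset(), 0)
    return results


if __name__ == "__main__":
    D = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("c")]
    # With delta = 0, {a, b} is not free because supp({b}) == supp({a, b}).
    print(sorted(sorted(x) for x in mine_delta_free(D, min_sup=1, delta=0)))
    # → [[], ['a'], ['b'], ['c']]
```

Raising \(\delta \) to 1 on the same toy data also discards \(\{a\}\), since removing \(a\) only gains one transaction; this is the approximate (\(\delta > 0\)) setting the abstract refers to.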
