Data Peeler: Contraint-Based Closed Pattern Mining in n-ary Relations

Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms which extract frequent closed sets are now available. Generalizing such a task to n-ary relations (n > 2) appears as a timely challenge. It may be important for many applications, e.g., when adding the time dimension to the popular objects x features binary case. The generality of the task -- no assumption being made on the relation arity or on the size of its attribute domains -- makes it computationally challenging. We introduce an algorithm called Data-Peeler. From a n-ary relation, it extracts all closed patterns satisfying given piecewise (anti)-monotonic constraints. This new class of constraints generalizes both monotonic and anti-monotonic constraints. Considering the special case of ternary relations, Data-Peeler outperforms the state-of-the-art algorithms CubeMiner and Trias by orders of magnitude. These good performances must be granted to a new clever enumeration strategy allowing an efficient closeness checking. An original application on a real-life 4-ary relation is used to assess the relevancy of closed {\pattern s} constraint-based mining.

[1]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[2]  Mohammed J. Zaki,et al.  TRICLUSTER: an effective algorithm for mining coherent clusters in 3D microarray data , 2005, SIGMOD '05.

[3]  Alain Gély A Generic Algorithm for Generating Closed Sets of a Binary Relation , 2005, ICFCA.

[4]  Bart Goethals,et al.  Advances in frequent itemset mining implementations: report on FIMI'03 , 2004, SKDD.

[5]  Jian Pei,et al.  Mining coherent gene clusters from gene-sample-time microarray data , 2004, KDD.

[6]  Jean-François Boulicaut,et al.  Constraint-based concept mining and its application to microarray data analysis , 2005, Intell. Data Anal..

[7]  Anthony K. H. Tung,et al.  Mining frequent closed cubes in 3D datasets , 2006, VLDB.

[8]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[9]  Andreas Hotho,et al.  TRIAS--An Algorithm for Mining Iceberg Tri-Lattices , 2006, Sixth International Conference on Data Mining (ICDM'06).

[10]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[11]  Luc De Raedt,et al.  On Mining Closed Sets in Multi-Relational Data , 2007, IJCAI.

[12]  Ruggero G. Pensa,et al.  Clustering Formal Concepts to Discover Biologically Relevant Knowledge from Gene Expression Data , 2007, Silico Biol..

[13]  Nripendra N. Biswas,et al.  Minimization of Boolean Functions , 1971, IEEE Transactions on Computers.

[14]  Hiroki Arimura,et al.  LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining , 2005 .

[15]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[16]  M. Karnaugh The map method for synthesis of combinational logic circuits , 1953, Transactions of the American Institute of Electrical Engineers, Part I: Communication and Electronics.

[17]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[18]  Aristides Gionis,et al.  Mining chains of relations , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[19]  Jimeng Sun,et al.  Beyond streams and graphs: dynamic tensor analysis , 2006, KDD '06.

[20]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.