Closed and noise-tolerant patterns in n-ary relations

Binary relation mining has been extensively studied. Nevertheless, many interesting 0/1 datasets naturally appear as n-ary relations with n ≥ 3. A timely challenge is to extend local pattern extraction, e.g., closed pattern mining, to such contexts. As the arity grows, even faint noise increasingly degrades the quality of the extracted patterns. We study a declarative specification of error-tolerant patterns by means of new primitive constraints, and we design an efficient algorithm that extracts every solution pattern. The algorithm exploits the enumeration principles of the state-of-the-art Data-Peeler algorithm for n-ary relation mining. Efficiently enforcing error tolerance crucially depends on innovative strategies to incrementally compute partial information on the data. Our prototype is tested on both synthetic and real datasets. It returns relevant collections of patterns even from noisy ternary or 4-ary relations, e.g., in the context of pattern discovery from dynamic networks.
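To make the notion of an error-tolerant pattern concrete, the following minimal Python sketch checks one plausible reading of such a constraint on a ternary relation: every element of the pattern may miss at most a bounded number of tuples in its own slice of the pattern. This reading, the per-dimension bound `epsilon`, and the function names `missing_per_element` and `is_error_tolerant` are assumptions made for illustration; the abstract does not state the exact constraints, and this naive check is not the incremental computation it alludes to.

```python
from itertools import product


def missing_per_element(relation, pattern):
    """Count, for each (dimension, element) of the pattern, the cells of
    that element's slice of the pattern that are absent from the relation.

    relation: set of n-tuples (the 1-cells of the 0/1 data)
    pattern:  tuple of n sets, one set of elements per dimension
    """
    counts = {}
    for dim, elements in enumerate(pattern):
        for e in elements:
            # Fix element e on dimension dim, range over the pattern elsewhere.
            axes = [[e] if d == dim else pattern[d] for d in range(len(pattern))]
            counts[(dim, e)] = sum(
                1 for cell in product(*axes) if cell not in relation
            )
    return counts


def is_error_tolerant(relation, pattern, epsilon):
    """Assumed constraint: each element on dimension d misses at most
    epsilon[d] tuples inside the pattern."""
    counts = missing_per_element(relation, pattern)
    return all(c <= epsilon[dim] for (dim, _), c in counts.items())


# Toy ternary relation: a full 2 x 2 x 2 block with one tuple removed (noise).
pattern = ({"a", "b"}, {"x", "y"}, {1, 2})
relation = set(product(*pattern)) - {("b", "y", 2)}

exact = is_error_tolerant(relation, pattern, (0, 0, 0))     # no noise allowed
tolerant = is_error_tolerant(relation, pattern, (1, 1, 1))  # one miss per element
```

With all bounds set to zero, the check reduces to requiring that every cell of the pattern is present in the relation, so the single missing tuple rejects the pattern. Allowing one missing tuple per element accepts it.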
