Database Transposition for Constrained (Closed) Pattern Mining

Recently, several works have proposed a new way to mine patterns in databases of pathological dimensions. For example, experiments in genome biology usually produce databases with thousands of attributes (genes) but only tens of objects (experiments). In such cases, mining the “transposed” database explores a much smaller search space, since its size depends on the number of objects rather than on the number of attributes, and the Galois connection makes it possible to infer the closed patterns of the original database. We focus here on constrained pattern mining for such unusual databases and give a theoretical framework for database and constraint transposition. We discuss the properties of constraint transposition and examine classical constraints. We then address the problem of generating the closed patterns of the original database that satisfy the constraint, starting from those mined in the “transposed” database. Finally, we show how to generate all the patterns satisfying the constraint from the closed ones.
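
The idea can be illustrated with a minimal Python sketch. It assumes a hypothetical toy database, the standard Galois connection (f maps an itemset X to the objects containing it, g maps an object set T to the attributes shared by all its objects) and naive enumeration of object subsets instead of an optimized miner. The helper names (extent, intent, closed_object_sets, closed_itemsets_with, all_frequent) are illustrative and not taken from the paper, and only two constraints are shown: minimum frequency and "X contains a given itemset A". Both transpose exactly because, for a closed X = g(T), f(X) = T and A ⊆ g(T) holds if and only if T ⊆ f(A); this is not the paper's general framework for arbitrary constraints.

```python
from itertools import combinations

# Hypothetical toy database (assumption, not data from the paper):
# few objects (e.g. experiments), more attributes (e.g. genes).
DB = {
    "o1": {"a", "b", "c", "d", "e"},
    "o2": {"a", "b", "d"},
    "o3": {"b", "c", "d", "e", "f"},
}
ITEMS = set().union(*DB.values())


def extent(itemset):
    """f(X): objects whose row contains every attribute of X."""
    return frozenset(o for o, row in DB.items() if itemset <= row)


def intent(objects):
    """g(T): attributes shared by every object of T (all attributes if T is empty)."""
    rows = [DB[o] for o in objects]
    return frozenset(set.intersection(*rows)) if rows else frozenset(ITEMS)


def closed_object_sets():
    """Closed patterns of the transposed database: object sets T with f(g(T)) == T.

    Only 2^|O| candidates are examined, instead of 2^|I| itemsets.
    """
    objs = sorted(DB)
    closed = set()
    for k in range(len(objs) + 1):
        for t in combinations(objs, k):
            t = frozenset(t)
            if extent(intent(t)) == t:
                closed.add(t)
    return closed


def closed_itemsets_with(min_freq, required=frozenset()):
    """Closed itemsets X with freq(X) >= min_freq and X containing `required`.

    Both constraints are checked on T only: for X = g(T) closed, freq(X) = |T|,
    and required <= g(T) holds exactly when T <= f(required).
    """
    bound = extent(required)
    result = {}
    for t in closed_object_sets():
        if len(t) >= min_freq and t <= bound:
            result[intent(t)] = len(t)  # map back to the original database
    return result


def all_frequent(min_freq):
    """All non-empty itemsets with freq >= min_freq, regenerated from closed ones.

    freq(X) equals the frequency of its closure, which is the largest
    frequency among the closed supersets of X.
    """
    out = {}
    for c, freq in closed_itemsets_with(min_freq).items():
        for k in range(1, len(c) + 1):
            for x in combinations(sorted(c), k):
                x = frozenset(x)
                out[x] = max(out.get(x, 0), freq)
    return out


if __name__ == "__main__":
    for x, freq in sorted(closed_itemsets_with(2).items(), key=lambda p: sorted(p[0])):
        print("".join(sorted(x)), freq)
```

Run as a script, it prints the closed itemsets of frequency at least 2 (abd 2, bcde 2, bd 3) after examining only 2^3 object subsets rather than the 2^6 itemsets of the original search space. Calling closed_itemsets_with(2, frozenset({"c"})) additionally enforces the transposed constraint T ⊆ f({c}) and keeps only bcde.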