Mining constraint-based patterns using automatic relaxation

Constraint-based mining is an active field of research and a necessary step toward interactive and successful KDD processes. Its main limitations lie in the restricted languages available to describe the mined patterns and in the limited ability to express varied constraints. In practice, current approaches focus on a single language, and the most generic frameworks mine, individually or simultaneously, a monotone and an anti-monotone constraint. In this paper, we propose a generic framework that deals with any partially ordered language and a large set of constraints. We prove that this set of constraints, called primitive-based constraints, is a superclass not only of monotone and anti-monotone constraints and their boolean combinations, but also of other classes such as convertible and succinct constraints. We show that primitive-based constraints can be mined efficiently thanks to a relaxation method based on virtual patterns, which summarize the specificities of the search space. This approach automatically deduces pruning conditions with suitable monotone properties, so these conditions can be pushed into usual constraint mining algorithms. We also study the optimal relaxations. Finally, we illustrate the efficiency of our proposal experimentally in several contexts.
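To make the idea of relaxation concrete, the following is a minimal sketch and not the algorithm of the paper: it does not construct virtual patterns, and the relaxation is written by hand rather than deduced automatically. It assumes itemsets as the pattern language and a hypothetical minimum-area constraint, freq(X) * |X| >= min_area, which is neither monotone nor anti-monotone. Replacing |X| by the length of the longest transaction gives a weaker condition that is anti-monotone and can therefore be pushed into a levelwise search. The names `mine_min_area` and `relaxed` and the toy data are illustrative assumptions.

```python
from itertools import combinations


def frequency(pattern, transactions):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)


def area(pattern, transactions):
    """Hypothetical example measure: frequency times pattern length."""
    return frequency(pattern, transactions) * len(pattern)


def mine_min_area(transactions, min_area):
    """Return all non-empty itemsets X with area(X) >= min_area.

    Pruning uses a hand-written relaxation (an assumption of this sketch):
    freq(X) * max_len >= min_area, where max_len is the longest transaction.
    The relaxed condition is anti-monotone because freq can only decrease
    when items are added, and it is implied by the original constraint.
    """
    max_len = max(len(t) for t in transactions)

    def relaxed(pattern):
        return frequency(pattern, transactions) * max_len >= min_area

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if relaxed(frozenset([i]))]
    survivors = []
    while level:
        survivors.extend(level)
        level_set = set(level)
        # Join patterns of size k into candidates of size k + 1.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        # Keep a candidate only if all its k-subsets survived and it
        # satisfies the relaxed condition itself.
        level = [c for c in candidates
                 if all(c - {i} in level_set for i in c) and relaxed(c)]
    # The relaxation only prunes; the original constraint is checked last.
    return {p for p in survivors if area(p, transactions) >= min_area}


transactions = [frozenset("abc"), frozenset("abd"),
                frozenset("ab"), frozenset("cd")]
result = mine_min_area(transactions, min_area=4)
```

On this toy data with `min_area=4`, the relaxed condition amounts to a minimum frequency of 2, so every pair except {a, b} is pruned before its supersets are generated, and the final filter keeps only {a, b}.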
