Soft constraints for pattern mining

Constraint-based pattern discovery is at the core of numerous data mining tasks. Patterns are extracted with respect to a given set of constraints (frequency, closedness, size, etc.). In practice, many constraints require threshold values whose choice is often arbitrary. The difficulty grows when several thresholds are required and must be combined. Moreover, patterns that barely miss a threshold are not extracted, even though they may be relevant. This paper advocates introducing softness into the pattern discovery process. Using Constraint Programming, we propose efficient methods to relax threshold constraints, as well as the constraints underlying patterns such as top-k patterns and skypatterns. We show the relevance and efficiency of our approach through a case study in chemoinformatics on discovering toxicophores.
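To make the idea of a relaxed threshold concrete, here is a minimal sketch in Python. It is not the Constraint Programming method of the paper: it enumerates itemsets by brute force, and the violation measure (relative shortfall below the threshold), the `tolerance` parameter, and all function names are assumptions chosen for illustration only.

```python
from itertools import combinations


def frequency(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def violation(freq, min_freq):
    """Assumed violation measure: relative shortfall below the threshold.

    Returns 0.0 when the hard constraint freq >= min_freq holds.
    """
    return max(0, min_freq - freq) / min_freq


def soft_frequent_itemsets(transactions, min_freq, tolerance, max_size=3):
    """Brute-force enumeration keeping itemsets whose violation of the
    frequency threshold does not exceed `tolerance` (hypothetical parameter).

    With tolerance = 0.0 this reduces to the usual hard frequency constraint.
    """
    items = sorted(set().union(*transactions))
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            freq = frequency(itemset, transactions)
            v = violation(freq, min_freq)
            if v <= tolerance:
                results.append((itemset, freq, v))
    return results


# Toy dataset (invented for this example).
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("a"),
]

hard = soft_frequent_itemsets(transactions, min_freq=3, tolerance=0.0)
soft = soft_frequent_itemsets(transactions, min_freq=3, tolerance=0.34)
```

On this toy dataset the hard threshold keeps only the three single items, while the relaxed version also keeps the pairs whose frequency is 2, which miss the threshold of 3 by one transaction. The full itemset, with frequency 1, is still rejected.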
