Pattern discovery via constraint programming

Pattern discovery is one of the most fundamental problems in data mining. Many kinds of patterns, each with its own discovery algorithm, have been proposed for different applications and domains. Because every application has its own characteristics, there is still a strong demand for defining new meaningful patterns that meet new requirements. Existing studies propose new query languages to describe such ad-hoc patterns. However, most of them focus on small variations of frequent itemsets and association rules, and many meaningful patterns in other domains, such as temporal and spatial patterns, are not covered. This paper proposes a constraint-based view of pattern discovery that requires no new language: a pattern is described by a collection of constraints supplied at run time, and a pattern discovery problem is treated as a constraint satisfaction problem. This view provides a general framework for universal pattern discovery, in which many previously known patterns can be regarded as variations obtained by choosing different constraints. Two generic algorithms are proposed for solving the resulting constraint satisfaction problems. An empirical evaluation on two well-studied patterns shows that (1) the running time of one of the generic algorithms is close to that of the specialized mining algorithms, and (2) the space cost of that algorithm grows linearly with the input data volume. Two further case studies demonstrate the effectiveness of the constraint-based view for solving new problems in new scenarios.
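The abstract does not spell out the two generic algorithms, so the following is only a hypothetical illustration of the idea that a pattern is "described by a collection of constraints given at run time". It assumes patterns are itemsets, splits constraints into ones the caller declares anti-monotone (safe to prune on) and ones checked only on complete candidates, and solves the task with a plain depth-first search. The names `discover`, `min_support`, `absent_from` and `show`, and the toy data, are invented for this sketch; it is not the paper's method.

```python
from typing import Callable, FrozenSet, Iterable, List

Pattern = FrozenSet[str]
Constraint = Callable[[Pattern], bool]


def support(pattern: Pattern, data: List[Pattern]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in data if pattern <= t)


def min_support(data: List[Pattern], k: int) -> Constraint:
    # Anti-monotone: if a pattern fails it, every superset fails too.
    return lambda p: support(p, data) >= k


def absent_from(data: List[Pattern]) -> Constraint:
    # Not anti-monotone, so it must only be checked on final candidates.
    return lambda p: support(p, data) == 0


def discover(items: Iterable[str], prunable: List[Constraint],
             final: List[Constraint]) -> List[Pattern]:
    """Hypothetical generic solver: enumerate itemsets depth-first and
    return those that satisfy every constraint in both lists."""
    ordered = sorted(items)
    results: List[Pattern] = []

    def extend(pattern: Pattern, start: int) -> None:
        for i in range(start, len(ordered)):
            candidate = pattern | {ordered[i]}
            if not all(c(candidate) for c in prunable):
                continue  # prune: no superset can satisfy this constraint
            if all(c(candidate) for c in final):
                results.append(candidate)
            extend(candidate, i + 1)

    extend(frozenset(), 0)
    return results


def show(patterns: List[Pattern]) -> List[str]:
    """Render patterns as sorted strings for easy reading."""
    return sorted("".join(sorted(p)) for p in patterns)


# Toy data (invented): each transaction is a set of single-letter items.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
negative = [frozenset("cd"), frozenset("c"), frozenset("a")]
items = set().union(*transactions)

# Frequent itemsets: one anti-monotone constraint, nothing else.
frequent = discover(items, prunable=[min_support(transactions, 2)], final=[])
# Jumping-emerging-style patterns: same solver, one extra constraint.
jumping = discover(items, prunable=[min_support(transactions, 2)],
                   final=[absent_from(negative)])

print(show(frequent))  # ['a', 'ab', 'b', 'c', 'd']
print(show(jumping))   # ['ab', 'b']
```

Swapping the constraint lists turns the same search from frequent-itemset mining into a search for patterns that are frequent in one dataset and absent from another, which mirrors the claim that known patterns are variations of one framework under different constraints. In this sketch, classifying a constraint as prunable is the caller's responsibility: placing a constraint that is not anti-monotone in `prunable` would silently drop valid patterns.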
