Towards Optimizing Conjunctive Inductive Queries

Inductive queries are queries to an inductive database that generate a set of patterns in a data mining context. Inductive querying poses new challenges to database and data mining technology. We study conjunctive inductive queries, which can be written as the conjunction of a monotonic and an anti-monotonic subquery. We introduce the conjunctive inductive query optimization problem, which is concerned with minimizing the cost of computing the answer set of a conjunctive query. The problem assumes costs c_a and c_m for evaluating a pattern w.r.t. the anti-monotonic and the monotonic subquery, respectively; the aim is to minimize the total cost needed to compute all solutions to the query. We also present an algorithm that aims at optimizing conjunctive inductive queries in the pattern domain of strings, and we evaluate it on a challenging data set in computational biology.
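To make the cost model concrete, the following is a minimal sketch of evaluating a conjunctive query over strings while counting evaluation costs. It is not the algorithm of the paper: it is a plain levelwise search, and the function names (`frequency`, `solve_conjunctive_query`), the frequency-threshold predicates, and the unit costs are assumptions chosen for illustration. The anti-monotonic predicate is used to prune candidates, and a pattern is accepted for the monotonic predicate without evaluation whenever one of its immediate substrings is already known to satisfy it.

```python
from typing import Callable, List, Set, Tuple


def frequency(pattern: str, strings: List[str]) -> int:
    """Number of strings that contain `pattern` as a substring."""
    return sum(pattern in s for s in strings)


def solve_conjunctive_query(
    alphabet: str,
    anti_monotonic: Callable[[str], bool],
    monotonic: Callable[[str], bool],
    max_len: int,
    c_a: float = 1.0,  # assumed cost of one anti-monotonic evaluation
    c_m: float = 1.0,  # assumed cost of one monotonic evaluation
) -> Tuple[Set[str], float]:
    """Return all patterns up to max_len satisfying both predicates,
    together with the total evaluation cost that was spent."""
    solutions: Set[str] = set()
    total_cost = 0.0
    prev_level: Set[str] = {""}    # patterns of the previous length passing anti_monotonic
    mono_known: Set[str] = set()   # patterns known to satisfy monotonic

    for length in range(1, max_len + 1):
        this_level: Set[str] = set()
        for prefix in sorted(prev_level):
            for ch in alphabet:
                p = prefix + ch
                # Anti-monotonic pruning: the suffix must also have passed.
                if length > 1 and p[1:] not in prev_level:
                    continue
                total_cost += c_a
                if not anti_monotonic(p):
                    continue
                this_level.add(p)
                # Monotonic shortcut: a superstring of a satisfying
                # pattern satisfies the predicate as well, at no cost.
                if p[:-1] in mono_known or p[1:] in mono_known:
                    mono_known.add(p)
                else:
                    total_cost += c_m
                    if monotonic(p):
                        mono_known.add(p)
                if p in mono_known:
                    solutions.add(p)
        if not this_level:
            break
        prev_level = this_level
    return solutions, total_cost


if __name__ == "__main__":
    # Hypothetical toy data: frequent in `positives`, infrequent in `negatives`.
    positives = ["abcab", "abc", "cabc"]
    negatives = ["ab", "bca"]
    result, cost = solve_conjunctive_query(
        "abc",
        anti_monotonic=lambda p: frequency(p, positives) >= 2,
        monotonic=lambda p: frequency(p, negatives) <= 1,
        max_len=3,
    )
    print(sorted(result), cost)
```

Which predicate is evaluated first, and how often each can be skipped, determines the total cost; when c_m is much larger than c_a, the shortcut on the monotonic side accounts for most of the savings in this sketch.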
