Query Rewriting in Itemset Mining

In recent years, researchers have begun to study inductive databases, a new generation of databases for decision support applications. In this context, the user interacts with the DBMS through advanced, constraint-based languages for data mining, in which constraints have been introduced specifically to increase the relevance of the results and, at the same time, to reduce their volume. In this paper we study the problem of mining frequent itemsets using an inductive database. We propose a query-answering technique that rewrites a query in terms of unions and intersections of the result sets of other queries that were previously executed and materialized. Unfortunately, past queries cannot always be exploited. We therefore present sufficient conditions under which the optimization applies and show that these conditions are strictly connected with the presence of functional dependencies between the attributes involved in the queries. Experiments on an initial prototype of an optimizer show that this approach to query answering is not only viable but, in many practical cases, necessary, since it drastically reduces execution time.
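To make the rewriting idea concrete, the following is a minimal Python sketch of the intersection case only. It is an illustration and not the paper's optimizer. It assumes that every constraint is a predicate on attributes functionally determined by the item (here a hypothetical category and price), that support is counted over all transactions, and that all queries share the same support threshold. The data, the `mine` helper, and the predicate names are invented for the example. Under these assumptions, a query whose constraint is the conjunction of two earlier constraints can be answered by intersecting the two materialized results, without rescanning the data.

```python
from itertools import combinations

def mine(transactions, min_support, item_ok):
    """Brute-force miner: frequent itemsets whose items all satisfy item_ok.

    Support is counted over all transactions, so filtering items does not
    change the support of the itemsets that remain.
    """
    items = sorted({i for t in transactions for i in t if item_ok(i)})
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                result[frozenset(cand)] = support
    return result

# Hypothetical catalog: item -> (category, price). Both attributes are
# functionally determined by the item, which is the assumption this sketch needs.
ITEMS = {"a": ("food", 5), "b": ("food", 20), "c": ("toys", 8), "d": ("toys", 30)}
TRANSACTIONS = [{"a", "b", "c"}, {"a", "c"}, {"a", "b", "d"}, {"a", "c", "d"}]
MIN_SUPPORT = 2

def is_food(item):
    return ITEMS[item][0] == "food"

def is_cheap(item):
    return ITEMS[item][1] < 10

# Two past queries, executed once and materialized.
q1 = mine(TRANSACTIONS, MIN_SUPPORT, is_food)
q2 = mine(TRANSACTIONS, MIN_SUPPORT, is_cheap)

# New query: items must be food AND cheap. Rewritten as an intersection
# of the materialized results instead of a new pass over the data.
rewritten = {s: q1[s] for s in q1.keys() & q2.keys()}

# Reference answer computed directly, only to check the rewriting.
direct = mine(TRANSACTIONS, MIN_SUPPORT, lambda i: is_food(i) and is_cheap(i))
print(rewritten == direct)
```

The sketch leaves the union case out on purpose: the conditions under which a union of past results is a valid rewriting are those established in the paper, and this example does not try to reproduce them.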
