Optimizing subset queries: a step towards SQL-based inductive databases for itemsets

Storing sets and querying them (e.g., subset queries that return all supersets of a given set) is known to be difficult within relational databases. We consider that being able to query both transactional data and materialized collections of sets efficiently, by means of a standard query language, is an important step towards practical inductive databases. Indeed, data mining query languages like MINE RULE extract collections of association rules, whose components are sets, into relational tables. Post-processing phases often make extensive use of subset queries, which SQL servers cannot process efficiently. In this paper, we propose a new way to handle sets in relational databases. It is based on a data structure that partially encodes the inclusion relationship between sets, and it extends the hash group bitmap key proposed by Morzy et al. [8]. Our experiments show an interesting improvement for these useful subset queries.
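To make the notion of a subset query concrete, the following is a minimal sketch of a bitmap-key filter in the spirit of the hash group bitmap key. It is not the extended structure proposed in this paper. The key width `N_BITS`, the hash `item % n_bits`, the integer item identifiers, and the function names are all assumptions made for illustration. Each set is summarized by an integer key, a bitwise test discards sets that cannot contain the query, and the surviving candidates are verified exactly because distinct items may share a bit.

```python
from typing import Dict, FrozenSet, Iterable, List

N_BITS = 16  # assumed key width, chosen only for this illustration


def bitmap_key(items: Iterable[int], n_bits: int = N_BITS) -> int:
    """Summarize a set as an integer: one bit per item, at position item mod n_bits."""
    key = 0
    for item in items:
        key |= 1 << (item % n_bits)
    return key


def candidate_supersets(query: Iterable[int], keys: Dict[int, int]) -> List[int]:
    """Cheap filter: keep the sets whose key has every bit of the query key set.

    The filter never misses a true superset, but it may keep false positives.
    """
    q_key = bitmap_key(query)
    return [set_id for set_id, key in keys.items() if (key & q_key) == q_key]


def supersets_of(query: Iterable[int],
                 stored: Dict[int, FrozenSet[int]],
                 keys: Dict[int, int]) -> List[int]:
    """Subset query: identifiers of all stored sets that contain the query set."""
    q = frozenset(query)
    # Exact verification removes the false positives left by the bitmap filter.
    return [set_id for set_id in candidate_supersets(q, keys) if q <= stored[set_id]]


# Hypothetical collection of itemsets, keyed by a set identifier.
stored_sets: Dict[int, FrozenSet[int]] = {
    1: frozenset({1, 2, 3}),
    2: frozenset({2, 3}),
    3: frozenset({17, 3}),  # 17 mod 16 == 1, so it collides with item 1
    4: frozenset({1, 5}),
}
stored_keys: Dict[int, int] = {sid: bitmap_key(s) for sid, s in stored_sets.items()}

candidates = candidate_supersets({1, 3}, stored_keys)
answers = supersets_of({1, 3}, stored_sets, stored_keys)
```

In a relational setting, the key could be stored as an integer column so that the bitwise test runs inside the SQL server before the exact inclusion check. Here, set 3 passes the filter only because items 17 and 1 share a bit, and the verification step rejects it.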

[1] Heikki Mannila, et al. A database perspective on knowledge discovery, 1996, CACM.

[2] Vijay V. Raghavan, et al. The Item-Set Tree: A Data Structure for Data Mining, 1999, DaWaK.

[3] Giuseppe Psaila, et al. An Extension to SQL for Mining Association Rules, 1998, Data Mining and Knowledge Discovery.

[4] Sunita Sarawagi, et al. Integrating association rule mining with relational database systems: alternatives and implications, 1998, SIGMOD '98.

[5] Jian Pei, et al. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets, 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[6] Luc De Raedt, et al. A perspective on inductive databases, 2002, SKDD.

[7] Jean-François Boulicaut, et al. Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries, 2004, Data Mining and Knowledge Discovery.

[8] Heikki Mannila, et al. Fast Discovery of Association Rules, 1996, Advances in Knowledge Discovery and Data Mining.

[9] Tadeusz Morzy, et al. Group Bitmap Index: A Structure for Association Rules Retrieval, 1998, KDD.

[10] Jean-François Boulicaut, et al. A Comparison between Query Languages for the Extraction of Association Rules, 2002, DaWaK.

[11] Ralf Rantzau. Frequent Itemset Discovery with SQL Using Universal Quantification, 2004, Database Support for Data Mining Applications.

[12] Mohammed J. Zaki. Generating non-redundant association rules, 2000, KDD '00.

[13] Bing Liu, et al. Querying multiple sets of discovered rules, 2002, KDD.

[14] Gediminas Adomavicius, et al. Handling very large numbers of association rules in the analysis of microarray data, 2002, KDD.