Simultaneous optimization of complex mining tasks with a knowledgeable cache

With the increasing use of data mining tools and techniques, we envision that a Knowledge Discovery and Data Mining System (KDDMS) will have to support and optimize for the following scenarios: 1) Sequence of Queries: a user may analyze one or more datasets by issuing a sequence of related complex mining queries, and 2) Multiple Simultaneous Queries: several users may analyze a set of datasets concurrently and may issue related complex queries. This paper presents a systematic mechanism to optimize for both cases, targeting the class of mining queries that involve frequent pattern mining on one or multiple datasets. We present a system architecture and propose new algorithms that simultaneously optimize multiple such queries and use a knowledgeable cache to store and reuse past query results. We have implemented our system and evaluated it on both real and synthetic datasets. Our experimental results show that our techniques achieve speedups of up to a factor of 9 compared with systems that neither cache results nor optimize across multiple queries.
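
To make the idea of reusing past query results concrete, the following is a minimal sketch and not the system described in this paper. It assumes a single, simple reuse rule: if the frequent itemsets of a dataset were already mined at some support threshold, any later query on the same dataset with an equal or higher threshold can be answered by filtering the cached result. The names `mine_frequent` and `SupportCache` are hypothetical, and the brute-force miner stands in for a real frequent pattern mining algorithm.

```python
from collections import Counter
from itertools import combinations


def mine_frequent(transactions, min_support):
    """Brute-force frequent itemset mining (illustrative stand-in only)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count every non-empty subset of the transaction.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


class SupportCache:
    """Hypothetical cache keyed by dataset name (an assumption of this sketch)."""

    def __init__(self):
        self._store = {}  # dataset name -> (min_support, {itemset: count})
        self.hits = 0
        self.misses = 0

    def query(self, name, transactions, min_support):
        cached = self._store.get(name)
        # A result mined at a lower or equal threshold contains every
        # itemset the new query needs, so filtering it is sufficient.
        if cached is not None and cached[0] <= min_support:
            self.hits += 1
            return {s: c for s, c in cached[1].items() if c >= min_support}
        # Otherwise mine from scratch and keep the more general result.
        self.misses += 1
        result = mine_frequent(transactions, min_support)
        self._store[name] = (min_support, result)
        return result


if __name__ == "__main__":
    data = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
    cache = SupportCache()
    cache.query("demo", data, 2)          # mined from scratch
    print(cache.query("demo", data, 3))   # answered from the cache
    print(cache.hits, cache.misses)
```

In this sketch the second query never touches the transactions. Queries that span multiple datasets or that are issued concurrently would need a richer reuse strategy than this single rule.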
