ARKTOS: A Tool For Data Cleaning and Transformation in Data Warehouse Environments

Execution plans produced by traditional query optimizers for data integration queries may perform poorly for several reasons: the cost estimates may be inaccurate, the memory available at run time may be insufficient, or the data delivery rate may be unpredictable. These problems have led database researchers and implementors to resort to dynamic strategies that correct or adapt the static query execution plan (QEP). In this paper, we identify the basic techniques that must be integrated in a dynamic query engine. Following on our recent work [6] on the problem of unpredictable data arrival rates, we propose a dynamic query processing architecture with three dynamic layers: the dynamic query optimizer, the scheduler, and the query evaluator. This three-layer architecture significantly reduces the overhead of the dynamic strategies.

[1] Kian-Lee Tan, et al. Multi-Join Optimization for Symmetric Multiprocessors, 1993, VLDB.

[2] Martin White, et al. Enterprise information portals, 2000, Electron. Libr.

[3] Luc Bouganim, et al. Dynamic Load Balancing in Hierarchical Parallel Database Systems, 1996, VLDB.

[4] Richard Y. Wang, et al. Toward quality data: An attribute-based approach, 2014, Decis. Support Syst.

[5] David J. DeWitt, et al. Efficient mid-query re-optimization of sub-optimal query execution plans, 1998, SIGMOD '98.

[6] Patrick Valduriez, et al. Memory-adaptive scheduling for large query execution, 1998, CIKM '98.

[7] Matthias Jarke, et al. A Model for Data Warehouse Operational Processes, 2000, CAiSE.

[8] Laurent Amsaleg, et al. Scrambling query plans to cope with unexpected delays, 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[9] Alon Y. Halevy, et al. An adaptive query execution system for data integration, 1999, SIGMOD '99.

[10] Diego Calvanese, et al. Information integration: conceptual modeling and reasoning support, 1998, Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems.

[11] Patrick Valduriez, et al. Principles of Distributed Database Systems, 1990.

[12] Dennis Shasha, et al. AJAX: an extensible data cleaning tool, 2000, SIGMOD '00.

[13] C. Mohan, et al. Interactions between query optimization and concurrency control, 1992, Second International Workshop on Research Issues on Data Engineering: Transaction and Query Processing.

[14] Panos Vassiliadis, et al. Gulliver in the land of data warehousing: practical experiences and observations of a researcher, 2000, DMDW.

[15] Dennis Shasha, et al. An extensible Framework for Data Cleaning, 2000, Proceedings of the 16th International Conference on Data Engineering.

[16] David J. DeWitt, et al. Memory allocation strategies for complex decision support queries, 1998, CIKM '98.

[17] Peter M. G. Apers, et al. Parallel evaluation of multi-join queries, 1995, SIGMOD '95.

[18] Laurent Amsaleg, et al. Cost-based query scrambling for initial delays, 1998, SIGMOD '98.

[19] Goetz Graefe, et al. Query evaluation techniques for large databases, 1993, CSUR.

[20] Luc Bouganim, et al. Dynamic query scheduling in data integration systems, 2000, Proceedings of the 16th International Conference on Data Engineering.

[21] Veda C. Storey, et al. A Framework for Analysis of Data Quality Research, 1995, IEEE Trans. Knowl. Data Eng.

[22] Hamid Pirahesh, et al. Parallelism in Relational Database Management Systems, 1994, IBM Syst. J.