Optimization of online data integration

Online data integration is the continuous consolidation of data transmitted over wide area networks with data already stored at the central site of a multidatabase system. Because the process is continuous, the data integration procedure must be activated each time a new portion of data arrives at the central site. An efficient implementation of online data integration needs a new system of elementary operations on the increments and/or decrements of data and on the intermediate results of integration. This work shows how to derive such a system of elementary operations for online data integration from a system of base operations on data containers. In particular, we define a new system of online operations based on the binary operations of relational algebra. The paper analyses the properties of the new system and describes the transformation of global data integration expressions into collections of online data integration plans. It also shows how the system can be used for the comprehensive analysis and optimization of online data integration plans. The optimization techniques described in the paper include reduction of input data increments, identification and elimination of intermediate materializations, and reduction of fixed-size arguments in online data integration plans.
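To make the idea of operating on increments and decrements concrete, the following is a minimal sketch for a single binary operation, the equijoin, under set semantics. It is not the system of online operations defined in the paper. It relies only on the standard identities (R ∪ δR) ⋈ S = (R ⋈ S) ∪ (δR ⋈ S) and (R − δR) ⋈ S = (R ⋈ S) − (δR ⋈ S). The names `join`, `OnlineJoin`, `insert_r`, `insert_s`, and `delete_r` are hypothetical, and relations are assumed to be sets of two-element tuples joined on their first attribute.

```python
def join(r, s):
    """Equijoin of r(k, a) and s(k, b) on the first attribute; returns (k, a, b) tuples."""
    return {(k, a, b) for (k, a) in r for (k2, b) in s if k == k2}


class OnlineJoin:
    """Keeps a materialized join r JOIN s up to date from increments and decrements.

    Illustrative assumption: both arguments are stored in full at the
    central site, so each new portion of data is joined only against the
    stored opposite argument instead of recomputing the whole result.
    """

    def __init__(self):
        self.r = set()
        self.s = set()
        self.result = set()

    def insert_r(self, delta_r):
        # Only tuples that are actually new contribute to the result.
        delta_r = set(delta_r) - self.r
        self.result |= join(delta_r, self.s)
        self.r |= delta_r

    def insert_s(self, delta_s):
        delta_s = set(delta_s) - self.s
        self.result |= join(self.r, delta_s)
        self.s |= delta_s

    def delete_r(self, delta_r):
        # Each result tuple (k, a, b) derives from exactly one r tuple (k, a),
        # so removing the join of the decrement is sufficient.
        delta_r = set(delta_r) & self.r
        self.result -= join(delta_r, self.s)
        self.r -= delta_r


oj = OnlineJoin()
oj.insert_r({(1, "a"), (2, "b")})
oj.insert_s({(1, "x")})
oj.insert_s({(2, "y")})
oj.delete_r({(1, "a")})
```

After every step in the usage above, `oj.result` equals `join(oj.r, oj.s)`, and each step touches only the arriving increment or decrement and the stored opposite argument. The plan-level optimizations named in the abstract are not modelled in this sketch.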
