QoS-based Data Access and Placement for Federated Information Systems

A wide variety of applications require access to multiple heterogeneous, distributed data sources. Transparently integrating such diverse data sources hides the underlying differences in DBMSs, languages, and data models, so users can access the unified data through a global schema using a single data model and a single high-level query language. To address the needs of such federated information systems, IBM has developed DB2 Information Integrator (II) [1], which provides relational access to both relational DBMSs and non-relational sources, such as file systems and web services. These data sources are registered at II as nicknames and can thereafter be accessed via wrappers. Statistics about the remote databases are collected and maintained at II for the optimizer to use when costing query plans. II employs cost-based query optimization to select a low-cost global query plan to execute. Thus, the cost functions used by II heavily influence which remote servers (i.e., equivalent data sources) are accessed and how federated queries are processed. Cost estimation is usually based on database statistics, query statements, and the local and remote system configuration, such as CPU power and I/O device characteristics. DB2 also allows the system administrator to specify the expected network latency between II and the remote servers. However, existing cost functions do not consider (1) the load on the remote servers, (2) the dynamic nature of network latency between the remote servers and II, and (3) the availability of the remote sources. As a result, federated information systems cannot dynamically adapt to changes in the runtime environment, such as network congestion or load spikes at the remote

[1] Laura M. Haas, et al. Garlic: a new flavor of federated query processing for DB2. SIGMOD '02, 2002.

[2] K. Selçuk Candan, et al. Load and network aware query routing for information integration. 21st International Conference on Data Engineering (ICDE'05), 2005.