Determining remote system contention states in query processing over the Internet

In data integration over the Internet, three major factors affect the cost of a query: network congestion, server contention states (workload), and data/query complexity. We concentrate on system contention states. For a remote data source, we first determine the total number of contention states of the system by applying clustering techniques to the costs of sample queries. We then develop a set of cost formulae for each contention state using multiple regression. Finally, we estimate the system's current contention state when a query is issued, using either a time slides method or a statistical method, depending on the information we have about the system. Our method can accurately predict the system contention state, so that the effect of contention states on query cost can be estimated precisely.
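The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name and number in it is hypothetical. It assumes that contention states show up as well-separated groups in the costs of one repeatedly issued probe query, it cuts the sorted costs at unusually large gaps as a stand-in for the clustering technique, it fits a single-variable least-squares line per state as a simplification of multiple regression, and it assigns a state by the nearest cluster centre rather than by the time slides or statistical method.

```python
from statistics import mean


def split_into_states(costs, gap_ratio=3.0):
    """Cluster 1-D costs by cutting the sorted list at unusually large gaps.

    gap_ratio is an illustrative threshold (assumption): a gap larger than
    gap_ratio times the median gap starts a new cluster.
    """
    ordered = sorted(costs)
    if len(ordered) < 2:
        return [ordered]
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    typical = sorted(gaps)[len(gaps) // 2]  # median gap
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap_ratio * typical:
            clusters.append([])
        clusters[-1].append(cur)
    return clusters


def nearest_state(probe_cost, centers):
    """Index of the cluster centre closest to an observed probe cost."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - probe_cost))


def fit_line(xs, ys):
    """Ordinary least squares for cost = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def fit_per_state(samples, centers):
    """Fit one cost formula per state from (probe_cost, size, cost) samples."""
    grouped = {}
    for probe, size, cost in samples:
        grouped.setdefault(nearest_state(probe, centers), []).append((size, cost))
    return {
        state: fit_line([x for x, _ in pts], [y for _, y in pts])
        for state, pts in grouped.items()
    }


# Hypothetical probe-query costs (ms) observed at different times.
probe_costs = [100, 110, 120, 90, 500, 520, 490, 510, 1200, 1210, 1190]
states = split_into_states(probe_costs)
centers = [mean(cluster) for cluster in states]

# Hypothetical samples: (probe cost at that time, result size, query cost).
samples = [
    (100, 10, 70), (110, 20, 90), (95, 30, 110),
    (500, 10, 250), (510, 20, 300), (490, 30, 350),
]
models = fit_per_state(samples, centers)

# Estimate the cost of a new query: find the current state, apply its formula.
current = nearest_state(505, centers)
intercept, slope = models[current]
estimate = intercept + slope * 40
```

Here the number of contention states is simply `len(states)`. A state with no sample queries (the third cluster in this data) gets no cost formula, so a real system would need enough samples per state before estimating costs under it.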
