Validating a Cost Model for Wide Area Applications

Recent technology advances have enabled wide area application (WAA) processing with WebSources that are accessible via the WWW. One challenge to query processing in wide area environments is the unpredictable behavior of WebSources in a dynamic WAN. The latency (delay) of accessing these sources can vary widely, and it may depend on the network and server workloads, which are in turn often affected by parameters such as the Time of Day and the Day of Week. Another challenge is that autonomous WebSources may not provide the metrics needed for accurate cost estimation. In this paper, we describe a case study in developing a cost model for WebSources in the context of a wrapper-mediator architecture. We document our experiences in validating this cost model and note successes and lessons learned. Using experimental query feedback data from several WebSources, we characterize sources as having High or Low Prediction Accuracy, according to how well the cost model predicts their access costs. We also identify characteristics of the query feedback that are correlated with High or Low Prediction Accuracy. A cost model is an important component of the mediator query optimizer; the mediator cost model uses the estimates of the wrapper cost model to obtain the cost of a mediator query that accesses multiple WebSources simultaneously. We examine how our research can be used to develop robust query optimization techniques for WAA processing in noisy environments.
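The idea of learning access costs from query feedback and then labelling a source by prediction accuracy can be illustrated with a minimal sketch. This is not the cost model described in the paper: the bucketing by (Day of Week, hour of day), the use of the median latency per bucket, the mean relative error metric, and the 0.25 threshold are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
from collections import defaultdict
from statistics import median


def bucket(ts):
    """Map a datetime to an assumed (day-of-week, hour-of-day) bucket."""
    return (ts.weekday(), ts.hour)


def train(feedback):
    """Build a lookup table from query feedback.

    feedback: iterable of (datetime, latency_seconds) observations.
    Returns a dict mapping each bucket to the median observed latency.
    """
    groups = defaultdict(list)
    for ts, latency in feedback:
        groups[bucket(ts)].append(latency)
    return {b: median(values) for b, values in groups.items()}


def predict(model, ts, default):
    """Predict latency for a query at time ts; fall back to default
    when no feedback was observed for that bucket."""
    return model.get(bucket(ts), default)


def mean_relative_error(model, feedback, default):
    """Average of |predicted - observed| / observed over held-out feedback."""
    errors = [
        abs(predict(model, ts, default) - latency) / latency
        for ts, latency in feedback
    ]
    return sum(errors) / len(errors)


def accuracy_label(error, threshold=0.25):
    """Label a source 'High' or 'Low' prediction accuracy.
    The threshold is an illustrative assumption, not a value from the paper."""
    return "High" if error <= threshold else "Low"
```

In use, one would train on earlier feedback for a single WebSource, compute the error on later feedback, and pass that error to `accuracy_label`. A source whose latency is stable within each time bucket would tend to be labelled High under this scheme, while a noisy source would tend to be labelled Low.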
