Learning response times for WebSources: a comparison of a web prediction tool (WebPT) and a neural network

The rapid growth of the Internet and support for interoperability protocols have increased the number of Web-accessible sources (WebSources). Current wrapper mediator architectures need to be extended with a Wrapper Cost Model (WCM) for WebSources that can estimate the response time (delay) to access sources, as well as other relevant statistics. In this paper we present a Web Prediction Tool (WebPT) that is used by the WCM to estimate delays. We compare WebPT learning with the more traditional Neural Network (NN) learning for this environment. Both the WebPT and the NN learning are based on query feedback (qfb) of response times from accessing WebSources. Experiment data was collected from several sources, and the dimensions that were significant in estimating the response time were determined. These include Time of day, Day, and Quantity of data. Both the WebPT and the NN use these dimensions to learn response times (delays) from a particular source, and then to predict the expected response time for some query. We note that WebPT learning is always online, i.e., it learns from each new query feedback. NN training can be online (per-pattern learning), which is time consuming and can be very sensitive to the choice of training parameters. The more common and robust alternative is offline batch learning (per-epoch). We compared WebPT learning with both types of NN learning in a number of experiments. The ease of training the WebPT makes it preferable to the per-pattern NN. Further, the prediction errors of the WebPT and the NN were comparable. We conclude that both the online WebPT and the more sophisticated NN learning are useful in constructing a Wrapper Cost Model for the dynamic Web environment.
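To make the idea of online learning from query feedback concrete, the following is a minimal sketch of an estimator that updates after every observed response time. It is not the WebPT algorithm, whose internals the abstract does not describe. The class name `OnlineDelayEstimator`, the bucket widths, and the running-mean update are all assumptions chosen for illustration; only the three dimensions (time of day, day, quantity of data) come from the text.

```python
from collections import defaultdict


class OnlineDelayEstimator:
    """Hypothetical online estimator (not WebPT itself).

    Keeps a running mean of observed delays per cell, where a cell is
    (time-of-day bucket, day of week, data-size bucket).
    """

    def __init__(self, hour_bucket=4, size_bucket_kb=100):
        # Bucket widths are illustrative assumptions.
        self.hour_bucket = hour_bucket
        self.size_bucket_kb = size_bucket_kb
        self.count = defaultdict(int)
        self.mean = defaultdict(float)
        self.global_count = 0
        self.global_mean = 0.0

    def _cell(self, hour, day, size_kb):
        return (hour // self.hour_bucket, day, int(size_kb // self.size_bucket_kb))

    def update(self, hour, day, size_kb, delay):
        """Learn from one query feedback (per-observation, i.e. online)."""
        key = self._cell(hour, day, size_kb)
        self.count[key] += 1
        # Incremental mean: no need to store past observations.
        self.mean[key] += (delay - self.mean[key]) / self.count[key]
        self.global_count += 1
        self.global_mean += (delay - self.global_mean) / self.global_count

    def predict(self, hour, day, size_kb):
        """Predict the delay; fall back to the global mean for unseen cells."""
        key = self._cell(hour, day, size_kb)
        if self.count.get(key, 0) > 0:
            return self.mean[key]
        return self.global_mean


est = OnlineDelayEstimator()
est.update(hour=9, day=0, size_kb=50, delay=2.0)
est.update(hour=10, day=0, size_kb=80, delay=4.0)
print(est.predict(hour=11, day=0, size_kb=20))   # same cell as both updates
print(est.predict(hour=22, day=5, size_kb=500))  # unseen cell, global fallback
```

A per-epoch (batch) learner would instead collect a full set of feedback records and refit over all of them repeatedly, which is the contrast the abstract draws between the online and batch settings.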
