Validating an Access Cost Model for Wide Area Applications

In this paper, we describe a case study in developing an access cost model for WebSources in the context of a wrapper-mediator architecture. We document our experiences in validating this model and note successes and lessons learned. Using experimental query feedback data from several WebSources, we develop a Catalog and Access Cost model. We identify characteristics of the query feedback that reflect the behavior of a particular WebSource, and we identify groupings of WebSources based on these characteristics. We also characterize the Access Cost model as having High or Low Prediction Accuracy with respect to its ability to predict access costs for the WebSources. We then correlate the WebSource characteristics and groupings with the High or Low Prediction Accuracy of the model.
