Predicting Indexer Performance in a Distributed Digital Library

Resource discovery in a distributed digital library poses many challenges. One is deciding, for a given query and a set of search engines, which engines should receive the query. This paper focuses on search engine performance as a selection criterion and defines two measures of it: availability (will the search engine respond within a time limit?) and response time (how quickly will the search engine respond, given that it responds at all?). We predicted both characteristics with a variety of algorithms, all of which required little computation time and condensed each search engine's past performance data into a succinct record. We used operational data from the NCSTRL distributed digital library to make and evaluate predictions. We found that simple prediction methods performed as well as more complex ones and that prediction accuracy was closely related to data consistency.
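As an illustration of what a low-cost predictor with a succinct per-engine record might look like, the sketch below keeps two exponentially smoothed values for each search engine. It is not one of the algorithms evaluated in the paper, which the abstract does not specify. The class name `EngineRecord`, its fields, and the smoothing weight `alpha` are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


def _smooth(old: Optional[float], new: float, alpha: float) -> float:
    """Exponential smoothing; the first observation is taken as-is."""
    return new if old is None else (1 - alpha) * old + alpha * new


@dataclass
class EngineRecord:
    """Hypothetical succinct summary of one search engine's past performance."""
    alpha: float = 0.2                      # smoothing weight (assumed value)
    availability: Optional[float] = None    # smoothed share of timely responses
    response_time: Optional[float] = None   # smoothed seconds, responses only

    def update(self, responded: bool, seconds: Optional[float] = None) -> None:
        """Fold one observed query outcome into the record."""
        outcome = 1.0 if responded else 0.0
        self.availability = _smooth(self.availability, outcome, self.alpha)
        # Response time is conditioned on the engine responding at all,
        # so timeouts do not change it.
        if responded and seconds is not None:
            self.response_time = _smooth(self.response_time, seconds, self.alpha)
```

Each record holds only two numbers regardless of how much history it has seen, and the current values of `availability` and `response_time` serve directly as the predictions for the next query. A larger `alpha` makes the predictions track recent behavior more closely, at the cost of more noise.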
